    1. eLife Assessment

      This valuable study presents an analysis of evolutionary conservation in intrinsically disordered regions, identified as key drivers of phase separation, leveraging a protein language model. The strength of evidence presented is convincing overall, though the theoretical grounding could benefit from further development.

    2. Reviewer #1 (Public review):

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. This is an interesting study that supports, complements and extends previous related analyses on the conservation and mutational tolerance of disordered regions, with a particular focus on disordered regions in proteins that are found in condensates.

    3. Reviewer #2 (Public review):

      This manuscript uses the ESM2 language model to map the evolutionary fitness landscape of intrinsically disordered regions (IDRs). The central idea is that mutational preferences predicted by these models could be useful in understanding eventual IDR-related behavior, such as disruption of otherwise stable phases. While ESM2-type models have been applied to analyze such mutational effects in folded proteins, they have not been used or verified for studying IDRs. Here, the authors use ESM2 to study membraneless organelle formation and the related fitness landscape of IDRs.

      Through this, their key finding in this work is the identification of a subset of amino acids that exhibit mutation resistance. Their findings reveal a strong correlation between ESM2 scores and conservation scores, which if true, could be useful for understanding IDRs in general. Through their ESM2-based calculations, the authors conclude that IDRs crucial for phase separation frequently contain conserved sequence motifs composed of both so-called sticker and spacer residues. The authors note that many such motifs have been experimentally validated as essential for phase separation.

      Comments on revisions:

      Unfortunately, my concerns about the lack of theoretical grounding and validation persist; validation is especially critical when theoretical grounding is lacking. The argument about correlation between ESM2 scores and MSA conservation is circular. Protein language models already encode residue‑level conservation, so agreement with conservation does not establish new predictive power. For IDRs, conservation is a poor surrogate for function because many functions are mediated by short, degenerate SLiMs that are frequently gained and lost. Sequence‑only predictions therefore need orthogonal (preferably experimental, or at the least in silico) tests. Finally, without a family‑level holdout (e.g., cluster de‑duplication at low identity) and prospective tests, overlap with known motifs cannot rule out training‑data memorization/near‑duplicates.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. While the paper is relatively easy to read overall, my main comment is that the authors could perhaps make it clearer which observations are new, and which support previous work using related approaches. Further, while the link to phase separation is interesting, it is not completely clear which data supports the statements made, and this could also be made clearer.

      We thank the reviewer for their thoughtful evaluation of our manuscript and for the supportive comments. As outlined in the responses below, we have made substantial revisions to clarify the novel observations presented in our study and to strengthen the connection between sequence conservation and phase separation.

      Comment 1: With respect to putting the work in a better context of what has previously been done before, this is not to say that there is not new information in it, but what the authors do is somewhat closely related to work by others. I think it would be useful to make those links more directly.

      We have addressed the specific comments as outlined below.

      Comment 1a: Alderson et al (reference 71) analysed in detail the conservation of IDRs (via pLDDT, which is itself related to conservation) to show, for example, that conserved residues fold upon binding. This analysis is very similar to the analysis used in the current study (using ESM2 as a different measure of conservation). Thus, the result that "Given that low ESM2 scores generally reflect mutational constraint in folded proteins, the presence of region a among disordered residues suggests that certain disordered amino acids are evolutionarily conserved and likely functionally significant" is in some ways very similar to the results of that (Alderson et al) paper.

      We thank the reviewer for the comment. However, we would like to clarify that our findings show subtle but important differences from those reported by Alderson et al. Specifically, Alderson et al. used AlphaFold2 predictions to identify IDRs that undergo disorder-to-order transitions, which the authors termed conditionally folded IDRs. These regions could potentially be functionally important, assuming that the function of IDRs necessitates folding.

      We argue, however, that the validity of this structure-function relationship for IDRs remains to be tested. In our opinion, the most direct way to assess functional significance is to evaluate evolutionary conservation.

      As shown in Author response image 1, the correlation between pLDDT scores and the conservation score, while noticeable, is significantly weaker than that between the ESM2 score and the conservation score.

      Author response image 1.

      Comparison of the correlation between AlphaFold2 pLDDT scores and conservation scores with the correlation between ESM2 scores and conservation scores. Calculations were performed using proteins in the MLO-hProt dataset. (A) Correlation between the mean AlphaFold2 pLDDT scores and conservation scores for various amino acids. Pearson correlation coefficients (r) are indicated in the figure legends. The four panels on the right present analogous correlation plots for amino acids grouped by structural order, as defined by their pLDDT scores. (B) Same as in part A, but for ESM2 scores.

      Therefore, we believe that ESM2 score is a better indicator than AlphaFold2 pLDDT score for functional relevance.
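      For readers who want to reproduce this kind of comparison, the Pearson coefficient r quoted in the panels is the standard sample correlation. The following is a minimal sketch only; the function name and the numbers are illustrative placeholders and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values (assumed, for illustration): mean score per amino acid type.
mean_esm2 = [0.2, 0.4, 0.5, 0.7, 0.9]
mean_conservation = [0.9, 0.8, 0.6, 0.5, 0.2]
r = pearson_r(mean_esm2, mean_conservation)
```

      Running the same function on pLDDT versus conservation and on ESM2 versus conservation would give the two coefficients being compared.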

      Furthermore, for the human IDRs, we explicitly selected amino acids with pLDDT scores ≤ 70.

      These would be classified as structureless, disordered amino acids, according to the study by Alderson et al. Nevertheless, as shown in Figures 2 and 3 of the main text, our analyses still identify conserved regions. Therefore, these regions may function via mechanisms distinct from the disorder-to-order transition.

      We now discuss the novelty of our work in the context of existing studies in the newly added Conclusions and Discussion: Related Work, as quoted below.

      “Numerous studies have sought to identify functionally relevant amino acid groups within IDRs [cite]. For instance, using multiple sequence alignment, several groups have identified evolutionarily conserved residues that contribute to phase separation [cite]. Alderson et al. employed AlphaFold2 to detect disordered regions with a propensity to adopt structured conformations, suggesting potential functional relevance [alderson et al].

      In contrast, our approach based on ESM2 is more direct: it identifies conserved residues without relying on alignment or presupposing that functional significance requires folding into stable 3D structures. Notably, many of the conserved residues identified in our analysis exhibit low pLDDT scores (Figure 2), implying potential functional roles independent of stable conformations.”

      Comment 1b: Dasmeh et al, Lu et al and Ho & Huang analysed conservation in IDRs, including aromatic residues and their role in phase separation.

      We thank the reviewer for bringing these works to our attention! We now explicitly discuss these studies in both the Discussion section as mentioned above and in the Introduction as quoted below.

      “Evolutionary analysis of IDRs is challenging due to difficulties in sequence alignment [cite], though several studies have attempted alignment of disordered proteins with promising results [Dasmeh et al, Lu et al and Ho & Huang].”

      Comment 1c: A number of groups have performed proteome-wide saturation scans using pLMs, including variants of the ESM family, including Meier (reference 89, but cited about something else) and Cagiada et al (https://doi.org/10.1101/2024.05.21.595203) that analysed variant effects in IDRs using a pLM. Thus, I think statements such as "their applicability to studying the fitness and evolutionary pressures on IDRs has yet to be established" should possibly be qualified.

      We added a new paragraph in the Introduction to discuss the application of protein language models to IDRs and cited the suggested references.

      “While protein language models have been widely applied to structured proteins [cite], it is important to emphasize that these models themselves are not inherently biased toward folded domains. For example, the Evolutionary Scale Model (ESM2) [cite] is trained as a probabilistic language model on raw protein sequences, without incorporating any structural or functional annotations. Its unsupervised learning paradigm enables ESM2 to capture statistical patterns of residue usage and evolutionary constraints without relying on explicit structural information. Thus, the success of ESM2 in modeling the mutational landscapes of folded proteins [cite] reflects the model’s ability to learn sequence-level constraints imposed by natural selection, a property that is equally applicable to IDRs if those regions are also under functional selection. Indeed, protein language models are increasingly being used to analyze variant effects in IDRs [cite].”

      Comment 2: On page 4, the authors write, "The conserved residues are primarily located in regions associated with phase separation." These results are presented as a central part of the work, but it is not completely clear what the evidence is.

      We thank the reviewer for this insightful comment. We realized that our wording was not as precise as it should have been. We meant to state that the regions associated with phase separation are significantly enriched in these conserved residues. This is a significant finding and indicates that phase separation could be a source of evolutionary pressure in dictating IDP sequence conservation. However, we do not intend to suggest that phase separation is the only evolutionary pressure.

      The sentence has been revised to

      “Notably, regions associated with phase separation are significantly enriched in these conserved residues.”

      We further replaced the section title "Conserved, Disordered Residues Localize in Regions Driving Phase Separation" with "Regions Driving Phase Separation Are Enriched with Conserved, Disordered Residues" to further clarify our findings and avoid overinterpretation.

      Finally, we revised the following sentence in the discussion

      “Notably, these conserved, disordered residues are predominantly located in regions actively involved in phase separation, contributing to the formation of membraneless organelles.”

      to

      “Notably, regions actively involved in phase separation are enriched with these conserved, disordered residues, supporting their potential role in the formation of membraneless organelles.”

      The submitted manuscript provides clear evidence supporting the enrichment of conserved residues in MLO-driving IDRs. Specifically, Figures 4A and 4C demonstrate that these IDRs exhibit a substantially higher fraction of conserved residues compared to other IDRs involved in phase separation.

      In this analysis, the nMLO-hIDR group serves as a baseline, representing the distribution of conservation in disordered regions lacking MLO-related functions. In contrast, IDRs from MLO-associated groups show a pronounced downward shift in their median and interquartile ranges, indicating stronger evolutionary constraints. Within the dMLO cohort, the degree of conservation follows a distinct gradient: driving residues exhibit the highest levels of conservation, followed by participant residues, with non-participant residues showing values closer to the nMLO baseline. This pattern reflects the relative functional importance of each group in phase separation, with conservation levels corresponding to their roles in MLO scaffolding.

      To further support this, we computed, for each IDR, the fraction of conserved amino acids. As shown in Figure S11B, for IDRs that actively contribute to phase separation, the fraction is indeed higher than for those not involved in phase separation. This analysis is now included in the SI.

      During the revision, we explicitly evaluated whether conserved residues are preferentially located in regions associated with phase separation. To this end, for each protein in the MLO-hProt dataset, we calculated the probability p of finding conserved residues within regions contributing to phase separation. These regions include both "driving" and "participating" segments as defined in Figure 4 of the main text.

      Figure S11A presents the distribution of p across all proteins. For comparison, we also include the distribution of 1− p, representing the probability of finding conserved residues in regions not associated with phase separation. On average, p exceeds 0.5, suggesting a tendency for conserved residues to be more frequently located in phase-separating regions. However, the difference between the two distributions is not statistically significant. This result may be due to the generally low density of conserved residues in IDRs, which makes the estimation of p challenging for individual proteins. Additionally, some conserved sites may be involved in functions unrelated to phase separation.
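      One plausible reading of p, sketched below, is the share of a protein's conserved residues that fall inside its phase-separation regions, with 1 − p as the share outside. The exact estimator used by the authors is not given in this response, so the function name, the interval convention (inclusive, 1-based), and the example positions are all assumptions for illustration.

```python
def conserved_fraction_in_regions(conserved_positions, ps_regions):
    """Estimate p: fraction of conserved residues inside phase-separation regions.

    conserved_positions: iterable of 1-based residue indices (assumed input).
    ps_regions: list of inclusive (start, end) intervals, e.g. the
        "driving" and "participating" segments.
    Returns None when there are no conserved residues, since p is undefined.
    """
    conserved = set(conserved_positions)
    if not conserved:
        return None
    inside = sum(
        1
        for pos in conserved
        if any(start <= pos <= end for start, end in ps_regions)
    )
    return inside / len(conserved)

# Hypothetical protein: five conserved residues, two phase-separation segments.
p = conserved_fraction_in_regions([12, 15, 29, 50, 88], [(10, 30), (80, 95)])
```

      With only a handful of conserved residues per protein, as in this toy case, p moves in coarse steps, which is consistent with the authors' remark that the estimate is noisy for individual proteins.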

      We added the following text to the Discussion section of the main text.

      “We emphasize that the results presented in Figure 4 do not directly demonstrate that conserved residues are preferentially located in regions associated with phase separation. Although these regions are more enriched in conserved amino acids, their total sequence length can be smaller than that of non-phase-separating regions. As a result, the absolute number of conserved residues may still be higher outside phase-separating regions. To quantitatively assess this, we calculated, for each protein in the MLO-hProt dataset, the probability p of finding conserved residues within regions contributing to phase separation. These regions include both "driving" and "participating" segments, as defined in Figure 4 of the main text. Figure S11 shows the distribution of p across all proteins. For comparison, we also present the distribution of 1− p, which reflects the probability of finding conserved residues in non-phase-separating regions. While the average value of p exceeds 0.5, indicating a trend toward conserved residues being more frequently located in phase-separating regions, the difference between the two distributions is not statistically significant. Future studies with expanded datasets may be necessary to clarify this trend.”

      Comment 3: It would be useful with an assessment of what controls the authors used to assess whether there are folded domains within their set of IDRs.

      We acknowledge that our previous labeling may have caused some confusion. Protein sequences used in Figures 2 and 3 include both folded and disordered domains. Results presented in these figures were constructed using full-length protein sequences to highlight the similarities and differences in ESM2 scores between folded and disordered domains.

      In contrast, the analyses presented in Figures 4 and 5 focus exclusively on IDRs to examine their role in phase separation.

      To prevent further confusion, we have renamed the dataset used in Figures 2 and 3 as MLO-hProt, emphasizing that the analysis pertains to entire protein sequences. The term MLO-hIDR is now reserved for a new dataset that includes only disordered residues, as used in Figures 4 and 5, and corresponding SI Figures.

      For the dMLO-IDR dataset, all except one amino acid (P40967, residue G592) are annotated as disordered in the MobiDB database (https://mobidb.org/). This database characterizes disordered regions based on a combination of predictive algorithms and experimental data. As illustrated in Figure S5A, 25.5% of the proteins in the dataset have direct experimental evidence supporting their disorderedness. These experimental annotations are derived from a diverse range of techniques (Figure S5B). For the remaining proteins, disorder was predicted by one or more computational tools. Although not all tools were applied to every protein, each protein in the dataset was identified as disordered by at least one method.

      For human proteins, IDRs were identified based on AlphaFold2 pLDDT scores, using a threshold of 70. As established in prior studies [1, 2], the pLDDT score provides a quantitative measure of local structural confidence, with lower values indicating greater structural disorder. IDRs associated with conditional folding or disorder-to-order transitions generally exhibit high pLDDT values (e.g., >70).
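      The thresholding step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it treats residues with pLDDT ≤ 70 as disordered (the response uses "≤ 70" in one place and "less than 70" in another) and reports contiguous runs as inclusive, 1-based segments. The function name and the example scores are hypothetical.

```python
def low_plddt_segments(plddt, threshold=70.0):
    """Return inclusive 1-based (start, end) runs where pLDDT <= threshold."""
    segments = []
    start = None
    for idx, score in enumerate(plddt, start=1):
        if score <= threshold:
            if start is None:
                start = idx  # open a new low-confidence run
        elif start is not None:
            segments.append((start, idx - 1))  # close the current run
            start = None
    if start is not None:
        segments.append((start, len(plddt)))  # run extends to the C-terminus
    return segments

# Hypothetical per-residue pLDDT values.
plddt = [95, 92, 65, 40, 38, 88, 90, 55, 60]
segments = low_plddt_segments(plddt)
```

      Segment lengths computed this way could also be used to flag very short runs, such as 1–5 residues, that a threshold alone would label as disordered.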

      Author response image 2 shows a violin plot of AlphaFold2 pLDDT scores for the various MLO-hIDR groups. The consistently low scores support the conclusion that these regions are structurally disordered.

      We also cross-checked the MLO-hIDR regions against the MobiDB database. As shown in Figure S6, approximately 76% of the proteins in the dataset are predicted to contain disordered regions. Among the non-labeled segments with pLDDT scores ≤ 70, the majority are relatively short, with segments of 1–5 residues accounting for approximately 80%.

      Author response image 2.

      AlphaFold pLDDT scores of hIDRs in different MLO-related groups.

      In addition to renaming the dataset, we also revised the manuscript to highlight the validation of disorderedness in the Results section, Regions Driving Phase Separation Are Enriched with Conserved, Disordered Residues.

      “The presence of evolutionarily conserved disordered residues raises the question of their functional significance. To explore this, we identified disordered regions of MLO-hProt using a pLDDT score less than 70 and partitioned these regions into two categories: drivers (dMLO-hIDR), which actively drive phase separation, and clients (cMLO-hIDR), which are present in MLOs under certain conditions but do not promote phase separation themselves [cite]. Additionally, IDRs from human proteins not associated with MLOs, termed nMLO-hIDR, were included as a control. To enhance statistical robustness, we extended our dataset by incorporating driver proteins from additional species [cite], resulting in the expanded dMLO-IDR dataset. Beyond the pLDDT-based classification, the majority of residues in these datasets are also predicted to be disordered by various computational tools and supported by experimental evidence (Figures S5 and S6).”

      Recommendation 1: The authors use the terms "evolutionary fitness of IDRs" (abstract and p. 5, for example), "fitness of amino acids" (p. 4), and "quantify the fitness of particular residues at specific sites" (p. 5). It is not clear what is meant by fitness in this context.

      We thank the reviewer for pointing out the ambiguity in the term fitness. To enhance clarity, we have replaced “fitness" with “mutational tolerance" to more directly emphasize the evolutionary conservation of specific residues.

      Recommendation 2: The authors write (p. 6) "Previous studies have demonstrated a strong correlation between ESM2 scores and changes in free energy related to protein structure stability". While that may be true, it might be worth noting that ESM2 scores report on the effects of mutations and function more broadly than stability because these models have previously been shown to capture conservation effects beyond stability.

      We fully agree with the reviewer’s comment and have revised the main text accordingly. Specifically, the referenced sentence has been revised and relocated, as shown below.

      “Our analysis demonstrated that HP1α’s structured domains consistently yield low ESM2 scores, reflecting strong mutational constraints characteristic of folded regions. These constraints are further evident in the local LLR predictions, as shown in Figure 2B, where we illustrate the folded region G120-T130. Given the functional importance of preserving the 3D structure of structured domains, mutations with greater detrimental effects are likely to disrupt protein folding substantially. This interpretation is consistent with previous studies reporting a significant correlation between ESM2 LLRs and changes in free energy associated with protein structural stability [cite].”

      Recommendation 3: p. 10: The authors write "To exclude sequences that no longer qualify as homologs, we filtered for sequences with at least 20% identity to the reference". How did they decide on 20% and why? And over which residues are these 20% calculated?

      We apologize for the earlier lack of clarity. Sequence alignment was performed using the full-length protein sequences, encompassing both folded and disordered regions. For each sequence, we calculated the percent identity by counting the number of positions, denoted as n, at which the amino acid matches the reference. The percent identity was then computed as n/N, where N represents the total length of the aligned reference sequence. This total includes residues in folded and disordered regions, as well as gap positions introduced during alignment.
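      The n/N calculation described above can be sketched as below. The function name and the toy alignment are hypothetical, and one detail is an assumption: columns where both rows carry a gap are not counted as matches, which the response does not specify.

```python
def percent_identity(aligned_ref, aligned_seq, gap="-"):
    """Return n/N for one aligned sequence against the aligned reference.

    n: columns where the sequence residue equals the reference residue
       (gap-gap columns are not counted; this is an assumption).
    N: full length of the aligned reference, gap columns included.
    """
    if len(aligned_ref) != len(aligned_seq):
        raise ValueError("aligned rows must have equal length")
    if not aligned_ref:
        raise ValueError("empty alignment")
    n = sum(
        1
        for ref_res, seq_res in zip(aligned_ref, aligned_seq)
        if ref_res == seq_res and ref_res != gap
    )
    return n / len(aligned_ref)

# Toy alignment: 4 matching columns out of 7 aligned columns.
identity = percent_identity("MKT-AYL", "MRTGA-L")
keep = identity >= 0.20  # threshold as stated by the reviewer's quotation
```

      Because N includes gap columns, sequences that align to only a short stretch of the reference receive a low identity even when that stretch matches well, which is the behaviour the filter relies on.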

      We updated the Methods section of the main text to clarify.

      “We performed multi-sequence alignment (MSA) analysis using HHblits from the HH-suite3 software suite [citations], a widely used open-source toolkit known for its sensitivity in detecting sequence similarities and identifying protein folds. HHblits builds MSAs through iterative database searches, sequentially incorporating matched sequences into the query MSA with each iteration. Sequence alignment was performed using the full-length protein sequences, encompassing both folded and disordered regions.

      ...

      To refine alignment quality by focusing on closely related homologs, we filtered out sequences with ≤ 20% identity to the query, excluding weakly related sequences where only short segments show similarity to the reference. For each sequence, we calculated the percent identity by counting the number of positions, denoted as n, at which the amino acid matches the reference. The percent identity was then computed as n/N, where N represents the total length of the aligned reference sequence. This total includes residues in folded and disordered regions, as well as gap positions introduced during alignment.”

      We selected a 20% sequence identity threshold to balance inclusion of true homologs with exclusion of distant matches that may not share functional relevance. To determine this cutoff, we compared identity thresholds of 0%, 10%, 20%, and 40% and examined the resulting distributions of conservation and ESM2 scores across aligned residues for the MLO-hProt dataset (Author response image 3). Thresholds of 10%, 20%, and 40% produced qualitatively similar results, with a consistent correspondence between low ESM2 scores and high conservation. Lower thresholds introduced highly divergent sequences that added noise to the alignment, resulting in reduced overall conservation scores. In contrast, higher thresholds excluded homologs with potentially meaningful conservation, particularly in disordered regions where conservation scores tend to be relatively low.

      Author response image 3.

      Histograms of the ESM2 score and the conservation score, presented in a format consistent with Figure 3B of the main text. The conservation scores were computed using aligned sequences with identity thresholds of ≥0%, ≥10%, ≥20%, and ≥40% (left to right). Contour lines represent different levels of −log P(CS, ESM2), where P is the joint probability density of conservation score (CS) and ESM2 score. Contours are spaced at 0.5-unit intervals, highlighting regions of distinct density.

      Recommendation 4: In their description of the "motif" searching algorithm (p. 20), I think that the search algorithm would give a different result depending on whether the search is performed N->C or C->N, because the first residue (i) needs to have a score <0.5 but the last (j) could have a score >0.5 as long as the average is below 0.5. Is that correct? And if so, why did they choose an asymmetric algorithm?

      We thank the reviewer for highlighting the asymmetry in our motif-search algorithm.

      To investigate this issue, we repeated the algorithm starting from the C-terminus and compared the resulting motifs with those obtained from the N-terminal scan. We found that the two sets of motifs overlap entirely: each motif identified from the C-terminal direction has a corresponding counterpart from the N-terminal scan. However, the motifs are not identical. The directionality of the search introduces additional amino acids—referred to here as peripheral residues—at the motif boundaries, which differ between the two sets.

      As shown in Author response image 4, the number of peripheral residues is small relative to the total motif length.

      To eliminate asymmetry and ambiguity, we have revised our method to perform bidirectional scans—from both the N- and C-termini—and define each motif as the overlapping region identified by both directions. This approach emphasizes the conserved core and avoids the inclusion of spurious terminal residues. The updated procedure is described in Methods: Motif Identification.

      Author response image 4.

      Number of peripheral residues and their length relative to the full motif length, identified from both sides. (A) The unique motifs identified in the N-to-C terminal direction. (B) The unique motifs identified in the C-to-N terminal direction.

      “To identify motifs within a given IDR, we implemented the following iterative procedure. Starting from either the N- or C-terminus of the sequence, we first locate the initial residue i whose ESM2 score falls below 0.5. From i, residues are sequentially appended in the direction toward the opposite terminus until the segment’s average ESM2 score exceeds 0.5; the first residue to breach this threshold is denoted j. The segment (i,i+1,..., j−1) is then recorded as a candidate motif. This process repeats starting from j until the end of the IDR is reached.

      We perform this full procedure independently from both termini and designate the final motif as the intersection of the two candidate-motif sets. This bidirectional overlap strategy excludes terminal residues that might transiently satisfy the average-score criterion only due to adjacent low-scoring regions, thereby isolating the conserved core of each motif. All other residues—those not included in either directional pass—are classified as non-motif regions, minimizing peripheral artifacts.”
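      The bidirectional procedure can be sketched as follows. This is one interpretation of the quoted Methods text, not the authors' implementation: it assumes a motif starts at the first residue with a score strictly below 0.5 (as in the reviewer's description), uses 0-based indices, applies no minimum motif length, and takes the intersection residue by residue. Function names and the example scores are hypothetical.

```python
def scan_motifs(scores, threshold=0.5):
    """One directional pass; returns half-open (i, j) candidate motifs."""
    motifs, pos, n = [], 0, len(scores)
    while pos < n:
        # Locate the next residue i whose score is below the threshold.
        i = pos
        while i < n and scores[i] >= threshold:
            i += 1
        if i == n:
            break
        # Extend until the running average exceeds the threshold at residue j.
        total, j = 0.0, i
        while j < n:
            total += scores[j]
            if total / (j - i + 1) > threshold:
                break
            j += 1
        motifs.append((i, j))  # residues i .. j-1
        pos = j                # the next search restarts from j
    return motifs

def bidirectional_motifs(scores, threshold=0.5):
    """Residues found by both the N->C and C->N passes, as inclusive runs."""
    n = len(scores)
    forward = scan_motifs(scores, threshold)
    # Scan the reversed sequence, then map indices back to the original frame.
    backward = [(n - j, n - i) for i, j in scan_motifs(scores[::-1], threshold)]
    in_forward = {k for i, j in forward for k in range(i, j)}
    in_backward = {k for i, j in backward for k in range(i, j)}
    runs = []
    for k in sorted(in_forward & in_backward):
        if runs and k == runs[-1][1] + 1:
            runs[-1][1] = k       # extend the current run
        else:
            runs.append([k, k])   # start a new run
    return [tuple(run) for run in runs]

# Hypothetical per-residue ESM2 scores for a short IDR.
scores = [0.9, 0.2, 0.3, 0.9, 0.7, 0.1, 0.2, 0.6]
core_motifs = bidirectional_motifs(scores)
```

      In this toy case the N-to-C pass reports residues 1-3 and 5-7, while the C-to-N pass reports a single segment spanning residues 0-6; the intersection keeps residues 1-3 and 5-6 and drops the residues picked up by only one direction.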

      Accordingly, we have updated the Supplementary material ESM2_motif_with_exp_ref.csv with the newly identified motifs common to both the N-terminal and C-terminal searches. Minor changes were observed in the set of motifs, as discussed above, but these do not affect the main conclusions. Figures 5C, 5D, and S6 have been revised accordingly.

      Reviewer #2:

      Summary:

      Unfortunately, I do not believe that the results can be trusted. ESM2 has not been validated for IDRs through experiments. The authors themselves point out its little use in that context. In this study, they do not provide any further rationale for why this situation might have changed. Furthermore, they mention that experimental perturbations of the predicted motifs in in vivo studies may further elucidate their functional importance, but none of that is done here. That some of the motifs have been previously validated does not give any credibility to the use of ESM2 here, given that such systems were probably seen during the training of the model.

      We thank the reviewer for their detailed and thoughtful critique of our manuscript. We recognize the importance of careful model validation, especially in the context of IDRs, and appreciate the opportunity to clarify the scope and rationale of our study. Below, we respond point-by-point to the main concerns.

      (1) The use of ESM2 is not validated for IDRs, and the authors provide no rationale for its applicability in this context.

      We thank the reviewer for raising this important point.

      First, we emphasize that ESM2 is a probabilistic language model trained entirely on amino acid sequences, without any structural supervision. The model does not receive any input about protein structure — folded or disordered — during training. Instead, it learns to estimate the likelihood of each amino acid at a given position, conditioned on the surrounding sequence context. This makes ESM2 agnostic to whether a sequence is folded or disordered; the model’s capacity to identify patterns of residue usage arises solely from the statistics of natural sequences.
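      To make the scoring idea concrete: given the model's predicted probabilities for each amino acid at one position, a substitution is commonly scored by a log-likelihood ratio (LLR) against the wild-type residue. The sketch below assumes that common definition and uses made-up probabilities. It does not call ESM2, and it is not the authors' definition of the per-residue "ESM2 score", which this response does not spell out.

```python
import math

def llr(probs, wild_type, mutant):
    """Log-likelihood ratio of a substitution: log p(mutant) - log p(wild type)."""
    return math.log(probs[mutant]) - math.log(probs[wild_type])

# Made-up probabilities at a single position (a real model covers all 20 residues).
position_probs = {"G": 0.70, "A": 0.15, "S": 0.10, "P": 0.05}
wild_type = "G"
substitution_llrs = {
    aa: llr(position_probs, wild_type, aa)
    for aa in position_probs
    if aa != wild_type
}
```

      Under this definition, a position where every substitution has a strongly negative LLR is one where the model strongly prefers the wild-type residue, regardless of whether the region is folded or disordered.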

      As such, ESM2 is not inherently biased toward folded proteins, even though previous studies have demonstrated its usefulness in identifying conserved and functionally constrained residues in structured domains [3–9]. These findings support the broader utility of language models for uncovering evolutionary constraints — and by extension, suggest that similar signatures could exist in IDRs, particularly if they are under functional selection.

      Indeed, if certain residues or motifs in IDRs are conserved due to their importance in biological processes (e.g., phase separation), we would expect such selection to be reflected in sequence-based features, which ESM2 is designed to detect. The model’s applicability to IDRs, then, is a natural extension of its core probabilistic architecture.

      To further evaluate this, we carried out an independent in silico validation using multiple sequence alignments (MSAs). This analysis allowed us to compute the evolutionary conservation of individual amino acids without any reliance on ESM2. We then compared these conservation scores to ESM2 scores and found a strong correlation between the two. This provides direct, quantitative support for the idea that ESM2 is capturing biologically meaningful sequence constraints — even in disordered regions.

      While we agree that experimental testing would ultimately provide the most compelling validation, we believe that our MSA-based comparison constitutes a strong and arguably ideal computational validation of the model’s predictions. It offers an orthogonal measure of evolutionary pressure that confirms the biological plausibility of ESM2 scores.

      We added the following text in the introduction to highlight the applicability of ESM2 to IDRs.

      “While protein language models have been widely applied to structured proteins, it is important to emphasize that these models themselves are not inherently biased toward folded domains. For example, the Evolutionary Scale Model (ESM2) [cite] is trained as a probabilistic language model on raw protein sequences, without incorporating any structural or functional annotations. It operates by estimating the likelihood of observing a given amino acid at a particular position, conditioned on the entire surrounding sequence context. This unsupervised learning paradigm enables ESM2 to capture statistical patterns of residue usage and evolutionary constraints without relying on explicit structural information. Thus, the success of ESM2 in modeling fitness landscapes of folded proteins reflects the model’s ability to learn sequence-level constraints imposed by natural selection, a property that is equally applicable to IDRs if those regions are also under functional selection. Indeed, protein language models are increasingly being used to analyze variant effects in IDRs [cite].”

      (2) There is no experimental validation of the ESM2-based predictions in this study.

      We agree that experimental validation would provide definitive support for the utility of ESM2 in IDRs, and we explicitly state this as a limitation in the revised manuscript as quoted below.

      “Limitations: Despite the promising findings, our study has several limitations. Most notably, our analysis is purely computational, relying on ESM2-derived predictions and sequence-based conservation without accompanying experimental validation. While the strong correlation between ESM2 scores and evolutionary conservation provides compelling evidence that the identified motifs are functionally constrained, the precise biological roles of these motifs remain uncharacterized. ESM2 is well-suited for highlighting regions under selective pressure, but it does not provide mechanistic insights into how conserved motifs contribute to specific molecular functions such as phase separation, molecular recognition, or dynamic regulation. Determining these roles will require targeted experimental investigations, including mutagenesis and biophysical characterization.”

      In addition, we revised the manuscript title from “Protein Language Model Identifies Disordered, Conserved Motifs Driving Phase Separation" to “Protein Language Model Identifies Disordered, Conserved Motifs Implicated in Phase Separation". This revision softens the original claim to better reflect the absence of direct experimental evidence for the motifs’ role in phase separation.

      However, we also emphasize that the goal of our study is not to claim definitive predictive power, but rather to explore whether ESM2-derived mutational profiles align with known biological features of IDRs — and in doing so, to generate new, testable hypotheses.

      In addition, while no in vivo experiments were performed, our study does include an in silico validation step, as detailed in the response to the previous comment. The strong correlation between ESM2 scores and conservation scores provides direct support for the utility of ESM2 in identifying residues under evolutionary constraint in disordered regions.

      (3) The overlap between predicted motifs and known ones may be due to training data leakage.

      We respectfully clarify that training data leakage is not possible in this case, as ESM2 is trained using unsupervised learning on raw protein sequences alone. The model has no access to experimental annotations, functional labels, or knowledge of which motifs are involved in phase separation. It only models statistical sequence patterns derived from evolutionarily observed proteins.

      Therefore, any agreement between ESM2-derived predictions and previously validated motifs arises not from memorization of experimental data, but from the model’s ability to learn meaningful sequence constraints from the natural distribution of proteins.

      (4) The authors should revamp the study with a testable predictive framework.

      We respectfully suggest that a full revamp is not necessary or appropriate in this context.

      As outlined in our previous responses, we believe that certain misunderstandings about the nature and capabilities of ESM2 may have influenced the reviewer’s assessment.

      Importantly, both Reviewer 1 and Reviewer 3 express strong support for the significance and novelty of this work, and recommend publication following minor revisions.

      In this context, we believe the manuscript provides a useful contribution as a first step toward understanding disordered regions using language models, and that it has value even in the absence of direct experimental testing. We have now better positioned the manuscript in this light, clarified limitations, and suggested concrete next steps for follow-up research.

      We hope these clarifications and revisions address the reviewer’s concerns, and we thank them again for helping us strengthen the framing, rigor, and clarity of our study.

      Reviewer #3:

      Summary:

      This is a very nice and interesting paper to read about motif conservation in protein sequences, mainly in IDRs, using the ESM2 language model. The topic of the paper is timely, with strong biological significance. The paper can be of great interest to the scientific community in the field of protein phase transitions and future applications using the ESM models. The ability of ESM2 to identify conserved motifs is crucial for disease prediction, as these regions may serve as potential drug targets. Therefore, I find these findings highly significant, and the authors strongly support them throughout the paper. The work motivates the scientific community towards further motif exploration related to diseases.

      Strengths:

      (1) Revealing conserved regions in IDRs by the ESM-2 language model.

      (2) Identification of functionally significant residues within protein sequences, especially in IDRs.

      (3) Findings supported by useful analyses.

      We appreciate the reviewer’s thoughtful words and support for our work.

      Weaknesses:

      (1) Lack of examples demonstrating the potential biological functions of these conserved regions.

      As detailed in the Response to Recommendation 6, we conducted additional analyses to connect the identified conserved regions with their biological functions.

      (2) Very limited discussion of potential future work and of limitations.

      We have substantially revised the Conclusions and Discussion section to provide a detailed analysis of the study’s limitations and to propose several directions for future research, as elaborated in our Response to Recommendation 5 below.

      Recommendation 1: The authors describe the ESM2 score such that lower scores are associated with conserved residues, stating that "lower scores indicate higher mutational constraint and reduced flexibility, implying that these residues are more likely essential for protein function, as they exhibit fewer permissible mutational states." However, when examining intrinsically disordered regions (IDRs), which are known to drive phase separation, I observe that the ESM2 score is relatively high (Figure 3C, pLDDT < 50, and Supplementary Figure S2). Could the authors clarify how this relatively high score aligns with the conservation of motifs that drive phase separation?

      We thank the reviewer for this insightful comment. We would like to clarify that most amino acids in the IDRs are not conserved, even for IDRs that contribute to phase separation. Only a small set of amino acids in these IDRs, which we term as motifs, are evolutionarily conserved with low ESM2 scores. Therefore, the ESM2 scores exhibit bimodal distribution at high and low values, as shown in Figures 4A and 4C of the manuscript. When averaged over all the amino acids, the mean ESM2 scores, plotted in Figure 3C, are relatively high due to dominant population of non-conserved amino acids.
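
      The averaging argument above can be checked with a short calculation. The fraction of conserved residues and the two group means below are hypothetical, chosen only to show how a small low-scoring population barely lowers the overall mean.

```python
def mixture_mean(frac_conserved, mean_conserved, mean_nonconserved):
    """Mean of a two-population mixture, weighted by population fraction."""
    return frac_conserved * mean_conserved + (1 - frac_conserved) * mean_nonconserved


# Hypothetical values: 20% of IDR residues are conserved with mean score 0.5,
# and the remaining 80% are non-conserved with mean score 3.0.
overall = mixture_mean(0.2, 0.5, 3.0)
print(f"overall mean ESM2 score = {overall:.2f}")
```

      In this toy case the overall mean stays close to the non-conserved value even though one residue in five is strongly constrained, which is why a mean alone can hide a bimodal distribution.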

      Recommendation 2: The authors mention: "We first analyzed the relationship between ESM2 and pLDDT scores for human Heterochromatin Protein 1 (HP1, residues 1-191)". I appreciate this example as a demonstration of amino acid conservation in IDRs. However, could the authors provide more examples to support amino acid conservation within IDRs together with low ESM2 scores (e.g., additional examples of "conserved disordered" regions in various proteins that are associated with relatively low ESM2 scores, as in Figure 2A)?

      We thank the reviewer for this valuable suggestion. We would like to note that conserved residues in IDRs are prevalent, as indicated in Figures 2D and 3B. To further illustrate the prevalence of “conserved disordered” regions, we generated ESM2 versus pLDDT score plots for the full dMLO–hProt dataset (82 proteins) in Figure S2. In these plots, residues with pLDDT ≤ 70 are highlighted in blue to denote structural disorder (dMLO-hIDR), and those disordered residues with ESM2 score ≤ 1.5 are shown in purple to indicate conserved disordered segments.
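
      The two thresholds described above amount to a simple per-residue labeling rule, sketched here. The cutoffs (pLDDT ≤ 70 and ESM2 score ≤ 1.5) come from the response, while the function name, label strings, and example values are hypothetical.

```python
def classify_residue(plddt, esm2_score, plddt_cutoff=70.0, esm2_cutoff=1.5):
    """Label one residue using the pLDDT and ESM2 thresholds quoted above."""
    if plddt > plddt_cutoff:
        return "structured"
    if esm2_score <= esm2_cutoff:
        return "conserved disordered"
    return "disordered"


# Hypothetical (pLDDT, ESM2 score) pairs for four residues.
residues = [(92.0, 0.4), (45.0, 0.9), (38.0, 2.7), (70.0, 1.5)]
labels = [classify_residue(p, s) for p, s in residues]
print(labels)
```

      Both comparisons are inclusive at the cutoff, matching the "≤" used in the text, so a residue with pLDDT exactly 70 and score exactly 1.5 is labeled conserved disordered.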

      Recommendation 3: Could the authors plot a Violin conservation score plot for Figure 4A to emphasise the relationship between ESM2 scores and conservation scores of disordered residues?

      We thank the reviewer for this suggestion. We included a violin plot illustrating the distribution of conservation scores for disordered residues across all four IDR groups, shown in Author response image 5. Consistent with the findings in Figure 4A, the phase separation drivers (dMLO-hIDR and dMLO-IDR) exhibit a higher proportion of conserved amino acids compared to the client group (cMLO-hIDR).

      We also note that the nMLO-hIDR group may contain conserved residues due to functions unrelated to MLO formation, which could contribute to the higher observed levels of conservation in this group.

      Author response image 5.

      Violin plots illustrating the distribution of conservation scores for disordered residues across the nMLO–hIDR, cMLO–hIDR, dMLO–hIDR, and dMLO–IDR datasets. Pairwise statistical comparisons were conducted using two-sided Mann–Whitney U tests on the conservation score distributions (null hypothesis: the two groups have equal medians). P-values indicate the probability of observing rank differences at least as extreme as those observed under the null hypothesis. Statistical significance is denoted as follows: ***: p < 0.001; **: p < 0.01; *: p < 0.05.
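
      For readers unfamiliar with the test named in the caption, the following is a minimal sketch of a two-sided Mann–Whitney U comparison. It uses a plain normal approximation without tie or continuity corrections, which is an assumption made for brevity and not necessarily what the authors used; the two score lists are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x: pairs with x_i > y_j count 1, ties count 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def two_sided_p_normal(u, n1, n2):
    """Two-sided p-value from the normal approximation (no corrections)."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical conservation scores for two IDR groups.
drivers = [0.90, 0.80, 0.85, 0.70, 0.95]
clients = [0.20, 0.30, 0.75, 0.10, 0.40]

u = mann_whitney_u(drivers, clients)
p = two_sided_p_normal(u, len(drivers), len(clients))
print(f"U = {u}, p = {p:.4f}")
```

      For real data, a library routine such as `scipy.stats.mannwhitneyu` with `alternative="two-sided"` is the more usual choice, since it handles ties and small samples more carefully than this approximation.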

      Recommendation 4: It will be appreciated if the authors could add to Figure 4 Violin plots, a statistical comparison between the different groups.

      We thank the reviewer for this valuable suggestion. We included the p-values for Figures 4A and 4C to quantify the statistical significance of differences in the distributions.

      Most comparisons are highly significant (p < 0.001), while the largest p-value (p = 0.089) between the conservation score of driving and non-participating groups (Figure 4C) still suggests a marginally significant trend.

      Recommendation 5: Could the authors expand more on potential future research directions using ESM2, given its usefulness in identifying conserved motifs? Specifically, how do the authors envision conserved motifs will contribute to future discoveries/applications/models using ESM (e.g, discuss the importance of conserved motifs, especially in IDRs motifs, in protein phase transition prediction in relation to diseases).

      We thank the reviewer for this insightful comment. To further assess the functional relevance of the conserved motifs, we incorporated pathogenic variant data from ClinVar [10, 11] to evaluate mutational impacts. As shown in Figure S12A and B, a substantial number of pathogenic variants in MLO-hProt proteins are associated with low ESM2 LLR values. This pattern holds for both folded and disordered residues.

      Moreover, we observed that variants located within motifs are more frequently pathogenic compared to those outside motifs (Figure S12C). In the main text, motifs were defined only for driver proteins; however, the available variant data for this subset are limited (6 data points). To improve statistical power, we extended motif identification to include both client and driver human proteins, following the same methodology described in the main text. Consistent with previous findings, variants within motifs in this expanded set are also more likely to be pathogenic. These results further support the functional importance of both low ESM2-scoring residues and the conserved motifs in which they reside.

      The following text was added in the Discussion section of the manuscript to discuss these results and outline future research directions.

      “Several promising directions could extend this work, both to refine our mechanistic understanding and to explore clinical relevance. One avenue is testing the hypothesis that conserved motifs in scaffold proteins act as functional stickers, mediating strong intermolecular interactions. This could be evaluated computationally via free energy calculations or experimentally via interaction assays. Deletion of such motifs in client proteins may also reduce their partitioning into condensates, illuminating their roles in molecular recruitment.

      To explore potential clinical implications, we analyzed pathogenicity data from ClinVar [10, 11]. As shown in Figure S12A, single-point mutations with low LLR values (indicative of constrained residues) are enriched among clinically reported pathogenic variants, while benign variants typically exhibit higher LLR values. Moreover, mutations within conserved motifs are significantly more likely to be pathogenic than those in non-motif regions (Figure S12B). These findings highlight the potential of ESM2 as a first-pass screening tool for identifying clinically relevant residues and suggest that the conserved motifs described here may serve as priorities for future studies, both mechanistic and therapeutic.”

      Moreover, the functional significance of conserved motifs, particularly their implications in disease and pathology, warrants further investigation. As an initial analysis, we incorporated ClinVar pathogenic variant data [citation] to assess mutational effects within our datasets. As illustrated in Figure R12A, single-point mutations with low LLR values are enriched among clinically reported pathogenic variants, whereas benign variants are more commonly associated with higher LLR values. Notably, mutations within conserved motifs are substantially more likely to be pathogenic compared to those in non-motif regions. These findings highlight the potential of ESM2 as a first-pass tool for identifying residues of clinical relevance. The conserved motifs identified here may be prioritized in future studies aimed at elucidating their biological roles and evaluating their viability as therapeutic targets.

      Recommendation 6: The authors mention: "Our findings provide strong evidence for evolutionary pressures acting on specific IDRs to preserve their roles in scaffolding phase separation mechanisms, emphasizing the functional importance of entire motifs rather than individual residues in MLO formation." They also present a word cloud of functional motifs in Figure 5D. Although it makes sense that evolutionarily conserved motifs, especially within IDRs, act as functional units, I think there is no direct evidence for such functionality (e.g., examples of biological pathways associated with IDRs and phase separation). Hence, there is no justification to write in the figure caption: "ESM2 Identifies Functional Motifs in driving IDRs" unless the authors provide some examples of such functionality. This will even make the paper stronger by establishing a clear connection to biological pathways, and hence these motifs can serve as potential drug targets.

      We thank the reviewer for this insightful suggestion. We have replaced “functional motifs" with “conserved motifs" in the figure caption.

      Identifying the precise biological pathways associated with the conserved motifs is a complex task and a comprehensive investigation lies beyond the scope of this study. Nonetheless, as an initial effort, we explored the potential functions of these motifs using annotations available in DisProt (https://disprot.org/).

      DisProt is the leading manually curated database dedicated to IDPs, providing both structural and functional annotations. Expert curators compile experimentally validated data, including definitions of disordered regions, associated functional terms, and supporting literature references. Author response image 6 presents a representative DisProt entry for DNA topoisomerase 1 (UniProt ID: P11387), illustrating its structural and biological annotation.

      For each motif, we located the corresponding DisProt entry and assigned a functional annotation based on the annotated IDR from which the motif originates. We emphasize that this functional assignment should be regarded as an approximation. Because experimental annotations often pertain to the entire IDR, regions outside the motif may also contribute to the reported function.

      Nevertheless, the annotations provide valuable insights.

      Author response image 6.

      Screenshot of information provided by the DisProt database. Detailed annotations of biological functions and structural features, along with experimental references, are accessible via mouse click.

      Approximately 50% of ESM2-predicted IDR motifs lack functional annotations. Among those that are annotated, motifs from the dMLO-IDR dataset are predominantly associated with “molecular condensate scaffold activity,” followed by various biomolecular binding functions (Author response image 7A). These findings support the role of these motifs in MLO formation.

      For comparison, we applied the same identification procedure (described in Methods: Motif Identification) to motifs from the nMLO-hIDR dataset. In contrast to the dMLO-IDR motifs, these exhibit a broader range of annotated functions related to diverse cellular processes. Collectively, these results suggest that motifs identified by ESM2 are aligned with biologically relevant functions captured in current databases.

      Finally, as illustrated in Figure S12 and discussed in the Response to Recommendation 5, variants occurring within identified motifs are more likely to be pathogenic than those in non-motif regions, further underscoring their functional importance.

      Author response image 7.

      Biological functions of ESM2-predicted motifs. (A) Distribution of biological functions associated with all identified motifs from dMLO-IDR driving groups. (B) Distribution of biological functions associated with all identified motifs from nMLO-hIDR groups.

      Recommendation 7: In Figure 2C the authors present FE (I assume this is free energy). Some discussion of the difference in free energy in the "a" region is missing (i.e., both "Folded" and "Disordered" regions are associated with a low ESM score but with low and high free energy (FE), respectively).

      We thank the reviewer for the comments. FE indeed abbreviates free energy. To improve clarity and avoid confusion, we have updated all figure captions by replacing “FE” with “−logP” to explicitly denote the negative logarithm of probability in the contour density plots.

      We used “a" in Figures 2C and 2D to refer to regions with low ESM2 scores, which appear as a local minimum in both plots. Since most residues in folded regions are conserved, region a has lower free energy than region b in Figure 2C. On the other hand, because most residues in disordered regions are not conserved, as elaborated in our Response to Recommendation 1, region a in Figure 2D has a lower population and therefore a higher free energy than region b.
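
      The link between population and free energy used in this argument can be written out explicitly. The notation below is an assumption introduced for illustration: P denotes the normalized density of residues over the plotted coordinates x, and the free energy is expressed in dimensionless units.

```latex
% Free energy as the negative log of the residue density (up to a constant)
F(x) = -\log P(x) + \text{const}

% Difference between two regions a and b depends only on their populations
F_a - F_b = -\log \frac{P_a}{P_b}
\quad\Longrightarrow\quad
P_a < P_b \iff F_a > F_b
```

      A region can therefore be a local minimum of F (a local maximum of P) and still sit at a higher free energy than another, more populated region, which is the situation described for the disordered residues.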

      To avoid confusion, we have replaced “a" and “b" in Figure 2D with “I" and “II".

      Recommendation 8: Figure S2: It would be useful to plot the same figure for structured and disordered regions as well.

      We are not certain we fully understood this comment, as we believe the requested analysis has already been addressed. In Figure S2, we used the AlphaFold2 pLDDT score to represent the structural continuum of different protein regions, where residues with pLDDT > 70 (red and light-red bars) are classified as structured, while those with pLDDT ≤ 70 (blue and light-blue bars) are classified as disordered.

      Minor suggestion 1: Could the authors clarify the meaning of the abbreviation "FE" in the colorbar of the contour line? I assume this is free energy.

      We have updated all contour density plot figure captions by replacing “FE” with “−logP” to explicitly denote the logarithm of probability.

      Minor suggestion 2: In Figure 2A - do the authors mean "Conserved folded" instead of just "Folded"? If so, could the authors indicate this?

      We thank the reviewer for this comment. The ESM2 scores indeed suggest that, within folded regions, there may be multiple distinct groups exhibiting varying degrees of evolutionary conservation. However, as our primary focus is on IDRs, we chose not to investigate these distinctions further.

      Figure 2A illustrates a randomly selected folded region based on AlphaFold2 pLDDT scores.

      References

      (1) Ruff, K. M.; Pappu, R. V. AlphaFold and Implications for Intrinsically Disordered Proteins. Journal of Molecular Biology 2021, 433, 167208.

      (2) Alderson, T. R.; Pritišanac, I.; Kolarić, Đ.; Moses, A. M.; Forman-Kay, J. D. Systematic Identification of Conditionally Folded Intrinsically Disordered Regions by AlphaFold2. Proceedings of the National Academy of Sciences of the United States of America 2023, 120, e2304302120.

      (3) Brandes, N.; Goldman, G.; Wang, C. H.; Ye, C. J.; Ntranos, V. Genome-Wide Prediction of Disease Variant Effects with a Deep Protein Language Model. Nature Genetics 2023, 55, 1512–1522.

      (4) Lin, Z. et al. Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model. 2023.

      (5) Zeng, W.; Dou, Y.; Pan, L.; Xu, L.; Peng, S. Improving Prediction Performance of General Protein Language Model by Domain-Adaptive Pretraining on DNA-binding Protein. Nature Communications 2024, 15, 7838.

      (6) Gong, J. et al. THPLM: A Sequence-Based Deep Learning Framework for Protein Stability Changes Prediction upon Point Variations Using Pretrained Protein Language Model. Bioinformatics 2023, 39, btad646.

      (7) Lin, W.; Wells, J.; Wang, Z.; Orengo, C.; Martin, A. C. R. Enhancing Missense Variant Pathogenicity Prediction with Protein Language Models Using VariPred. Scientific Reports 2024, 14, 8136.

      (8) Saadat, A.; Fellay, J. Fine-Tuning the ESM2 Protein Language Model to Understand the Functional Impact of Missense Variants. Computational and Structural Biotechnology Journal 2025, 27, 2199–2207.

      (9) Chu, S. K. S.; Narang, K.; Siegel, J. B. Protein Stability Prediction by Fine-Tuning a Protein Language Model on a Mega-Scale Dataset. PLOS Computational Biology 2024, 20, e1012248.

      (10) Landrum, M. J.; Lee, J. M.; Riley, G. R.; Jang, W.; Rubinstein, W. S.; Church, D. M.; Maglott, D. R. ClinVar: Public Archive of Relationships among Sequence Variation and Human Phenotype. Nucleic Acids Research 2014, 42, D980–D985.

      (11) Landrum, M. J. et al. ClinVar: Improving Access to Variant Interpretations and Supporting Evidence. Nucleic Acids Research 2018, 46, D1062–D1067.

    1. Author response:

      Reviewer #1 (Public review):

      The authors tried to quantify the difference between human complex traits by calculating genetic overlap scores between a pair of traits. Sherlock-II was devised to integrate GWAS with eQTL signals. The authors claim that Sherlock-II is superior to the previous version (robustness, accuracy, etc). It appears that their framework provides a reasonable solution to this important question, although the study needs further clarification and improvements.

      (1) Sherlock-II incorporates GWAS and eQTL signals to better quantify genetic signals for a given complex trait. However, this approach is based on the hypothesis that "all GWAS signals confer association to complex trait via eQTL", which is not true (PMID: 37857933). This should be acknowledged (through mentioning in the text) and incorporated into the current setup (through differential analysis - for example, with or without eQTL signals, or with strong colocalization only). 

      The reviewer is correct that in this version of the tool, we focused on SNPs that affect gene expression, as the majority of the SNPs identified by GWASs are non-coding. In future improvements, we should also include coding SNPs that change the amino acid sequence of genes. We will discuss this point further in the revised manuscript.

      (2) When incorporating eQTL, why did the authors use the top p-value tissues for eQTL? This approach seems simpler and probably more robust. But many eQTLs are tissue-specific. Therefore, it would also be important to know if eQTLs from appropriate tissues were incorporated instead.

      This is a simple scheme to incorporate eQTL data from multiple tissues, assuming that the tissue that gives the strongest association is the most relevant, or is the main mediator of the effect of the SNP on the phenotype. This is a reasonable approach given that the tissues of origin for most of the phenotypes are unknown. In future improvements, we should incorporate eQTL data from the appropriate tissue(s) if they are known.
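
      The tissue-selection scheme described here can be sketched as picking, for a given SNP-gene pair, the tissue with the smallest eQTL p-value. The data layout, tissue names, and p-values below are hypothetical and are not taken from Sherlock-II.

```python
def strongest_tissue(eqtl_pvalues):
    """Return (tissue, p-value) for the tissue with the smallest eQTL p-value."""
    tissue = min(eqtl_pvalues, key=eqtl_pvalues.get)
    return tissue, eqtl_pvalues[tissue]


# Hypothetical eQTL p-values for one SNP-gene pair across three tissues.
eqtl_pvalues = {"whole_blood": 3e-4, "brain_cortex": 2e-9, "liver": 0.12}

tissue, p = strongest_tissue(eqtl_pvalues)
print(tissue, p)
```

      If the relevant tissue for a trait were known, the same lookup could simply be restricted to that tissue instead of taking the minimum over all of them.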

      (3) One of the main examples is the novel association between Alzheimer's disease and breast cancer. Although the authors provided a molecular clue underlying the association, it is still hard to comprehend the association easily, as the two diseases are generally known to be exclusive to each other. This is probably because breast cancer GWAS is performed for germline variants and does not consider the contribution of somatic variants. 

      This is due to one of the limitations of the current algorithm: no direction of association is predicted explicitly. It could be that increasing the expression of a gene reduces the risk of one disease but increases the risk of another. Currently we have to analyze the details of the SNPs to infer direction once overlapping genes are found. This needs improvement in the future.

      (4) It would help readers understand the story better if a summary figure of the entire process were provided. The current Figure 1 does not fulfil that role. 

      We plan to incorporate the reviewer's suggestion in the revised manuscript.

      (5) Figure 2 is not very informative. The readers would want to know more quantitative information rather than a heatmap-style display. Is there directionality to the relationship, or is it always unidirectional? 

      We will consider a different presentation in the revised manuscript.

      (6) In Figure 3, readers may want to know more specific information. For example, what gene signals are really driving the hypoxia signal in Alzheimer's disease vs breast cancer? And what SNP signals are driving these gene-level signals? 

      We will add this information in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors introduce a gene-level framework to detect shared genetic architecture between complex traits by integrating GWAS summary statistics with eQTL data via a new algorithm, Sherlock-II, which aggregates signals from multiple (cis/trans) eSNPs to produce gene-phenotype p-values. Shared pathways are identified with Partial-Pearson-Correlation Analysis (PPCA).

      Strengths:

      The authors show the gene-based approach is complementary and often more sensitive than SNP-level methods, and discuss limitations (in terms of no directionality, dependence on eQTL coverage).

      Weaknesses:

      (1) How do the authors explain data where missing tissues or sparse eQTL mapping are available? Would that bias as to which genes/traits can be linked and may produce false negatives or tissue-specific false positives?

      Missing tissues or sparse eQTL certainly can produce false negatives as the signals linking the two phenotypes are simply not captured in the data. It is less likely to produce false positives as long as the statistical test is well controlled.   

      (2) Aggregating SNP-level signals into gene scores can be confounded by LD; for example, a nearby causal variant for a different gene or non-expression mechanism may drive a gene's score, producing spurious gene-trait links. How do the authors prevent this? 

      When there are multiple SNPs in LD with multiple genes nearby, it is generally difficult to map the causal SNP and the causal gene it affects, so spurious gene-trait links will arise. When we calculate the global similarity based on the gene-trait association profiles, we control for this by simulating random GWASs that have the same power as the real GWAS and preserve the LD structure. The spurious links are also present in the simulated data (though they may appear at different loci), which are used to calibrate the statistical significance.
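
      One common way to calibrate significance against simulated null data is an empirical p-value, sketched below. This is an illustration of the general idea only: the response does not specify how the calibration is done, and the similarity scores here are hypothetical.

```python
def empirical_p_value(observed, null_scores):
    """Share of null scores at least as large as the observed score.

    The +1 in numerator and denominator keeps the estimate above zero
    when no simulated score reaches the observed one.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Hypothetical similarity scores from nine simulated (random) GWASs.
null_scores = [0.02, 0.05, 0.01, 0.08, 0.03, 0.04, 0.06, 0.02, 0.07]
observed = 0.075  # hypothetical score for the real trait pair

p = empirical_p_value(observed, null_scores)
print(f"empirical p = {p:.2f}")
```

      Because the simulated GWASs share the LD structure of the real data, any inflation caused by LD-driven spurious links appears in the null scores as well and is absorbed into the comparison.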

      (3) How the SNPs are assigned to genes would affect results, this is because different choices can change which genes appear shared between traits. The authors can expand on these. 

      We assign SNPs to genes based on their strongest eQTL association from the available data. Improvement can be made if the relevant tissues for a trait are known (see response to Reviewer 1 above).

      (4) Many reported novel trait links remain speculative without functional or orthogonal validation (e.g., colocalization, perturbation data). Thus, the manuscript's claims are inconclusive and speculative. 

      We agree with the reviewer that the reported trait links are speculative, and they should be treated as hypotheses generated from the computational analyses. To truly validate some of these proposed relationships, deeper functional analyses and experimental tests are needed.

      (5) It would be best to run LD-aware colocalization and power-matched simulations to check for robustness. 

      We agree that tighter control of LD and power-matched simulations will be important for testing the robustness of the predictions.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BetaII-spectrin, along with Alpha-spectrin, forms horizontal linkers between F-actin rings spaced 180 nm apart in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

    3. Reviewer #1 (Public review):

      Summary:

      Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

    4. eLife Assessment

      This valuable study characterizes the emergence of the membrane-associated periodic cytoskeleton (MPS) in the axons of human motor neurons derived from induced pluripotent stem cells. Super-resolution imaging of beta-II spectrin provides convincing evidence for the patterned assembly of spectrin-poor gaps and spectrin-rich MPS in the medial region of the axons and its enhancement by the kinase inhibitor staurosporine. The data advocates against gap formation by cytoskeleton disassembly in a continuous MPS. Instead, a continuous MPS may result from nascent MPS patches and their maturation, a model that would benefit from live imaging for validation.

    1. eLife Assessment

      This investigation presents a valuable contribution by elucidating the genetic determinants of growth and fitness across multiple clinical strains of Mycobacterium intracellulare, an understudied non-tuberculous mycobacterium. Employing transposon sequencing (Tn-seq), the authors identify a core set of 131 genes essential for bacterial viability, offering a solid foundation for anti-mycobacterial drug discovery. However, there are minor but nonetheless significant concerns about data organization, which need to be addressed for greater scientific impact.

    2. Reviewer #1 (Public review):

      Summary:

      In this descriptive study, Tateishi et al. report a Tn-seq based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Weaknesses:

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is yet to be comprehensively investigated. However, this reviewer thinks such an investigation would require a complex experimental design and perhaps forms an independent study.

    3. Reviewer #4 (Public review):

      Summary:

      In this study Tateishi et al. used TnSeq to identify 131 shared essential or growth defect-associated genes in eight clinical MAC-PD isolates and the type strain ATCC13950 of Mycobacterium intracellulare which are proposed as potential drug targets. Genes involved in gluconeogenesis and the type VII secretion system which are required for hypoxic pellicle-type biofilm formation in ATCC13950 also showed increased requirement in clinical strains under standard growth conditions. These findings were further confirmed in a mouse lung infection model.

      Strengths:

      This study has conducted TnSeq experiments in the reference strain and 8 different clinical isolates of M. intracellulare, thus producing a large number of datasets, which is itself a rare accomplishment and will greatly benefit the research community.

      Weaknesses:

      (1) A comparative growth study of pure and mixed cultures of clinical and reference strains under hypoxia will be helpful in supporting the claim that clinical strains adapt better to such conditions. This should be mentioned as a future direction in the discussion section, along with testing the phenotype of individual knockout strains.

      (2) Authors should provide the quantitative value of read counts for classifying a gene as "essential" or "non-essential" or "growth-defect" or "growth-advantage". Merely mentioning "no insertions in all or most of their TA sites" or "unusually low read counts" or "unusually high read counts" is not clear.

      (3) One of the major limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Authors should mention this in the discussion section.

    4. Reviewer #5 (Public review):

      Summary:

      In the research article, "Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare", Tateishi et al. focussed their research on pulmonary disease caused by Mycobacterium avium-intracellulare complex, which has recently become a major health concern. The authors were interested in identifying the genetic requirements necessary for growth/survival within the host and used hypoxia and biofilm conditions that partly replicate some of the stress conditions experienced by bacteria in vivo. An important finding of this analysis was the observation that genes involved in gluconeogenesis, type VII secretion system and cysteine desulphurase were crucial for the clinical isolates during standard culture, while the same were necessary during hypoxia in the ATCC type strain.

      Strength of the study:

      Transposon mutagenesis has been a powerful genetic tool to identify essential genes/pathways necessary for bacteria under various in vitro stress conditions and for in vivo survival. The authors extended the TnSeq methodology not only to the ATCC strain but also to recent clinical isolates to identify the differences between the two categories of bacterial strains. Using this approach they dissected the similarities and differences in the genetic requirement for bacterial survival between the ATCC type strain and clinical isolates. They observed that the clinical strains performed much better in terms of growth during hypoxia than the type strain. These in vitro findings were further extended to mouse infection models and similar outcomes were observed in vivo, further emphasising the relevance of hypoxic adaptation crucial for the clinical strains, which could be explored as potential drug targets.

      Weakness:

      The authors have performed extensive TnSeq analysis but fail to present the data coherently. The data could have been better presented both in Figures and text. In my view this is one of the major weaknesses of the study.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Tateishi et al. report a Tn-seq-based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Thank you for reviewing our manuscript and finding the significance of our data.

      Weaknesses:

      The paper lacks clarity in data presentation and organization. For example, some of the key data on cfu counts of clinical Mi strains in a mouse model can be presented along with the Tn-seq dataset in Figure 6, the visualization of which can be improved with volcano plots. etc. Improvement in data visualization is perhaps necessary throughout the paper.

      Thank you for the comment on the data presentation of in vivo studies. We previously reported the time-course data on CFUs, animal survival, and tissue pathology from the pure strains (Tateishi Y. BMC Microbiol. 2023; new Ref #22). Based on these data, we assumed that we would be able to harvest a sufficient number of colonies from mice infected with M.i.27 or M.i.198, and we performed in vivo TnSeq studies using these two strains. In the revised manuscript, we have referred to our previous publication (new Ref #22) on the virulence in mice of the MAC-PD strains used in this study (page 12, line 212).

      The CFU count data are shown in new Supplementary Fig. 3b. In the manuscript text, we explained as follows (page 12, lines 212-216): “The time course of the changes in the bacterial burden showed a pattern similar to those of the wild-type strains M.i.198 and M.i.27, respectively, except that it was not possible to harvest sufficient colonies (as few as 104/mouse) in the few mice infected with the M.i.27 Tn mutant strain in week 8 and week 16 (page 12, lines 212-216; new Supplementary Fig. 3b, new Supplementary Table 8)”.

      Regarding the suggestion to include volcano plots, we appreciate the proposal but chose not to adopt this format, as the main aim of this study was to identify genes commonly required for in vitro and in vivo fitness across multiple M. intracellulare strains, rather than to highlight differential genetic requirements within a single strain. Volcano plots are useful for visualizing differential values and significance for a single dataset but are less suited for cross-strain comparisons of shared gene sets. Our approach is aligned with the methodology used by Carey et al. (PLoS Pathog. 2018; new Ref#8), who similarly focused on identifying conserved genetic requirements across M. tuberculosis genotypes without employing volcano plots.

      [References]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is not well-supported by the data presented in Figure 7.

      Thank you for the comments on the difference of adaptation for hypoxic growth between ATCC13950 and clinical MAC-PD strains. To clarify, growth rates shown in Figure 7 were calculated at the inflection point (midpoint) of the growth curves, which were modeled using a four-parameter logistic (4P logistic) model. As described in the Discussion, we found the pattern of hypoxic adaptation characteristics of the clinical MAC-PD strains from the growth curve forms. Taking into consideration the impact of growing bacteria on the disease progression of MAC-PD, the slow growth with early entry to log phase implicates the continuous impact on the infected hosts during logarithmic bacterial growth, which may be involved in the persistent and steadily progressive illness of MAC-PD for years in humans.

      Unlike a time-lapse imaging assay, a CFU assay does not allow completely seamless sampling of the culture. Nevertheless, we collected sufficient timepoints to allow reliable curve fitting with the 4P logistic model and thus consider our growth data to represent a valid approximation of continuous growth dynamics.
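      As a minimal illustration of what "growth rate at the inflection point of a 4P logistic curve" means, the sketch below evaluates one common four-parameter logistic form and its slope at the midpoint. The parameterization (lower asymptote, upper asymptote, rate, midpoint) and all numeric values are assumptions made for illustration; the response does not state which form or software was used for the actual fits.

```python
import math

def logistic_4p(t, lower, upper, rate, t_mid):
    """Four-parameter logistic curve (assumed parameterization, not the authors')."""
    return lower + (upper - lower) / (1.0 + math.exp(-rate * (t - t_mid)))

def slope_at_midpoint(lower, upper, rate):
    """Analytic slope of the curve at its inflection point t = t_mid."""
    return rate * (upper - lower) / 4.0

# Hypothetical parameters, e.g. log10 CFU/ml versus days in culture.
params = dict(lower=4.0, upper=9.0, rate=0.5, t_mid=6.0)

# At the midpoint the curve sits halfway between the two asymptotes.
mid_value = logistic_4p(params["t_mid"], **params)

# The growth rate reported at the inflection point is the maximal slope.
growth_rate = slope_at_midpoint(params["lower"], params["upper"], params["rate"])

print(f"value at midpoint: {mid_value:.3f}")
print(f"slope at midpoint: {growth_rate:.3f}")
```

      In this form the slope at the midpoint depends only on the rate parameter and the distance between the asymptotes, so two strains with the same plateau can still differ in this single number.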

      Regarding the suggestion of mixed culture experiments, we agree that such studies could be informative. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we believe that the current approach using monoculture growth curves under defined oxygen conditions offers a clearer interpretation of strain-specific hypoxic responses.

      The title of the paper is misleading as the study doesn't provide any mechanistic aspect of hypoxic adaptation in Mi.

      Thank you for the comment on the article title. We admit that this paper does not directly reveal the mechanism of hypoxic adaptation in M. intracellulare strains but provides data on the different patterns of hypoxic adaptation between M. intracellulare strains in relation to the difference in genetic requirements. Therefore, we revised the title to “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

      Reviewer #2 (Public Review):

      Summary:

      In the study titled "Functional genomics reveals the mechanism of hypoxic adaptation in nontuberculous mycobacteria" by Tateishi et al., the authors have used TnSeq to identify the common essential and growth-defect-associated genes that represent the genomic diversity of clinical M. intracellulare strains in comparison to the reference type strain. By estimating the frequency of Tn insertion, the authors speculate that genes involved in gluconeogenesis, the type VII secretion system, and cysteine desulfurase are relatively more critical in the clinical MAC-PD strains than in the type strain, both for the extracellular survival and in a mouse lung infection model.

      Based on their analysis, the authors proposed to identify the mechanism of hypoxic adaptation in nontuberculous mycobacteria (NTM) which offer promising drug targets in the strains causing clinical Mycobacterium avium-intracellulare complex pulmonary disease (MAC-PD).

      Strengths:

      A major strength of the manuscript is the performance of the exhaustive set of TnSeq experiments with multiple strains of M. intracellulare during in vitro growth and animal infection.

      Thank you for reviewing our manuscript and acknowledging the performance of producing datasets in this study.

      Weaknesses:

      (1) The study suffers from the authors' preconceived bias toward a small subset of genes involved in hypoxic pellicle formation in ATCC13950.

      Thank you for the comment regarding a potential bias toward a small subset of genes involved in hypoxic pellicle formation in ATCC13950. The rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains is that the profiles of genetic requirements in each bacterial strain reflect the adaptation to the environment in which each strain lives. When the strains are placed in a special environment, they can adapt to the situation by altering the profiles of genetic requirements, resulting in the remodeling of metabolic pathways.

      In this study, we found that several of these pellicle-associated genes also showed increased genetic requirement in the clinical MAC-PD strains, suggesting a possible overlap in hypoxic adaptation mechanisms. We did not claim that clinical MAC-PD strains showed an increase of genetic requirements in all genes of hypoxic pellicle formation. Except for the gene sets involved in hypoxic pellicle formation in ATCC13950, almost no global information has been revealed on the pathogenesis of nontuberculous mycobacterial disease, which differs from the case of tuberculosis. Along with this finding, we investigated the effect of gene silencing on bacterial growth and the preferential hypoxic adaptation observed by growth kinetics in clinical MAC-PD strains compared to ATCC13950. At first glance, focusing on the gene sets of hypoxic pellicle formation may seem “biased”, but we proceeded with this research step by step based on our previous results. We consider that these data provide valuable information on the pathogenesis of MAC-PD by clinical MAC-PD strains.

      We have added the description of the rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains in the revised manuscript (page 9, lines 148-155).

      (2) An important set of data with the ATCC13950 reference strain is missing in the mouse infection study. In the absence of this, it is difficult to establish whether the identified genes are critical for infection/intracellular proliferation, specifically in the clinical isolates that are relatively more adapted for hypoxia.

      Thank you for the comment on the necessity of setting ATCC13950 as a control strain in the mouse TnSeq experiment. Setting ATCC13950 as a control strain in mouse infection experiments would be ideal. However, we showed that ATCC13950 is eliminated within 4 weeks of infection (Tateishi Y. BMC Microbiol. 2023; new Ref#22). This means that it is impossible to perform an in vivo TnSeq study with this strain, because a sufficient number of colonies cannot be harvested.

      [Reference]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3) Statistical enrichment analysis of gene sets by GSEA wrongly involves genes required for hypoxic pellicle formation in ATCC13950 together with the gene sets found essential in the clinical MAC-PD strains, to claim that a significant % of genes belong to hypoxia-adaptation pathways. It could be factually incorrect because a majority of these might overlap with those found critical for the in vitro survival of MAC-PD strains (and may not be related to hypoxia).

      Thank you for the suggestion on the re-analysis of gene enrichment analysis of genes required for M.i.27 and M.i.198 in vivo infection, individually with genes involved in hypoxic pellicle formation in ATCC13950 and with those showing increased genetic requirements in clinical MAC-PD strains compared to ATCC13950.

      About 50% (92 and 94 out of 181 genes through Day 1 to Week 16 and Week 4 to Week 16 of infection) and 40% (70 and 79 genes out of 179 through Day 1 to Week 16 and Week 4 to Week 16 of infection) of genes required for hypoxic pellicle formation in ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively. In addition, about 42% (54 and 56 out of 128 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) and 40% (79 and 68 out of 179 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) of genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively.

      These data indicate that about 40-50% genes required for in vitro hypoxic pellicle formation are shared with the genes required for in vivo bacterial growth, and that about 40% strain-dependent/accessory essential genes are shared with the genes required for in vivo bacterial growth. Thus, the genes required for the growth of M.i.27 and M.i.198 in mouse lungs are enriched individually with those involved in hypoxic pellicle formation in ATCC13950, and with the gene sets found critical for in vitro growth. We have added the description of the reanalyzed data of GSEA in the manuscript (pages 16-17, lines 287-290). And the details of reanalyzed data of GSEA have been shown in Supplementary Fig. 5 and 6 as well as Supplementary Tables 15 and 16.
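      The rounded percentages quoted in this response can be re-derived from the counts given above. The short sketch below does only that arithmetic; the dictionary labels are shorthand introduced here for illustration, and the counts are copied as quoted, without checking them against the supplementary tables.

```python
def percent(hits, total):
    """Share of a gene set that overlaps the in vivo required genes, in percent."""
    return round(100.0 * hits / total, 1)

# (hits, total) pairs as quoted in the response, for the two infection windows
# (Day 1 to Week 16, Week 4 to Week 16). Labels are illustrative shorthand.
overlaps = {
    ("hypoxic pellicle genes", "M.i.27"): [(92, 181), (94, 181)],
    ("hypoxic pellicle genes", "M.i.198"): [(70, 179), (79, 179)],
    ("increased-requirement genes", "M.i.27"): [(54, 128), (56, 128)],
    ("increased-requirement genes", "M.i.198"): [(79, 179), (68, 179)],
}

results = {
    key: [percent(hits, total) for hits, total in pairs]
    for key, pairs in overlaps.items()
}

for (gene_set, strain), values in results.items():
    print(f"{gene_set} in {strain}: {values[0]}% and {values[1]}%")
```

      The computed values fall between roughly 38% and 52%, which matches the "about 40-50%" summary given in the text.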

      (4) Validation of mouse infection experiments with individual mutants is missing.

      Thank you for the suggestion on the validation of the TnSeq hit genes on the in vivo survival. We acknowledge the importance of validating the TnSeq-hit genes by constructing knockout mutants. We have recently succeeded in constructing the vectors for making knockout strains of M. intracellulare (Tateishi. Microbiol Immunol. 2024). We will proceed to the infection experiment of knockout mutants by using our system for constructing them.

      [Reference]

      Tateishi Y. et al. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL. Microbiol Immunol 68, 339-347 (2024).

      (5) Phenotypes with TnSeq and CRISPRi-based KD exhibit poor correlation with misleading justifications by the authors.

      Thank you for the comment on the issue of inconsistent results between TnSeq and CRISPR-i based knockdown. We acknowledge that some inconsistencies were observed, particularly among strain-dependent/accessory essential or growth-defect-associated genes. By contrast, we found consistent data between TnSeq and CRISPR-i based knockdown results among universal essential genes such as glcB, inhA, gyrB and embB. Although the mechanism has not been fully proven yet, we consider that such inconsistent phenotypes between TnSeq and CRISPR-i based knockdown may be related to the recently revealed bypass mechanism of gene essentiality, which is characteristically observed in strain-specific/accessory essential genes (Rosconi F. Nat Microbiol. 2022; new Ref#14). The authors of that study proposed this bypass mechanism of gene essentiality in strain-dependent/accessory essential or growth-defect-associated genes from ‘forced-evolution experiments’ on 36 clinical Streptococcus pneumoniae strains. For example, knockout mutants were successfully recovered from transformation experiments targeting genes classified as strain-specific/accessory essential in TnSeq, such as cytidine monophosphate kinase cmk, formate tetrahydrofolate ligase fhs and farnesyl-diphosphate synthase fpp. The bypassing of gene essentiality can be inferred from suppressor mutations and synthetic lethality in knockout strains. By contrast, universal essential genes fulfill the following three criteria: i) high levels of conservation within and often across species, ii) limited genetic diversity, and iii) high and stable expression levels. Consequently, the universal essential genes are rigid, largely immutable key components of an organism’s survival. For the universal essential genes, knockout recovery fails, as shown by the absence of colonies or the appearance of merodiploids only.

      Taking into consideration this bypass mechanism of gene essentiality in strain-dependent/accessory essential genes, the weaker effect on bacterial growth of silencing strain-dependent/accessory essential genes may reflect pathway rewiring that supports bacterial growth when expression of the target gene is suppressed.

      We have added the description of the possible reason for inconsistency between TnSeq and CRISPR-i results in the Result and Discussion in the revised manuscript (page 21, lines 367-376; pages 28-29, lines 489-519).

      [Reference]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      In summary, this study is unable to provide mechanistic insights into why and how different MAC-PD mutant strains exhibit differential survival (in vitro and in animals) and adaptation to hypoxia. It remains to understand why the clinical strains show better adaptation to hypoxia and what is the impact of other stresses on their growth rates.

      Thank you for the comments on the issue of being unable to prove the mechanism of MAC-PD pathogenesis and adaptation to hypoxia. We admit that the original manuscript did not provide a clear reason for, or mechanism of, MAC-PD pathogenesis and adaptation to hypoxia. Following the comment, we have modified the manuscript title to “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

      However, we revealed the diversity of genetic requirements among the genus M. intracellulare including the type strain ATCC13950 and clinical MAC-PD strains. We revealed the characteristics of genetic requirements in clinical MAC-PD strains as increased genetic requirements of gluconeogenesis, type VII secretion system and cysteine desulfurase, the former two of which are also required in hypoxic pellicle formation in ATCC13950. Along with this, we demonstrated the difference of growth behavior under hypoxia between clinical MAC-PD strains and ATCC13950. Overall, we consider that we could provide the basic information suggesting the involvement of difference of genetic requirements among strains in the pathogenesis of MAC-PD.

      Reviewer #3 (Public Review):

      Summary:

      The study by Tateishi et al. utilized TnSeq in nine genetically diverse M. intracellulare strains, identifying 131 common essential and growth-defect-associated genes across those strains, which could serve as potential drug targets. The authors also provided an overview of the differences in gene essentiality required for hypoxic growth between the reference strain and the clinical strains. Furthermore, they validated the universal and accessory/strain-dependent essential genes by knocking down their expression using the CRISPRi technique. Overall, this study offers a comprehensive assessment of gene requirements in different clinical strains of M. intracellulare.

      Thank you for reviewing our manuscript and finding the significance of our data.

      (1) The rationale for using ATCC13950 versus clinical strains needs to be clarified. The reference strain ATCC13950 was obtained from the abdominal lymph node of a patient around 10 years ago and is therefore considered a clinical strain that has undergone passages in vitro. How many mutations have accumulated during these in vitro passages? Are these mutations significant enough to cause the behavior of ATCC13950 to differ from other recently sampled clinical strains? From the phylogenetic tree, ATCC13950 is located between M018 and M.i.27. Did the authors observe a similarity in gene essentiality between ATCC13950 and its neighbor strains? What is the key feature that separates ATCC13950 from these clinical strains? The authors should provide a strong rationale for how to interpret the results of this comparison in a clinical or biological context.

      Thank you for the comments on the rationale for using ATCC13950 versus clinical strains and the key feature that separates ATCC13950 from clinical MAC-PD strains.

      ATCC13950 was isolated in 1949 (not around 10 years ago) from the abdominal lymph node of a 34-month-old female patient (Cuttino. Am J Pathol 1949; new Ref#11). Of note, the clinical background of the patient infected with ATCC13950 is quite different from that of patients with MAC-pulmonary disease (MAC-PD), the incidence rate of which has been increasing worldwide in people without predisposing immunological disorders. ATCC13950 has historically been regarded as a type strain of the species M. intracellulare. ATCC13950 was also the first M. intracellulare strain to be sequenced, in 2012 (Kim. J Bacteriol 2012; new Ref#59).

      The rationale for using ATCC13950 versus clinical MAC-PD strains is to find the answer to the question whether the essential genes and genetic requirements are similar or different between clinical MAC-PD strains and historical type strain ATCC13950. So far, there are two reports on TnSeq that compare genetic requirements between clinical mycobacterial strains and the type strains, one of which is M. tuberculosis (Carey AF. PLoS Pathogens. 2018; new Ref#8) and the other is M. abscessus (Akusobi C. mBio. 2025; new Ref#9, published after submission of our manuscript). They reported the difference and diversity of genetic requirements between clinical strain and type strains such as M. tuberculosis H37Rv and M. abscessus ATCC19977. We have added the mention of these previous reports to explain the rationale for setting the type strain ATCC13950 as a referential control strain. (page 5, lines 83-87)

      The genetic and functional analysis of clinical MAC-PD strains has not been conducted for a long time. In 2021, we have revealed the genomic diversity between clinical MAC-PD and ATCC13950 by comparative genomic analysis (Tateishi BMC Microbiol. 2021; new Ref#5). Except for our TnSeq study of ATCC13950 (Tateishi Sci Rep 2020; new Ref#10), no functional analysis has been conducted in clinical M. intracellulare strains. On our research stream of clinical MAC-PD strains, we expected that we could reveal the functional genomic characteristics of clinical MAC-PD strains by setting ATCC13950 as a referential control strain for analyzing TnSeq data.

      It is an interesting viewpoint to consider the relationship between the accumulation of mutations in ATCC13950 through in vitro passages over the prolonged period since its first isolation, and the difference in phenotypes between ATCC13950 and recently sampled clinical MAC-PD strains. However, there are no time-course samples of ATCC13950 isolates available. Therefore, we can neither investigate how many mutations have accumulated over time, nor evaluate how much the accumulated mutations influence the phenotype of ATCC13950. The accumulation of in vitro mutations may be expected to cause the behavior of ATCC13950 to differ from that of clinical MAC-PD strains. However, it remains to be elucidated which specific factors contribute to the characteristics of ATCC13950 that differ from clinical MAC-PD strains.

      It is an interesting viewpoint to investigate the similarity of gene essentiality between genetically neighboring strains. However, we focused on the overview of the profiles of gene essentiality in clinical MAC-PD strains compared to ATCC13950. Thus, it was out of scope to elucidate the details of gene essentiality in each phylogenetic clade to which the clinical MAC-PD strains belong. For the overview of phylogenetic trees, please refer to our previous publication on the comparative genomic analysis of 55 strains (Tateishi Y. BMC Microbiol. 2021; new Ref#5, new Supplementary Fig. 1); Fig. 1 shows the extracted phylogenetic tree of the subject strains. To elucidate the details of gene essentiality in each genetic clade, it would be necessary to include a considerable number of the strains that we used for comparative genomic analysis in 2021 (Tateishi Y. BMC Microbiol. 2021; new Ref#5). Furthermore, it would be necessary to set a referential control strain other than ATCC13950 for comparing gene essentiality. For now, elucidating the similarity of gene essentiality between phylogenetic clades in detail is not our highest priority, and such an investigation will be planned as a future study.

      The key features that separate ATCC13950 and clinical MAC-PD strains have not yet been established, in contrast to the case of M. tuberculosis, where, for example, mutations in the gene of the response regulator PhoPR distinguish the type strain H37Rv from most clinical strains. However, the features that separate ATCC13950 and clinical MAC-PD strains may not be explained by a single genetic factor but rather by complex factors such as epigenetic and/or regulatory factors. For example, the weakened virulence of H37Ra compared to H37Rv could not be explained by simple genetic differences (Brosch R. Infect Immun. 1999).

      In summary, we set the historical type strain ATCC13950 which is derived from infant abdominal lymphadenitis as a referential control strain for TnSeq analysis, because we intended to reveal the characteristics of clinical MAC-PD strains in terms of the gene essentiality and genetic requirements by comparing the clinical MAC-PD strains with the non-MAC-PD reference strain. We consider that the profiles of gene essentiality and genetic requirements specific to clinical MAC-PD strains confer the pathogenesis in an increasing number of MAC-PD patients worldwide without predisposing immunological disorders.

      [References]

      Cuttino, J.T. & Mc, C.A. Pure granulomatous nocardiosis, a new fungus disease distinguished by intracellular parasitism; a description of a new disease in man due to a hitherto undescribed organism, Nocardia intracellularis, n. sp., including a study of the biologic and pathogenic properties of this species. Am J Pathol 25, 1-47 (1949).

      Kim, B.J. et al. Complete genome sequence of Mycobacterium intracellulare clinical strain MOTT-64, belonging to the INT1 genotype. J Bacteriol 194, 3268 (2012).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      Brosch R. et al. Genomic analysis reveals variation between Mycobacterium tuberculosis H37Rv and the attenuated M. tuberculosis H37Ra strain. Infect Immun. 67, 5768-74 (1999).

      (2) Regarding the 'nine representative strains of M. intracellulare with diverse genotypes in this study,' how were these nine strains selected? To what extent do they represent the genetic diversity of the M. intracellulare population? A phylogenetic tree illustrating the global genetic diversity of the M. intracellulare population, with these strains marked on it, would be important to demonstrate their genetic representativeness.

      Thank you for the comments on the selection of 9 subject strains. We selected the 9 strains based on the phylogenetic tree we published in 2021 (BMC Microbiol 2021; new Ref#5). We have shown the global phylogenetic tree of the M. intracellulare population in new supplementary Fig. 1. We have selected 4 or 5 strains from the major two groups (typical M. intracellulare group and M. paraintracellulare-M. indicus pranii group) for this TnSeq study, respectively.

      [Reference]

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      (3) The authors observed a considerable amount of differential gene requirements in clinical strains. However, the genetic underpinning underlying the differential requirement of genes in clinical strains was not investigated or discussed. Because M. intracellulare has a huge number of accessory genes, the authors should at least check whether the differential requirement could be explained by the existence of a second copy of functional analogous genes or duplications.

      Thank you for the comments on the effect of gene duplication on the change of genetic requirements between strains. Following the comments, we conducted blast search for the 162 genes showing increased Tn insertion reads in each subject strain. We found that M019 has duplicate genes of OCU_RS44705 coding adenosylhomocysteinase (LOCUS_42940: ahcY_1, LOCUS_21000: ahcY_2). However, there were no duplicate genes found in the remaining 161 genes showing increased Tn insertion reads.

      From these results, we consider that gene duplication has minor effects on the change of genetic requirements between strains. Rather, sequence differences and accessory genes may play a key role in determining the difference of genetic requirements.

      We have added a description of the above-mentioned result in the Result section (pages11-12, lines 191-199).

      (4) Growth in aerobic and hypoxic conditions: The authors concluded that clinical strains are better adapted to hypoxia, as reflected by their earlier entry into the log phase. They presented the 'Time at midpoint' and 'Growth rate at midpoint.' However, after reviewing the growth curves, I noticed that ATCC13950 had a longer lag phase compared to other strains under hypoxic conditions, and its phylogenetic neighbor M018 also had a longer lag phase. Hence, I do not believe a conclusion can be drawn that clinical strains are better adapted to hypoxia, as this behavior could be specific to a particular clade. It's also possible that the ATCC13950 strain has adapted to aerobic growth. I would suggest that the authors include growth curves in the main figures. The difference in 'Time at midpoint' could be attributed to several factors, and visualizing the growth curves would provide additional context and clarity.

      Thank you for the comments on the possibility of genotypes as a determinant of growth pattern in M. intracellulare. Following the comments, we performed aerobic and hypoxic growth assays in the two strains (M005 and M016) that neighbor ATCC13950.

      Author response image 1.

      The phylogenetic relationship between M005, M016 and ATCC13950. The former two strains are squared in blue.

      M005 reached midpoint later than ATCC13950 both in aerobic and hypoxic conditions. By contrast, M016 reached midpoint three quarters earlier than ATCC13950 under hypoxic conditions. The growth rate was not significantly different between M005, M016 or ATCC13950 under either aerobic or hypoxic conditions, although P-value of M005 vs ATCC13950 was 0.0512 under aerobic conditions on Steel’s multiple comparisons test.

      From the data of growth pattern in M005 and M016, we suggest that in addition to gene essentiality, genotypes may have some impact on the bacterial growth pattern under hypoxia; however, since there was a significant difference in the timing of hypoxic adaptation among ATCC13950 and its neighbor strains, bacterial growth pattern under hypoxia is considered to be determined by multiple factors such as genetic requirements and unproven regulatory systems. Taking into consideration that there are lots of genetically diverse strains other than ATCC13950 clade, many clinical MAC-PD strains are possibly better adapted to hypoxia.

      Responding to the reviewer’s recommendation, we have added the description of the above-mentioned result in the revised manuscript (page 18, lines 313-322). We have shown the growth curves of the original 9 subject strains in the new Fig. 7 and added the growth curves of M005 and M016 in the new Supplementary Fig. 7. Additionally, we have corrected the y-axis label in new Fig. 7a and new Supplementary Fig. 7a and added the description “Data are represented as CFUs in 4 μl sample at each timepoint.” to the figure legends (page 58, lines 1027-1028 and Supplementary Fig. 7a legend).

      (5) Lack of statistical statement: The authors emphasized the role of pellicle-formation-associated genes in strain-dependent essential and accessory essential genes. Additionally, the authors observed that 10% of the genes required for mouse infection are also required for hypoxic pellicle formation. However, these are merely descriptive statements. There is no enrichment analysis to justify whether pellicle-formation-associated genes are significantly enriched in these groups.

      Thank you for the comments on the enrichment of pellicle-formation associated genes in strain-dependent essential and accessory essential genes. We performed GSEA and found that 9.1% (16 out of 175) genes were hit as core enrichment. Of them, 4 genes were hit commonly as genes showing increased genetic requirements analyzed by resampling plus HMM analyses including genes of phosphoenolpyruvate carboxykinase pckA (OCU_RS48660), type VII secretion-associated serine protease mycP5 (OCU_RS38275), type VII secretion protein eccC5 (OCU_RS38345) and glycine cleavage system amino-methyltransferase gcvT (OCU_RS35955).

      Author response image 2.

      We have added the description of GSEA result in the revised manuscript (page 8, lines 137-144; Supplementary Fig. 2; Supplementary Table 5).

      Reviewer #1 (Recommendations For The Authors):

      Tn-seq and hypoxia adaption in clinical isolates of M. intracellulare (Mi): The authors claim that clinical strains are better adapted to hypoxia because their genetic requirements for optimum fitness overlap with genetic requirements for fitness of the type strain under hypoxia. This is a reasonable hypothesis, but it has not been well-supported by the data presented in Figure 7. The growth rates (Figure 7b) of most of the clinical strains under hypoxia appear to be less than the type strain, although they all seem to grow better than the type strain under normoxia. Perhaps a continuous growth curve of each strain, both as pure and mixed cultures under these conditions will provide a clearer picture.

      Thank you for the comments on the difference of adaptation of hypoxic growth between ATCC13950 and MAC-PD strains. To clarify, growth rates shown in Figure 7 were calculated at the inflection point (midpoint) of the growth curves, which were modeled using a four-parameter logistic (4P logistic) model. As described in the Discussion, we found the pattern of hypoxic adaptation characteristics of the clinical MAC-PD strains from the growth curve forms. Taking into consideration the impact of growing bacteria on the disease progression of MAC-PD, the slow growth with early entry to log phase implicates the continuous impact on the infected hosts during logarithmic bacterial growth, which may be involved in the persistent and steadily progressive illness of MAC-PD for years in humans.

      Unlike time-lapse imaging assay, the completely seamless sampling of cultures for CFU assay is impossible. Nevertheless, we collected sufficient timepoints to allow reliable curve fitting with the 4P logistic model, and thus consider our growth data to represent a valid approximation of continuous growth dynamics.

      Regarding the suggestion of mixed culture experiments, we agree that such studies could be informative. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we believe that the current approach using monoculture growth curves under defined oxygen conditions offers a clearer interpretation of strain-specific hypoxic responses.

      In vivo studies: It is unclear how virulent the two clinical strains, Mi27 and Mi198 are in the mouse model. The CFU data in Figure S1b reports the bacterial burden of the Tn libraries of the two strains, of which the overall population of Mi27 library seems to be declining. Without any information on the CFU, animal survival, and tissue pathology from the pure strains, data from the library will have limited implications.

      Thank you for the comments on the data presentation of in vivo studies. We previously revealed the time-course data on CFUs, animal survival, and tissue pathology from the pure strains (Tateishi Y. BMC Microbiol. 2023; new Ref#22). Based on these data, we assumed that we would be able to harvest sufficient number of colonies from mice infected with M.i.27 or M.i.198, and we performed in vivo TnSeq studies using these two strains. We have referred to our previous publication on the virulence of MAC-PD pure strains used in this study for mice in the revised manuscript (page 12, line 212; new Ref #22).

      The data of CFU counts were shown in new Supplementary Figure 3b. In the manuscript text, we explained as follows (page 12, lines 212-216): “The time course of the changes in the bacterial burden showed a pattern similar to those of the wild-type strains M.i.198 and M.i.27, respectively (Tateishi Y. BMC Microbiol. 2023; new Ref#22), except that it was not possible to harvest sufficient colonies (as few as 104/mouse) in the few mice infected with the M.i.27 Tn mutant strain in week 8 and week 16 (new Supplementary Fig. 3b, new Supplementary Table 8)”. The decline of overall population of M.i.27 Tn mutant library strains in the infected lungs can be explained by the lower virulence of M.i.27 pure strain that shows intermediate virulence phenotype than M.i.198 that shows high virulence phenotype.

      [References]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      Data presentation: The manuscript suffers from a lack of clarity in data visualization and presentation, especially the Tn-Seq datasets. Panels describe the experimental workflow with a densely-worded paragraph, making it difficult to navigate through the major findings.

      Thank you for the comments on the issue of Fig. 1b. Following the suggestion, we have modified the new Fig. 1b entitled “Strategy of the procedure of TnSeq analyses”.

      Language: The paper should be extensively revised for language. Often the authors have mixed-up the terms like 'core' and 'accessory' 'genes' in lines 116-119 with 'core and accessory genomes' in Figure 2c, which is not even mentioned in the paper. It is further unclear how they identified 3153 and 5824 core and accessory genes, respectively, from 55 different strains of Mi. Line 251: ..."involved by confer..." needs revision. The terms "increased gene essentiality" and 'growth-defected associated genes" are very confusing. The essentiality of a gene is either absolute or conditional but is not quantitative. Similarly, 'growth-defect associated' can be replaced with a better phrase that alludes to fitness loss in the clone. Additional typos were found throughout the paper that need to be fixed.

      Thank you for the comments on the issue of scientific words including “'core and accessory genomes” and “gene essentiality” used in this study.

      Based on Rosconi’s paper (Panel C of Fig. 1 in Nat Microbiol. 2022; new Ref#14), we used the phrases “accessory genome and core genome” as a meaning of a whole set of genes belonging to accessory and core genes. To avoid the confusion and keep consistency, we replaced the term “genomes” to “genes” in the revised manuscript.

      In our previous comparative genomic analysis, we identified 3153 and 5824 core and accessory genes, respectively, from 55 different strains of M. intracellulare (Tateishi Y. BMC Microbiol. 2021; new Ref #5). To perform pangenomic analysis, we used the software Bacterial Pan-Genome Analysis tool (BPGA) (Narendrakumar NM. Sci Rep 2016).

      We admit that gene essentiality is a qualitative but not a quantitative trait. We have corrected the term "increased gene essentiality" as "increased genetic requirements" throughout the manuscript.

      We have used the term “growth-defect (GD)” based on the classification of gene essentiality calculated by the Hidden Markov Model (HMM) implemented in TRANSIT software (DeJesus. PLoS Comput Biol. 2015; new Ref#12). The HMM classifies genes as essential (ES), GD, non-essential (NE), or growth-advantage (GA). GD means difficulty of growth (growth deficiency) in aerobic conditions in vitro, because Tn insertions are less frequent. The suggested phrases “fitness loss” or “less fit” may imply a comparison between two different conditions, such as different culture conditions applied to a single bacterial strain. Since the HMM analysis is performed on data from a single strain in a single condition, we consider that a phrase including “fitness” is somewhat unsuitable for describing the classification of gene essentiality. Thus, it is difficult for us to rephrase GD with a word that implies fitness levels between two conditions in a single bacterial strain.

      [References]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Narendrakumar NM et al. BPGA - an ultra-fast pan-genome analysis pipeline. Sci Rep 6, 24373 (2016).

      DeJesus, M.A. et al. TRANSIT--A Software Tool for Himar1 TnSeq Analysis. PLoS Comput Biol 11, e1004401 (2015).

      Reviewer #2 (Recommendations For The Authors):

      Major Comments:

      (1) Result 1 (Page 6-7): Common essential and growth-defect-associated genes representing the genomic diversity of M. intracellulare strains.

      (1a) From Table S1, it is observed that the numbers of Tn-inserted TA sites significantly vary (p >0.05) among biological replicates for each strain when compared with the reference strain ATCC13950. The authors should provide an explanation of how they overcame this variation in their analysis.

      Thank you for the comment on the issue of a variable number of Tn-inserted TA sites among biological replicates for each strain of MAC-PD.

      On TRANSIT software, we set the replicate option as Sum to combine read counts. And we used Beta-Geometric correction (BGC) to normalize the datasets to fit an “ideal” geometric distribution with a variable probability parameter ρ.

      Following the comment, we have added the description on which option we used for handling the replicate data and normalization (page 36, lines 640-643).

      (1b) Importantly, saturation in most of the strains is only ~50-60%. In such a case, there will be a high probability that Tn will not hit a nonessential region due to chance instead of selection (See DeJesus et al., mBio, 2017). It has been observed that the sequence pattern (GC)GNTANC(GC) is strongly associated with non-permissiveness. As shown earlier, the authors need to carefully look for the potential non-permissive sites before concluding the fate of a gene. Also, they should acknowledge the potential limitations of their approach due to the suboptimal level of saturation.

      Thank you for the comment on the saturation of Tn mutant libraries. Our method of comparison of genetic requirements between strains is the same as a previous report that used duplicate Tn mutant libraries of clinical Mtb strains of different genotypes and triplicate Tn mutant libraries of H37Rv for identifying increased genetic requirements of clinical Mtb strains (Carey. PLoS Pathog 2018; new Ref#8). Our method is also based on the coauthor’s TnSeq study on H37Rv (Minato Y. mSystems 2019; new Ref#61). Moreover, by combining replicates, the saturation of our Tn mutant libraries became 62-79% as follows: ATCC13950: 67.6%, M001: 72.9%, M003: 63.0%, M018: 62.4%, M019: 74.5%, M.i.27: 76.6%, M.i.198: 68.0%, MOTT64: 77.6%, M021: 79.9%. That is, we calculated gene essentiality from the Tn mutant libraries with 62-79% saturation in each strain. The level of saturation of transposon libraries in our study is similar to the very recent TnSeq analysis by Akusobi where 52-80% saturation libraries (so-called “high-density” transposon libraries) are used for HMM and resampling analyses (Supplemental Methods Table 1[merged saturation] in Akusobi. mBio. 2025; new Ref#9). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus (Table S1 in mBio 2017; new Ref#57). Thus, we consider that our TnSeq method of identifying essential genes and detecting the difference of genetic requirements between clinical MAC-PD strains and ATCC13950 is acceptable.
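
      The effect of combining replicates on saturation can be sketched as follows. This is an illustrative toy example only: it takes saturation to be the fraction of TA sites with at least one insertion read and combines replicates by summing read counts per site. It is not the TRANSIT implementation, it omits normalization such as the Beta-Geometric correction, and the read counts are invented for the example.

```python
def saturation(counts):
    """Fraction of TA sites with at least one insertion read."""
    return sum(1 for c in counts if c > 0) / len(counts)

def sum_replicates(replicates):
    """Combine replicates by summing read counts at each TA site."""
    return [sum(site) for site in zip(*replicates)]

# Hypothetical read counts at the same 10 TA sites in two replicates.
rep1 = [0, 12, 0, 5, 0, 0, 3, 0, 9, 0]
rep2 = [4, 0, 0, 7, 0, 2, 0, 0, 1, 0]

combined = sum_replicates([rep1, rep2])

sat1 = saturation(rep1)          # 4 of 10 sites hit
sat2 = saturation(rep2)          # 4 of 10 sites hit
sat_all = saturation(combined)   # 6 of 10 sites hit after summing
```

      Because a site counts as hit if any replicate has an insertion there, the combined saturation can only equal or exceed that of each individual replicate.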

      As the Reviewer indicates, there is a non-permissive sequence pattern in mariner transposon mutagenesis. Using more than 10 replicates of Tn mutant libraries is quite an accurate method for detecting essential genes in nonstructural small genes such as small regulatory RNAs. However, as DeJesus shows, the number of essential genes identified by TnSeq is comparable between 2 and 14 TnSeq datasets for large genes possessing more than 10 TA sites, most of which seem to be structural genes (Supplementary Fig 2 in mBio 2017; new Ref#57). Thus, we do not consider that we made a serious mistake in the classification of essentiality for most of the structural genes that encode proteins. With respect to the coverage of non-permissive sites, our TnSeq method might need to be improved if it is intended to classify gene essentiality quite accurately for small genes including small regulatory RNAs.

      We investigated the non-permissive TA sites in ATCC13950. There are 4136 (6.43% of total ORFs) nonpermissive TA sites in ATCC13950, which is less than in H37Rv (9% of total ORFs) (DeJesus MA. mBio 2017; new Ref#57) and in M. abscessus ATCC19977 (8.1% of total ORFs) (Rifat D. mBio. 2021; new Ref#58). As for larger ORFs (TA sites ≥ 10), there are nonpermissive TA sites in 89 genes (ORFs) of common “essential (ES)” or “growth-defect-associated (GD)” (4.82% of a total of 1844 larger ORFs in ATCC13950). As for small ORFs (2-9 TA sites), there are nonpermissive TA sites in 41 genes (ORFs) of common ES or GD (1.35% of a total of 3021 smaller ORFs in ATCC13950).
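
      A scan for TA sites embedded in the non-permissive pattern (GC)GNTANC(GC) mentioned by the reviewer could be sketched as below. This is a minimal illustration and not the procedure used in the study: the function names and the test sequence are hypothetical, only the forward strand is scanned (the pattern reads the same on the reverse complement), and ambiguous bases are not handled.

```python
import re

# (GC)GNTANC(GC): S-G-N-T-A-N-C-S, where S is G or C and N is any base.
# A lookahead is used so that overlapping occurrences are all reported.
NONPERMISSIVE = re.compile(r"(?=([GC]G[ACGT]TA[ACGT]C[GC]))")

def ta_sites(seq):
    """0-based positions of the T in every TA dinucleotide."""
    return [m.start() for m in re.finditer(r"TA", seq)]

def nonpermissive_ta_sites(seq):
    """0-based positions of TA sites that sit inside the motif.

    The TA dinucleotide starts 3 bases after the start of the motif.
    """
    return [m.start() + 3 for m in NONPERMISSIVE.finditer(seq)]

# Hypothetical sequence with three TA sites, one inside the motif.
seq = "TTTACGATAGCGTTA"

all_ta = ta_sites(seq)
blocked = nonpermissive_ta_sites(seq)
fraction_blocked = len(blocked) / len(all_ta)
```

      Sites returned by `nonpermissive_ta_sites` are candidates for exclusion or special handling when interpreting an absence of insertions, since a missing insertion there may reflect sequence bias rather than selection.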

      We appreciate the idea of concluding the fate of gene essentiality by the presence/absence of non-permissive TA sites. However, we cannot conclude the essentiality classification of a gene only from the presence or absence of potential non-permissive sites, because, strictly speaking, it is impossible to establish gene essentiality without functional analysis using gene manipulation. To be accurate, TnSeq can “predict” gene essentiality but cannot perfectly guarantee functional significance. However, in the current situation, most of the recent TnSeq studies have been published with TnSeq analysis alone, without functional analysis using gene manipulation strains for all the targets they identified. Taking such limitations of TnSeq, including non-permissive sites, into consideration, we consider that the essentiality of the detected genes should be determined in further studies, mainly including biological experiments such as functional studies using gene manipulation strains.

      We have added the above-mentioned contents in the revised manuscript (pages 32-33, lines 559-580).

      [References]

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Minato, Y. et al. Genomewide assessment of Mycobacterium tuberculosis conditionally essential metabolic pathways. mSystems 4, e00070-19 (2019).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      Rifat, D., Chen L., Kreiswirth, B.N. & Nuermberger, E.L.. Genome-wide essentiality analysis of Mycobacterium abscessus by saturated transposon mutagenesis and deep sequencing. mBio 12, e0104921 (2021).

      (1c) Line 100: Authors report a total of 131 genes identified as essential or growth-defect-associated with the HMM analysis across all M. intracellulare strains. It should be explained in more detail how gene essentiality was determined (see above comment in (1b)). Furthermore, in Table S3 authors should mention the essential and growth defective trait of each of the 131 genes.

      Thank you for the comment on how to classify the 131 genes as essential or growth-defect-associated with the HMM analysis across all M. intracellulare strains. As replied in (1b), the saturation of Tn insertion of our libraries became 62-79% when combining duplicate or triplicate data in each strain. The level of saturation of transposon libraries in our study is similar to the very recent TnSeq analysis by Akusobi where 52-80% saturation libraries (so-called “high-density” transposon libraries) were used for HMM and resampling analyses, and most triplicate libraries range from 70-79% saturation (Supplemental Methods Table 1[merged saturation] in Akusobi. mBio. 2025; new Ref#9). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus (Table S1 in mBio 2017; new Ref#57). Thus, we consider that our TnSeq libraries are acceptable for identifying essential genes and growth-defect-associated genes by the HMM method.

      We used the HMM method as reported by DeJesus (DeJesus. PLoS Comput Biol. 2015; new Ref#12). The HMM method categorizes gene essentiality throughout the genome as “Essential”, “Growth-defect”, “Non-essential” or “Growth-advantage”. “Essential” genes are defined as having no insertions in all or most of their TA sites. “Non-essential” genes are defined as regions that have usual read counts. “Growth-defect” genes are defined as regions that have unusually low read counts. “Growth-advantage” genes are defined as regions that have unusually high read counts.

      Following the previous report (Carey AF. PLoS Pathog 2018; new Ref#8), the annotation for the clinical MAC-PD strains was adapted from that of ATCC13950 by adjusting the START and END coordinates of each ORF in the clinical MAC-PD strains according to their alignment with the corresponding ORFs of ATCC13950. By using an adjusted annotation table, gene essentiality was classified by the HMM analysis.

      We have added the explanation of how we identified essential and growth-defect-associated genes in the Methods (pages 35-36, lines 620-632). And following the comment, we have added the data of classification of gene essentiality in the 131 genes in the new Supplementary Table 3 in the revised manuscript.

      [Reference]

      DeJesus, M.A. et al. TRANSIT--A Software Tool for Himar1 TnSeq Analysis. PLoS Comput Biol 11, e1004401 (2015).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      (1d) In Table S4, the authors show strain-specific putative essential genes from the core and accessory gene sets. For the sake of clarity, it is important to have the name of all the strains against each gene in which it is predicted essential or growth defective.

      Thank you for the comment on the hit strains for the genes classified as strain-specific and accessory putative essential or growth-defect-associated. Following the comment, we have added the data of hit strains in the new Supplementary Table 4 in the revised manuscript.

      (1e) Lines 123-126: It is not clear what is the relevance of highlighting genes involved in hypoxic pellicle formation in ATCC13950. These appear to be randomly distributed across different clinical isolates and is not clear whether they correlate with differential susceptibility of the reference strain and clinical isolates to hypoxia.

      Thank you for the comment on the relevance of highlighting genes involved in hypoxic pellicle formation in ATCC13950. The rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains is that the profiles of genetic requirements in each bacterial strain reflect the adaptation to the environment in which each strain lives. When the strains are placed in a special environment, they can adapt to the situation by altering the profiles of genetic requirements, resulting in the remodeling of metabolic pathways. We indeed found that the genetic requirements of several hypoxic pellicle genes were increased in clinical MAC-PD strains in vitro situations. These data suggest the hypoxic pellicle genes become more important in clinical MAC-PD strains for in vitro growth than in ATCC13950.

      Moreover, hypoxia is known to be one of the characteristic conditions in vivo including clinical lesions (McKeown. Br J Radiol. 2014). We consider it reasonable to expect that the strains derived from MAC-PD patients without predisposing immunological disorders may adapt under hypoxic conditions for maintaining bacterial survival. Therefore, we highlighted the genes involved in hypoxic pellicle formation in ATCC13950.

      We have added the description of the rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains in the revised manuscript (page 9, lines 148-155).

      [Reference]

      McKeown, S.R., et al. Defining normoxia, physoxia and hypoxia in tumours-implications for treatment response. Br J Radiol 87, 20130676 (2014).

      (2) Result 2 (pages 8-10): Genes with increased gene essentiality in clinical MAC-PD strains are also required for hypoxic pellicle formation in the type strain.

      (2a) As reported by authors (lines 123-126), only a small fraction of genes showing essentiality in clinical MAC-PD strains are required for hypoxic pellicle formation in the reference strain, which might be due to random distribution. Authors should avoid making such a generalised statement that reflects the association of the entire essential gene pool in clinical MAC-PD strains with hypoxic pellicle formation.

      Thank you for the comment on the issue of a small fraction of genes showing increased genetic requirements in clinical MAC-PD strains that is shared with genes required for hypoxic pellicle formation in the type strain ATCC13950. We admit that the section title may mislead that the genes required for hypoxic pellicle formation confer the entire essential gene pool of clinical MAC-PD strains. Following the comment, we have revised the section title as “Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in ATCC13950” (page 9, lines 146-147).

      We consider that it cannot be explained by a mere coincidence that we obtained the data of partial overlap of genes showing essentiality in clinical MAC-PD strains with genes required for hypoxic pellicle formation in ATCC13950, because we demonstrated the supporting data such as the pattern of genetic requirements suggesting gluconeogenic metabolic shift (Fig. 5) and the different pattern of hypoxic growth curves between clinical MAC-PD strains and ATCC13950 (Fig. 7).

      (2b) I fail to understand how the number of Tn insertions determines "more" or "less" essentiality of a gene particularly with 50-60% saturation. To my understanding, essentiality is a qualitative trait. Either a gene will be essential (based on no Tn insertion despite having the permissive sites), critical (poor representation of Tn insertions at the permissive sites due to growth defect of the strain in the pool), non-essential (expected frequency of insertion) or growth-advantageous (higher representation of Tn insertions at the permissive sites due to growth advantage of the strain in the pool). Hence, authors should avoid quantifying the essentiality of a gene.

      Thank you for the comments on the trait of gene essentiality. We recognize that essentiality is a qualitative trait, not a quantitative one. Because the number of Tn insertions indicates whether a gene is "more" or "less" required, we have corrected the manuscript by using the phrase “genetic requirements” instead of “gene essentiality”.

      As mentioned earlier, our method of comparing genetic requirements between strains is the same as that of a previous report that used duplicate Tn mutant libraries of clinical Mtb strains of different genotypes and triplicate Tn mutant libraries of H37Rv to identify increased genetic requirements of clinical Mtb strains (Carey AF. PLoS Pathog 2018; new Ref#8). Moreover, as described in rebuttal (1b), the saturation of our Tn mutant libraries after combining replicates is 62-79%, as follows: ATCC13950: 67.6%, M001: 72.9%, M003: 63.0%, M018: 62.4%, M019: 74.5%, M.i.27: 76.6%, M.i.198: 68.0%, MOTT64: 77.6%, M021: 79.9%. That is, we calculated gene essentiality from Tn mutant libraries with 62-79% saturation in each strain. The levels of saturation of the transposon libraries in our study are similar to those in the recent TnSeq analysis by Akusobi, in which libraries with 52-80% saturation (“high-density” transposon libraries) were used for HMM and resampling analyses (Supplemental Methods Table 1 [merged saturation] in Akusobi C. mBio. 2025; new Ref#9).

      Thus, we consider that our data of the difference of genetic requirements between clinical MAC-PD strains and ATCC13950 are acceptable.
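
      For readers unfamiliar with the term, merged saturation can be sketched as the fraction of permissive insertion sites that carry at least one insertion in any replicate. The snippet below is an illustration under that assumption; the function name and input layout are hypothetical and do not reproduce the pipeline used in the study.

```python
def merged_saturation(replicate_counts):
    """Fraction of permissive sites hit at least once in any replicate.

    replicate_counts: list of equal-length lists, one per replicate,
    holding insertion read counts per permissive site (assumed layout).
    """
    n_sites = len(replicate_counts[0])
    if any(len(rep) != n_sites for rep in replicate_counts):
        raise ValueError("all replicates must cover the same sites")
    # A site counts as hit if any replicate has a non-zero count there.
    hit = sum(
        1 for site in zip(*replicate_counts) if any(c > 0 for c in site)
    )
    return hit / n_sites
```

      With two toy replicates `[[0, 3, 0, 1], [0, 0, 2, 0]]`, three of four sites are hit, so combining replicates raises the saturation above that of either replicate alone.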

      [Reference]

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      (2c) From Figures 3-4, it seems the authors intend to highlight the insertion frequencies of certain genes in the clinical isolates compared to those in the reference strain to conclude whether a gene has become more critical and its disruption results in the growth defective phenotype (poor representation) in the clinical isolates, or a critical/essential gene has become dispensable in these strains.

      Based on these arguments, I suggest that the authors modify the title of the result such as "Tn insertion reveals differential requirement of genes for in vitro growth of clinical MAC-PD strains" or "Identification of genes differentially required for in vitro growth of clinical MAC-PD strains" as this is precisely the information we gain from this section of the study. Also, it is suggested to re-draft the rationale of this section as only 4 genes associated with hypoxic pellicle formation, were found to exhibit reduced insertion frequencies in the clinical isolates out of total of 283 genes. Hypoxia-related genes can be highlighted in the next section (see below).

      Thank you for the suggestion to modify the section title and to re-draft the rationale of the section. Following the comment, we have modified the section title to “Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in ATCC13950” (page 9, lines 146-147).

      Following the suggestion, we have revised the rationale of this section as follows: “The sharing of strain-dependent and accessory essential or growth-defect-associated genes with genes required for hypoxic pellicle formation in ATCC13950 prompts us to consider that the profiles of gene essentiality in clinical MAC-PD strains may be associated with the genes required for hypoxic pellicle formation in ATCC13950.” (page 9, lines 151-155)

      The reviewer points out that only 4 genes associated with hypoxic pellicle formation were found to exhibit reduced insertion frequencies in the clinical isolates out of a total of 283 genes. However, to discuss what proportion of genes were detected as increasingly required in clinical MAC-PD strains compared to ATCC13950, we should focus on the 121 genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950, excluding the 162 genes indispensable for clinical MAC-PD strains. Thus, we described in the manuscript that 4 genes associated with hypoxic pellicle formation were found to exhibit reduced insertion frequencies in the clinical isolates out of the 121 genes having significantly fewer Tn insertions than ATCC13950 (Fig. 3).

      (3) Result 3 (Page 10-14): Requirement of genes with increased gene essentiality in the clinical MAC-PD strains for mouse lung infection.

      (3a) The title should be modified to "Identification of genes in the clinical MAC-PD strains required for mouse lung infection".

      Following the comment, we have modified the section title as "Identification of genes in the clinical MAC-PD strains required for mouse lung infection". (page 12, lines 201-202).

      (3b) Further, the rationale of this experiment needs to be modified. As mentioned above, up until now the impact of hypoxic pellicle formation genes in the growth of MAC-PD strains remains unconvincing. The rationale of mouse infection experiments could be straightforward- "to identify genes critical for animal infection of the clinical isolates".

      Thank you for the comment on the rationale of the in vivo TnSeq experiment. Following the comment, we have revised the rationale as “The impact of hypoxia on mycobacteria under various ecological circumstances implies that the genes required for pathogenesis of MAC-PD may be in some degrees, overlapped with the genes with increased requirements in the clinical MAC-PD strains compared to ATCC13950, and also with the genes required for hypoxic pellicle formation in ATCC13950. To identify genes required for in vivo infection of clinical MAC-PD strains,” in the revised manuscript (page 12, lines 204-210).

      (3c) The authors should avoid using the term "genes with increased essentiality" for the reasons explained above in point #2b.

      Following the comment, we have corrected the term as “genes with increased requirements” in the revised manuscript (page 12, line 207).

      (3d) From Tables S8 and S9, I can find 93 genes in Mi198Tn and 74 genes in Mi27Tn for which Tn insertion mutants are under-represented in TnSeq at all time points from Day 1 to Wk 16 in comparison to input. Importantly, excluding results from Day 1 when the infection has just settled, I find 172 and 121 genes in Mi198Tn and Mi27Tn, respectively, under-represented in lungs between Wk 4-16. My suggestion is that authors should focus more on such genes and identify the characteristics of these genes and what fraction belongs to those involved in hypoxic pellicle formation in the reference strain. I am perplexed why authors have categorically ignored other genes and only focused on a set of genes that correspond to ~10-12% of entire differentially abundant mutant pool.

      Thank you for the suggestion that the genes whose Tn insertion mutants are under-represented in TnSeq from Week 4 to Week 16 in the infected mouse lungs be analyzed for overlap with the genes involved in hypoxic pellicle formation in the type strain ATCC13950. We found that at all timepoints from Day 1 to Week 16, 74 genes and 99 genes were under-represented in lungs infected with M.i.27Tn and M.i.198Tn, respectively. Of them, 21 (28.3%) and 12 (12.1%) genes belonged to the genes required for hypoxic pellicle formation in the type strain. We found that at timepoints from Week 4 to Week 16, 121 genes and 172 genes were under-represented in lungs infected with M.i.27Tn and M.i.198Tn, respectively. Of them, 21 (23.1%) and 30 (18.0%) genes belonged to genes involved in hypoxic pellicle formation in the type strain. The hypoxic pellicle-associated genes detected in both M.i.27 and M.i.198 at all time points from Day 1 to Week 16 encoded methionine synthesis, acyl-CoA dehydrogenase, isocitrate lyase, and an MMPL family transporter. Those additionally detected at all time points from Week 4 to Week 16 encoded multifunctional oxoglutarate decarboxylase/dehydrogenase, proteasome subunits, ABC transporter ATP-binding protein/permease, and lipase chaperone. We have described these results in the Result section (page 14, lines 236-248) and in new Supplementary Tables 12 and 13.

      As for M. intracellulare, conditionally essential genes have not been revealed except for those required for hypoxic pellicle formation in ATCC13950 revealed by us (Tateishi Y. Sci Rep. 2020; new Ref#10). This study is the first to focus on the relationship between the difference of genetic requirements among strains and hypoxic adaptation. We found a certain proportion of overlapped genes required for mouse lung infection and ATCC13950’s hypoxic pellicle formation. We consider it reasonable to focus on the category of genes required for hypoxic pellicle formation to analyze the datasets of TnSeq in mice.

      [Reference]

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      (3e) Page 13, lines 224-227: "Despite the differences in the profiles of the genes required for in vivo infection between strains, these data suggest that increased gene essentiality for hypoxic growth confers advantages for pathogenesis in vivo."

      For the reason described above, I find it a misleading hypothesis that hypoxic growth confers advantages for pathogenesis in vivo. How come only 10-12% of the entire gene sets which include genes of varying functions, can be the sole contributors to bacterial survival in host organelles during infection?

      More importantly, the mouse is not considered a good model for hypoxia, as mouse infection does not lead to the formation of solid granulomas with a hypoxic core. Although I am not convinced by the authors' bias toward hypoxia-related genes, if they aim to investigate the role of such genes by an unbiased enrichment of TnSeq mutants, they should have used C3HeJ mice, which are known to form granulomas (Boute et al., 2017 (doi: 10.1186/s13567-017-0477-7)).

      Thank you for the comments on the issue of the contribution of genes required for hypoxic growth and on the difference of hypoxic levels between mouse lineages. We did not intend to mention that a set of genes required for hypoxic growth is the sole contributor to bacterial survival in host organs during infection. As we discussed in the Discussion section, we acknowledge that the adaptation to the difference of carbon source between in vitro and in vivo infection (i.e. preferential usage of lipid carbon source in vivo) is involved in the pathogenesis of mycobacterial diseases (Yang. Front Microbiol 2018; new Ref#33, Gouzy. Proc Natl Acad Sci U S A 2021; new Ref#29, Quinonez. mBio 2022; new Ref#40, Pandey. Proc Natl Acad Sci U S A 2008; new Ref#41). We consider that not only the genes required for hypoxic pellicle formation but also strain-dependent/accessory genes conferring kinds of metabolism other than hypoxic pellicle formation can be estimated to be involved in the in vivo mouse lung infection.

      We have modified the sentence to clearly express our intention as follows: “These in vivo TnSeq data suggest that, despite the differences in the profiles of the genes required for in vivo infection between strains, increase of genetic requirements for hypoxic growth in part contribute to the pathogenesis in vivo.” (pages 15-16, lines 269-271)

      It seems to be an interesting idea to perform TnSeq by using C3HeJ mice. The granuloma formed in C3HeJ mice becomes extremely hypoxic (less than 1%, corresponding to the level of “pathological” hypoxia), which is as severe as the detection range of pimonidazole. In our model, the effect of such pathological levels of hypoxia on granuloma formation might not be detected. However, the lesion formed in C57BL/6 mice reaches a “physiological” level of hypoxia (5% O2) (McKeown SR. Br J Radiol. 2014), which is the same O2 level at which M. intracellulare forms pellicles. In principle, oxygen levels inside human bodies are physiologically hypoxic, and many biological events are experimentally investigated in this condition. Thus, we consider that we were able to observe the effect of physiological hypoxia on M. intracellulare growth both in vitro (hypoxic pellicles) and in vivo (infected C57BL/6 mice).

      [Reference]

      Yang, T. et al. Pan-genomic study of Mycobacterium tuberculosis reflecting the primary/secondary genes, generality/individuality, and the interconversion through copy number variations. Front Microbiol 9, 1886 (2018).

      Gouzy, A., Healy, C., Black, K.A., Rhee, K.Y. & Ehrt, S. Growth of Mycobacterium tuberculosis at acidic pH depends on lipid assimilation and is accompanied by reduced GAPDH activity. Proc Natl Acad Sci U S A 118, e2024571118 (2021).

      Quinonez, C.G. et al. The role of fatty acid metabolism in drug tolerance of Mycobacterium tuberculosis. mBio 13, e0355921 (2022).

      Pandey, A.K. & Sassetti, C.M. Mycobacterial persistence requires the utilization of host cholesterol. Proc Natl Acad Sci U S A 105, 4376-4380 (2008).

      McKeown, S.R. et al. Defining normoxia, physoxia and hypoxia in tumours-implications for treatment response. Br J Radiol 87, 20130676 (2014).

      (3f) An important set of data with the ATCC13950 reference strain is missing here. It is suggested that authors perform this study with the reference strain to identify whether the enrichment of genes is similar across all strains or is specific to the clinical isolates.

      Thank you for the comment on the setting of ATCC13950 as a control strain in the mouse infection experiment. However, we have shown that the bacterial burden of ATCC13950 decreases continuously from 4 weeks of infection, and that ATCC13950 is almost completely eliminated from 8 to 16 weeks of infection (Tateishi Y. BMC Microbiol 2023; new Ref#22). Therefore, it is impossible to perform TnSeq to detect the genes required for persistent infection in mice infected with ATCC13950.

      [Reference]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3g) Pages 13-14, lines 228-245: "We have performed a statistical enrichment analysis of gene sets by GSEA...".

      The comparison made here is not clear to me. It seems the authors do compare genes required for the growth of M.i.27 and M.i.198 in mouse lungs with the gene sets required for hypoxic pellicle formation in ATCC13950 together with the gene sets showing increased gene essentiality observed in the clinical MAC-PD strains, and claim that a significant % of genes belong to hypoxia-adaptation pathways. It is factually incorrect because a majority of these might overlap with those found critical for the in vitro survival of MAC-PD strains. It is suggested that authors re-analyze their data by comparing genes required for the growth of M.i.27 and M.i.198 in mouse lungs individually with those involved in hypoxic pellicle formation in ATCC13950, and with the gene sets found critical for in vitro growth, and present accordingly.

      Thank you for the suggestion on the re-analysis of gene enrichment analysis of genes required for M.i.27 and M.i.198 in vivo infection, individually with genes involved in hypoxic pellicle formation in ATCC13950 and with those showing genetic requirements in clinical MAC-PD strains compared to ATCC13950.

      About 50% (92 and 94 out of 181 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) and 40% (70 and 79 out of 179 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) of genes required for hypoxic pellicle formation in ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively. In addition, about 42% (54 and 56 out of 128 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) and 40% (79 and 68 out of 179 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) of genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively.

      The tables and graphs of GSEA results are shown in Supplementary Figs. 5, 6.

      These data indicate that 40-50% of the genes required for in vitro hypoxic pellicle formation and the strain-dependent/accessory essential genes are significantly enriched individually with in vivo bacterial growth. We have added the result of reanalyzed data of GSEA in the Result (pages 16-17, lines 287-290). We have shown the detail of reanalyzed data of GSEA in Supplementary Figs. 5, 6 and Supplementary Tables 15, 16.

      (3h) Since authors have used Tnseq of pooled mutants, which often yields misleading information, it is important to validate some of their findings upon mouse infection with individual mutants that yield prominent as well as baseline reduction at different time points. In the absence of validation, it remains a mere speculation of the role of these genes in the infection of these strains to animals.

      Thank you for the suggestion on the validation of the TnSeq hit genes on the in vivo survival. We acknowledge the importance of validating the TnSeq-hit genes by constructing knockout mutants. We have recently succeeded in constructing the vectors for making knockout strains of M. intracellulare (Tateishi Y. Microbiol Immunol. 2024). We will proceed to the infection experiment of knockout mutants by using our system for constructing them.

      [Reference]

      Tateishi Y. et al. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL+. Microbiol Immunol 68, 339-347 (2024).

      (4) Result 4 (Page 14-15): Preferential hypoxic adaptation of clinical MAC-PD strains evaluated with bacterial growth kinetics.

      (4a) "The metabolic remodeling, such as the increased gene essentiality of gluconeogenesis and the type VII secretion system..". As stated above, the essentiality of a gene, being a qualitative trait, should not be presented in quantitative terms. The authors should re-phrase this statement.

      Following the comment, we have corrected the term as “The metabolic remodeling, such as the increased genetic requirements of gluconeogenesis and the type VII secretion system.” (page 17, lines 296-297)

      (4b) "overlap of the genes required for mouse lung infection and those required for hypoxic pellicle formation involved by conferring these metabolic pathways..". There is a syntax error in this statement and needs revision.

      Following the comment, we have corrected the phrase as “overlap of the genes required for mouse lung infection and those required for hypoxic pellicle formation involved by these metabolic pathways”. (page 17, lines 297-299)

      (4c) The altered requirement of genes in different clinical strains for survival provides only circumstantial evidence of metabolic remodeling. Authors are suggested to perform metabolic profiling of representative clinical and reference strains, as it is important to examine whether these bacteria indeed undergo metabolic shift.

      Thank you for the comment on the metabolic profiling of the representative clinical and reference strains. We previously published the TnSeq result of ATCC13950, and we produced the current data by organizing them together with our previous findings (Fig. 4 in Tateishi Y. Sci Rep 2020; new Ref#10). The priority of the current study was to elucidate the difference and diversity of genetic requirements between clinical MAC-PD strains and ATCC13950. We consider that showing even circumstantial evidence of metabolic remodeling by TnSeq is of some value, because it provides a strong rationale for proceeding to the next study, including metabolomic analysis.

      [Reference]

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      (5) Result 5 (Page 16-18): Effects of knockdown of universal and accessory/strain-dependent essential or growth-defect-associated genes in clinical MAC-PD strains.

      (5a) Lines 273-277: The rationale of using CRISPRi should be correctly presented to evaluate the effect of individual genes' suppression on the downstream phenotype and not to establish the CRISPRi silencing tool in MAC.

      Thank you for the comment on the rationale of the section of the CRISPR-i experiment. Following the comment, we have modified the sentence as follows: “With an intention to evaluate the effect of suppressing TnSeq-hit genes on bacterial growth.” (page 19, lines 333-334 in the revised manuscript).

      (5b) Line 278: pRH2052/pRH2521 are the plasmids and not the CRISPRi system.

      Following the comment, we have corrected the phrase as “pRH2052/pRH2521 clustered regularly interspaced short palindromic repeats interference (CRISPR-i) plasmids.” (page 19, lines 334-335 in the revised manuscript).

      (5c) Line 280: Other pioneering studies on the use of CRISPRi for gene silencing in mycobacteria (Chaudhary et al., Nat Comm, Rock et al., Nat Microbio) should also be cited.

      Thank you for the comment on adding the reference papers on CRISPR-i in mycobacteria. We have added the two suggested papers in the revised manuscript as new Ref #30 and #31. (page 19, line 336).

      (5d) Lines 282-283: It is not clear why M001 and MOTT64 could not be transformed. Did the authors use any control plasmid to evaluate the transformation efficiency of these strains?

      Thank you for the comment on the failure of transformation in M001 and MOTT64.

      Following the comment, we have performed an experiment to evaluate the efficiency of transformation in the 9 M. intracellulare strains we used in this study. We used an E. coli-mycobacteria shuttle vector, pSO246KM-Prhsp65-luc, that expresses firefly luciferase as a control plasmid (Aoki K. J Biol Chem 2004). To obtain transformed colonies, we used 7H10/OADC agar plates containing the same concentration of kanamycin that we used for preparing Tn mutant libraries and for obtaining CRISPR-i knockdown strains.

      We observed no colonies on agar plates for MOTT64 after electroporation of the pSO246KM-Prhsp65-luc plasmid. In most of the remaining strains, transformed colonies had fully emerged by day 10 of culture after electroporation of the plasmid. However, M001 needed twice as long for transformed colonies to emerge; on day 21, the number of colonies in M001 finally became comparable to that of the other strains. We checked the luciferase activity of 6-12 colonies in each strain except MOTT64, and we confirmed transformation by the higher luciferase activity in colonies that underwent electroporation of the plasmid than in those that did not.

      The failure to obtain transformants of CRISPR-i vectors in MOTT64 may be due to an extremely low efficiency of acquiring foreign DNA. The failure to obtain transformants of CRISPR-i vectors in M001 may be because this strain tolerates the stress caused by plasmid transformation less well than other M. intracellulare strains. For M001, the pSO246KM-Prhsp65-luc plasmid may cause tolerable stress, resulting in the delayed emergence of transformed colonies. By contrast, the CRISPR-i plasmids may cause greater stress for M001 than the pSO246KM-Prhsp65-luc plasmid, a level of stress that the strain cannot tolerate.

      Author response table 1.

      Author response image 3.

      Luciferase activities before and after transformation of the pSO246KM-Prhsp65-luc plasmid. Fifty microliters of culture were mixed with 50 μL of assay reagent (Luciferase assay system E1500, Promega), and luciferase activity was measured with a luminometer (FilterMax F5, Molecular Devices). Data are shown as mean ± SD of 6-12 colonies.

      [Reference]

      Aoki K. Extracellular mycobacterial DNA-binding protein 1 participates in Mycobacterium-lung epithelial cell interaction through hyaluronic acid. J Biol Chem 279, 39798–39806 (2004).

      (5e) Lines 283-186: "To confirm the gene essentiality detected with the HMM analysis, we evaluated the consequent growth inhibition in the knockdown strains of representative universal essential or growth-defect-associated genes, including glcB, inhA, gyrB, and embB.." It is not clear what was the level of suppression of these genes in the respective KD strains. Authors should include the level of suppression of these genes also by qRT-PCR.

      Thank you for the comment on the suppression levels of gene expression in knockdown strains of universal essential genes. Following the comment, we have evaluated them by qRT-PCR and we observed comparable levels of knockdown efficiency in the knockdown strains between universally essential genes and strain-specific/accessory essential genes (new Supplementary Fig. 9). Overall, the gene expression was suppressed to 20 - 70% in the knockdown strains compared to the vector control strains that do not express sgRNA.

      We have added the data of qRT-PCR of knockdown strains of universal essential genes such as glcB, inhA, gyrB, and embB (new Supplementary Fig. 9). We have revised the Result and Discussion in the manuscript (page 21, lines 367-376; page 28, lines 490-497).

      (5f) Lines 293-: I am unable to establish any correlation between the growth of the knockdown strains and the Tn insertion reads in the respective genes. For instance, pckA exhibits reduced Tn insertion reads in almost all the strains except M.i.27, but the effect of its KD on growth is seen only in M.i.198 and M003; glpX exhibits reduced Tn insertion reads in M003, M019, and M021, but the effect of its KD on growth is seen only in M003; csd exhibits reduced Tn insertion reads in M.i.198, M003, and M019, but the effect of its KD on growth is seen only in M.i.198 and M003. The authors argue that these contradictory phenotypes are due to difficulties in the effective operation of genetically modified systems using foreign genes from different bacterial species in MAC-PD strains (Lines 310-312), or that the desired effect on growth could not be observed due to the inability of CRISPRi to yield >99% suppression (Line 314); neither is a valid justification. Indeed, a close look at the RT-PCR data (Figure S5) reveals that pckA levels are ~0.22, 0.5, 0.2, 0.22, 0.2, 0.5, and 0.3 fold relative to sigA in M.i.198, M.i.27, ATCC13950, M018, M019, M003 and M021, respectively, but the effect of its suppression on growth by CRISPRi is seen only in M.i.198 and M003. Secondly, >99% suppression is not a universal prerequisite for all genes to show a growth defect (as might be the case with the glcB, inhA, gyrB, and embB genes in this study). Hence, it remains unclear why contrasting results are obtained for most of the genes by TnSeq and CRISPRi.

      Thank you for the comments on the issue of inconsistent results between TnSeq and CRISPR-i based knockdown. We acknowledge that some inconsistencies were observed, particularly among strain-dependent/accessory essential or growth-defect-associated genes. By contrast, we found consistent data between TnSeq and CRISPR-i based knockdown results for universal essential genes. Having obtained the suppression levels of gene expression in the knockdown strains of universal essential genes, we acknowledge that low knockdown efficiency does not explain the discrepancy between TnSeq and CRISPR-i results, because the levels of knockdown efficiency were comparable between strain-dependent/accessory essential genes and universally essential genes.

      Although the mechanism cannot be fully established from the current study alone, we consider that such inconsistent phenotypes between TnSeq and CRISPR-i based knockdown may be related to the recently revealed bypass mechanism of gene essentiality, which is characteristically observed in strain-dependent/accessory essential or growth-defect-associated genes. According to the publication by Rosconi (Nat Microbiol. 2022; new Ref#14) reporting ‘forced-evolution experiments’ on 36 clinical Streptococcus pneumoniae strains, gene essentiality can be bypassed by several mechanisms, including the composition of the accessory genome and pathway rewiring. They successfully recovered knockout mutants from transformation experiments for strain-specific/accessory essential genes such as cytidine monophosphate kinase, the folate pathway enzyme formate tetrahydrofolate ligase, and the undecaprenyl phosphate-biosynthesis pathway enzyme farnesyl-diphosphate synthase. The bypassing of gene essentiality could be suggested by observing suppressor mutations and synthetic lethality in knockout strains. By contrast, universal essential genes were reported to fulfill three criteria: high levels of conservation within and often across species, limited genetic diversity, and high and stable expression levels. Consequently, universal essential genes are estimated to be rigid, largely immutable key components of an organism’s survival.

      We consider that this is the case with our study on NTM because NTM is pangenomic. The knockdown of universal essential genes resulted in the clear growth suppression; however, the knockdown of strain-dependent/accessory essential genes did not show the consistent growth suppression. We consider that the bypass mechanism of gene essentiality can explain the inconsistent effect of gene silencing of strain-dependent/accessory genes on bacterial growth suppression.

      We have added the above-mentioned description in the Discussion (pages 28-29, lines 497-519).

      [Reference]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      Minor Comments:

      (1) The authors should mention the cut-off of fold-change for all the experiments in the methods section.

      Thank you for the comment on the cut-off of fold-change. We set the cut-off of fold-change as adjusted P-value < 0.05. We added the description in the Methods section. (page 41, lines 724-725)

      (2) Figure 7 legend (Lines 888-889): "Data are shown as the means {plus minus} SD of triplicate experiments. Data from one experiment representative of three independent experiments (N = 3) are shown."

      Figure S3 legend: Data on the growth curves are the means of triplicate experiments. Data from one experiment representative of three independent experiments (N = 3) are shown.

      Figure S4 legend: Data are shown as the means {plus minus} SD of triplicate experiments. Data from one experiment representative of two independent experiments (N = 2) are shown.

      Figure S5 legend: Gene expression data are the means ± SD of triplicate experiments. Data from one experiment representative of two independent experiments (N = 2) are shown.

      These statements need clarification. Were multiple independent experiments (biological repeats) performed, each with 2-3 technical replicates, with the data shown representing one of the biological repeats?

      Thank you for the comments on the number of experiments performed and the number of replicates. We have performed two or three independent experiments with 2-3 technical replicates. The data shown represent one of the independent experiments.

      (3) Figure 7b: Statistics are missing in the bar graph for growth rate under aerobic conditions.

      Thank you for the comment on the statistics of the data regarding growth rate under aerobic conditions. We have added the statistics in the new Fig. 7c.

      (4) The authors should check the y-axis in Figure 7b, as it is not clear whether bacteria indeed show a growth rate of 1-3 CFUs/day.

      Thank you for the comment on the y-axis in Figure 7b. We have corrected the y-axis label to “log10[CFUs]/day” in the new Fig. 7c. Additionally, we have corrected the y-axis label in the new Fig. 7a and added the description “Data are represented as CFUs in 4 μl sample at each timepoint.” to the Fig. 7a legend.

      Reviewer #3 (Recommendations For The Authors):

      (1) It's notable that strains M001 and MOTT64 failed to undergo a transformation, while seven other strains did. Given that M001, MOTT64, and M019 belong to the same phylogenetic clade, it raises questions about why particular strains within this clade showed different transformation outcomes. It might be valuable for them to discuss this discrepancy in their study.

      Thank you for the comment on the difference in transformation capacity between strains belonging to the same genomic subgroup. Although the direct mechanism determining competency for foreign DNA has not been elucidated in M. intracellulare and other pathogenic NTM species, several studies in other bacteria suggest that introducing foreign DNA into clinical strains is more difficult than into laboratory strains. As suggested in Staphylococcus aureus (Corvaglia AR. PNAS. 2010; new Ref#55), some clinical strains develop systems that eliminate foreign nucleic acids, such as a type III-like restriction endonuclease. As suggested in Gram-negative bacteria (Qin J. Sci Rep. 2022; new Ref#56), there may be differences in cell surface structures between strains, which makes polymyxin B nonapeptide, targeting the cell membrane, necessary for transforming clinical strains. The efficiency of eliminating foreign DNA may be attributed to various strain-specific factors, including restriction endonucleases, natural CRISPR-interference systems and cell wall structures, rather than a simple genotypic factor.

      We have added the description on the difference of capability in transformation in the Discussion. (page 31, lines 546-558)

      [References]

      Corvaglia, A.R., François, P., Hernandez, D., Perron, K., Linder, P. & Schrenzel, J. A type III-like restriction endonuclease functions as a major barrier to horizontal gene transfer in clinical Staphylococcus aureus strains. Proc Natl Acad Sci U S A 107, 11954-11958 (2010).

      Qin, J., Hong, Y., Pullela, K., Morona, R., Henderson, I.R. & Totsika, M. A method for increasing electroporation competence of Gram-negative clinical isolates by polymyxin B nonapeptide. Sci Rep 12, 11629 (2022).

      (2) The authors should consider specifying M. intracellulare in their title.

      Thank you for the comment on the manuscript title. Following the comments from all Reviewers, we have modified the title as “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

    1. eLife Assessment

      This important study provides evidence supporting the idea that postnatal experience plays an instructive role in shaping the patterns of functional connectivity between extrastriate visual cortex and frontal regions during development, by comparing neonates, blind and sighted adults. The evidence supporting the authors' claim is solid. Nevertheless, substantial weaknesses remain in mechanistic interpretation and alignment with relevant developmental frameworks. This study will be of significant interest to neuroscientists and neuroimaging researchers focused on vision, plasticity and development.

    2. Reviewer #1 (Public review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between human extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the main conclusion regarding postnatal experience-driven shaping of visual-frontal connectivity.

      The inclusion of neonates offers a unique and valuable developmental anchor for interpreting divergence between blind and sighted adults. This is a major advance over prior studies limited to adult comparisons.

      Convergence with prior findings in the blind and sighted adult groups reinforces the reliability and external validity of the present results.

      The split-half reliability analysis in the infant data increases confidence in the robustness of the reported group differences.

      Weaknesses:

      The manuscript risks overstating a mechanistic distinction between sighted and blind development by framing visual experience as "instructive" and blindness as "reorganizing." Similarly, the binary framing of visual experience and blindness as independent may oversimplify shared plasticity mechanisms.

      The interpretation of changes in temporal correlations as altered neural communication does not adequately consider how shifts in shared variance across networks may influence these measures without reflecting true biological reorganization.

      The discussion does not substantively engage with the longstanding debate over whether sensory experience plays an instructive or permissive role in cortical development.

      The relationship between resting-state and task-based findings in blindness remains unclear.

    3. Reviewer #2 (Public review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. Here, Tian et al. explore how this organization arises over development. Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated. Some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The paper addresses very important questions about the starting state in the developing visual cortex, and how cortical networks are shaped by experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data.

      Weaknesses:

      While potential roles of experience (e.g., visual, cross-modal) are discussed in detail, little consideration is given to the role of experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. It is possible then that the sighted adult pattern may still emerge later in infancy or childhood, regardless of infant visual experience. If so, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). In short, it is not clear that birth, or the first couple weeks of life, are a clear cut "starting point" for development, after which all change can be attributed to experience.

    4. Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of infants lies between that of sighted adults (showing stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (showing stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of infants resembled those of sighted adults more than those of blind adults, but infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      Overall, the presented analyses are solid and well detailed, and the results and discussion are convincing.

      Weaknesses

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating the evolution of functional connectivity of the visual system as a function of visual experience and thus as a function of age, at least during toddlerhood given the early and intense maturation of the visual system after birth. This could be achieved by analyzing different developmental periods using open databases such as the Baby Connectome Project.

      The rationale for grouping full-term neonates and preterm infants (scanned at term-equivalent age) is not understandable when seeking to perform comparisons with adults. Even if the study results do not show differences between full-terms and preterms in terms of functional connectivity differences between regions and of connectivity patterns, the preterm group had different neurodevelopment and post-natal (including visual) experiences (even a few weeks might have an impact). In fact, they systematically show reduced connectivity strength for all regions compared with full-terms (Sup Fig 7). Considering a more homogeneous group of neonates would have strengthened the study design.

      The rationale for presenting results on the connectivity of secondary visual cortices before that of the primary visual cortex (V1) could be clarified.

      The authors acknowledge the methodological difficulties for defining regions of interest (ROIs) in infants in a similar way as adults. Since the brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing a delayed growth), this poses major problems for registration. This raises the question of whether the study findings could be biased by differences in ROI positioning across groups.

    5. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the study's main conclusion regarding experience-driven changes in functional connectivity profiles between visual and frontal regions.

      In general, the findings in sighted adult and congenitally blind groups replicate previous studies and enhance the confidence in the reliability and robustness of the current results.

      Split-half analysis provides a good measure of robustness in the infant data.

      Weaknesses:

      There is some ambiguity in determining which aspects of these networks are shaped by experience.

      This uncertainty is compounded by notable differences in data acquisition and preprocessing methods, which could result in varying signal quality across groups. Variations in signal quality may, in turn, have an impact on the observed correlation patterns.

      The study's findings could benefit from being situated within a broader debate surrounding the instructive versus permissive roles of experience in the development of visual circuits.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. In this paper, Tian et al. ask: how does this organization arise over development? Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated; some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The question raised in this paper is extremely important: what is the starting state in development for visual cortical regions, and how is this organization shaped by experience? This paper is among the first to examine this question, particularly by comparing infants not only with sighted adults but also blind adults, which sheds new light on the role of visual (and cross-modal) experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data. 

      Weaknesses:

      A central claim is that "infant secondary visual cortices functionally resemble those of blind more than sighted adults" (abstract, last paragraph of intro). I see two potential issues with this claim. First, a minor change: given the approaches used here, no claims should be made about the "function" of these regions, but rather their "functional correlations". Second (and more importantly), the claim that the secondary visual cortex in general resembles blind more than sighted adults is still not fully supported by the data. In fact, this claim is only true for one aspect of secondary visual area functional correlations (i.e., their connectivity to A1/M1/S1 vs. PFC). In other analyses, the infant secondary visual cortex looks more like sighted adults than blind adults (i.e., in within vs. across hemisphere correlations), or shows a different pattern from both sighted and blind adults (i.e., in occipito-frontal subregion functional connectivity). It is not clear from the manuscript why the comparison to PFC vs. non-visual sensory cortex is more theoretically important than hemispheric changes or within-PFC correlations (in fact, if anything, the within-PFC correlations strike me as the most important for understanding the development and reorganization of these secondary visual regions). It seems then that a more accurate conclusion is that the secondary visual cortex shows a mix of instructive effects of vision and reorganizing effects of blindness, albeit to a different extent than the primary visual cortex.

      Relatedly, group differences in overall secondary visual cortex connectivity are particularly striking as visualized in the connectivity matrices shown in Figure S1. In the results (lines 105-112), it is noted that while the infant FC matrix is strongly correlated with both adult groups, the infant group is nonetheless more strongly correlated with the blind than sighted adults. I am concerned that these results might be at least partially explained by distance (i.e., local spread of the BOLD signal), since a huge portion of the variance in these FC matrices is driven by stronger correlations between regions within the same system (e.g., secondary-secondary visual cortex, frontal-frontal cortex), which are inherently closer together, relative to those between different systems (e.g., visual to frontal cortex). How do results change if only comparisons between secondary visual regions and non-visual regions are included (i.e., just the pairs of regions within the bold black rectangle on the figure), which limits the analysis to long-range connections only? Indeed, looking at the off-diagonal comparisons, it seems that in fact there are three altogether different patterns here in the three groups. Even if the correlation between the infant pattern and blind adult pattern survives, it might be more accurate to claim that infants are different from both adult groups, suggesting both instructive effects of vision and reorganizing effects of blindness. It might help to show the correlation between each group and itself (across independent sets of subjects) to better contextualize the relative strength of correlations between the groups.
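
      The restriction to between-system pairs suggested above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis: the function names, the `system` labels and the 4-ROI toy matrices are all hypothetical, and the numbers are made up so that the within-system entries differ while the between-system entries agree.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def between_system_similarity(fc_a, fc_b, system):
    """Correlate two FC matrices using only ROI pairs that span different systems.

    fc_a, fc_b: square nested lists of ROI-by-ROI correlations.
    system: hypothetical system label for each ROI (e.g. "visual", "frontal").
    """
    n = len(system)
    # Upper triangle only, and only pairs whose ROIs belong to different systems.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if system[i] != system[j]]
    return pearson([fc_a[i][j] for i, j in pairs],
                   [fc_b[i][j] for i, j in pairs])

# Toy data (made-up values): two visual ROIs followed by two frontal ROIs.
system = ["visual", "visual", "frontal", "frontal"]
fc_group_a = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.4],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.4, 0.8, 1.0],
]
fc_group_b = [
    [1.0, 0.5, 0.2, 0.4],
    [0.5, 1.0, 0.6, 0.8],
    [0.2, 0.6, 1.0, 0.3],
    [0.4, 0.8, 0.3, 1.0],
]
similarity = between_system_similarity(fc_group_a, fc_group_b, system)
```

      Because within-system entries are dropped before correlating, strong local (short-distance) correlations cannot inflate the similarity between groups in this toy setup.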

      It is not clear that differences between groups should be attributed to visual experience only. For example, despite the title of the paper, the authors note elsewhere that cross-modal experience might also drive changes between groups. Another factor, which I do not see discussed, is possible ongoing experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. Although no effects of age are detected, it is possible that cortex is still undergoing experience-independent maturation at this very early stage of development. For example, consider Figure 2; perhaps V1 connectivity is not established at 2 weeks, but eventually achieves the adult pattern later in infancy or childhood. Further, consider the possibility that this same developmental progression would be found in infants and children born blind. In that case, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). To deal with these issues, the authors should add a discussion of the role of maturation vs. experience and temper claims about the role of visual experience specifically (particularly in the title). 

      The authors measure functional correlations in three very different groups of participants and find three different patterns of functional correlations. Although these three groups differ in critical, theoretically interesting ways (i.e., in age and visual/cross-modal experience), they also differ in many uninteresting ways, including at least the following: sampling rate (TR), scan duration, multi-band acceleration, denoising procedures (CompCor vs. ICA), head motion, ROI registration accuracy, and wakefulness (I assume the infants are asleep).

      Addressing all of these issues is beyond the scope of this paper, but I do feel the authors should acknowledge these confounds and discuss the extent to which they are likely (or not) to explain their results. The authors would strengthen their conclusions with analyses directly comparing data quality between groups (e.g., measures of head motion and split-half reliability would be particularly effective).

      Response #1: We appreciate the reviewer’s comments. In response, we have revised the paper to provide a more balanced summary of the data and clarified in the introduction which signatures the paper focuses on and why. Additionally, we have included several control analyses to account for other plausible explanations for the observed group differences. Specifically, we randomly split the infant dataset into two halves and performed split-half cross-validation. Across all comparisons, the results from the two halves were highly similar, suggesting that the effects are robust (see Supplementary Figures S3 and S4).

      Furthermore, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults) and found no significant differences between them (details in response #6). Finally, we repeated our analysis after excluding infants with a radiology score of 4 or 5, and the results remained consistent, indicating that our findings are not confounded by potential brain anomalies (details in response #2).

      We hope these control analyses help strengthen our conclusions.
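
      As a rough illustration of the split-half logic described in this response, the sketch below randomly divides subjects into two halves, averages a per-subject connectivity vector within each half, and correlates the two averages. It is a minimal sketch under assumptions: the function names, the fixed seed and the toy subject vectors are hypothetical and do not reproduce the authors' pipeline.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def split_half_reliability(subject_vectors, seed=0):
    """Correlate group-mean connectivity between two random halves of subjects.

    subject_vectors: one list per subject, each holding that subject's
    connectivity values for the same ordered set of ROI pairs.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    order = list(range(len(subject_vectors)))
    rng.shuffle(order)
    half = len(order) // 2
    n_values = len(subject_vectors[0])

    def mean_vector(indices):
        return [sum(subject_vectors[i][k] for i in indices) / len(indices)
                for k in range(n_values)]

    return pearson(mean_vector(order[:half]), mean_vector(order[half:]))

# Toy data (made up): a shared pattern plus a small subject-specific offset.
base = [0.1, 0.5, 0.3, 0.9, 0.7]
subjects = [[b + 0.01 * ((s + k) % 3 - 1) for k, b in enumerate(base)]
            for s in range(6)]
reliability = split_half_reliability(subjects)
```

      A reliability close to 1 indicates that the group-level pattern is reproduced in independent halves of the sample; comparing this value across groups gives a simple check on whether data quality differs between them.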

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in sighted infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of sighted infants lies between that of sighted adults (stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of sighted infants resembled those of sighted adults more than those of blind adults, but sighted infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths:

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      Overall, the analyses considered are solid and well-detailed. The results are quite convincing, even if the interpretation might need to be revised downwards, as factors other than visual experience may play a role in the development of functional connections with the visual system.

      Weaknesses:

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating when experience-dependent mechanisms are important for the establishment of multiple functional connections within the visual system. This could be achieved by analyzing different developmental periods in the same way, using open databases such as the Baby Connectome Project. Given the early, "condensed" maturation of the visual system after birth, we might expect sighted infants to show connectivity patterns similar to those of adults a few months after birth.

      The rationale for mixing full-term neonates and preterm infants (scanned at term-equivalent age) from the dHCP 3rd release is not understandable since preterms might have a very different development related to prematurity and to post-natal (including visual) experience. Although the authors show that the difference between the connectivity of visual and other sensory regions, and the one of visual and PFC regions, do not depend on age at birth, they do not show that each connectivity pattern is not influenced by prematurity. Simply not considering the preterm infants would have made the analysis much more robust, and the full-term group in itself is already quite large compared with the two adult groups. The current study setting and the analyses performed do not seem to be an adequate and sufficient model to ascertain that "a few weeks of vision after birth is ... insufficient to influence connectivity".

      In a similar way, excluding the few infants with detected brain anomalies (radiological scores higher or equal to 4) would strengthen the group homogeneity by focusing on infants supposed to have a rather typical neurodevelopment. The authors quote all infants as "sighted" but this is not guaranteed as no follow-up is provided.

      Response #2: We appreciate the reviewer’s suggestion. We re-analyzed the infant cohort after excluding all cases with radiological scores ≥4 (n = 39 infants excluded). The revised analysis confirmed that the connectivity patterns reported in the main text remain statistically unchanged (see Supplementary Fig. S11). This demonstrates the robustness of our findings to confounding effects from potential brain anomalies. We have explicitly clarified this in the revised Methods section (page 14, line 391 in the manuscript).

      In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      The post-menstrual age (PMA) at scan of the infants is also not described. The methods indicate that all were scanned at "term-equivalent age" but does this mean that there is some PMA variability between 37 and 41 weeks? Connectivity measures might be influenced by such inter-individual variability in PMA, and this could be evaluated.

      The rationale for presenting results on the connectivity of secondary visual cortices before that of the primary visual cortex (V1) was not clear. Also, it might be relevant to better justify why only the connectivity of visual regions to non-visual sensory regions (S1-M1, A1) and prefrontal cortex (PFC) was considered in the analyses, and not the connectivity to other brain regions.

      In relation to the question explored, it might be informative to reposition the study in relation to what others have shown about the developmental chronology of structural and functional long-distance and short-distance connections during pregnancy and the first postnatal months.

      The authors acknowledge the methodological difficulties in defining regions of interest (ROIs) in infants in a similar way as adults. The reliability and the comparability of the ROIs positioning in infants is definitely an issue. Given that brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing delayed growth), the newborn brain is not homothetic to the adult brain, which poses major problems for registration. The functional specialization of cortical regions is incomplete at birth. This raises the question of whether the findings of this study would be stable/robust if slightly larger or displaced regions had been considered, to cover with greater certainty the same areas as those considered in adults. And have other cortical parcellation approaches been considered to assess the ROIs robustness (e.g. MCRIB-S for full-terms)?

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Further consideration should be given to the underlying changes in network architecture that may account for differences in functional correlations across groups. An increase (or decrease) in correlation between two regions could signify an increase (decrease) in connection or communication between those regions. Alternatively, it might reflect an increase in communication or connection with a third region, while the physical connections/interactions between the two original regions remain unchanged. These possibilities lead to distinct mechanistic interpretations. For example, there are substantial changes in connectivity during early visual (e.g. Burkhalter A. 1993, Cerebral Cortex) and visuo-motor development (e.g., Csibra et al. 2000 Neuroreport). It's not clear whether increases in communication within the visual network and improvements in visuo-motor behavior (e.g., Yizhar et al. 2023 Frontiers in Neuroscience) wouldn't produce a qualitatively similar pattern of results.

      Relatedly, the within-network correlation patterns between visual ROIs and frontal ROIs appear markedly different between sighted adults and infants (Supplementary Figure S1). To what extent do the differences in long-range correlations between visual and frontal regions reflect these within-network differences in functional organization?

      Response #3: The reviewer raises interesting questions about possible mechanisms and network changes. Resting-state studies are indeed always subject to the possibility that some effects are mediated by a third, unobserved region. Prior whole-cortex connectivity analyses have observed primarily changes in occipito-frontal connectivity in blindness, so there is not a clear cortical ‘third region’ candidate (Deen et al., 2015). However, some thalamic effects have also been observed and could contribute to the phenomenon (Bedny et al., 2011). Resting-state changes in correlation between two areas do not imply changes in the strength of long-range anatomical connectivity. Indeed, in the current case they may well reflect differential functional coupling, rather than strengthening or weakening of anatomical connections. We now discuss this in the Discussion section on page 12, line 301 as follows:

      “Despite these insights, many questions remain regarding the neurobiological mechanisms underlying experience-based functional connectivity changes and their relationship to anatomical development. Long-range anatomical connections between brain regions are already present in infants—even prenatally—though they remain immature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017). Functional connectivity changes may stem from local synaptic modifications within these stable structural pathways, consistent with findings that functional connectivity can vary independently of structural connection strength (Fotiadis et al., 2024). Moreover, functional connectivity has been shown to outperform structural connectivity in predicting individual behavioral differences, suggesting that experience-based functional changes may reflect finer-scale synaptic or network-level modulations not captured by macrostructural measures (Ooi et al., 2022). Prior studies also suggest that, even in adults, coordinated sensory-motor experience can lead to enhancement of functional connectivity across sensory-motor systems, indicating that large-scale changes in functional connectivity do not necessarily require corresponding changes in anatomical connectivity (Guerra-Carrillo et al., 2014; Li et al., 2018).”
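
      The third-region concern discussed in this exchange can be illustrated with a first-order partial correlation. The sketch below is not an analysis from the manuscript: the time series are simulated, and the names `roi_a`, `roi_b`, and `third` are hypothetical. It only shows that two regions with no direct coupling can correlate strongly when a shared driver is left unmodeled.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated (not real) data: a hypothetical third region drives both ROIs,
# which have no direct coupling of their own.
rng = random.Random(0)
third = [rng.gauss(0, 1) for _ in range(5000)]
roi_a = [t + rng.gauss(0, 1) for t in third]
roi_b = [t + rng.gauss(0, 1) for t in third]

r_ab = pearson(roi_a, roi_b)                          # marginal correlation
r_ab_given_third = partial_corr(roi_a, roi_b, third)  # third region removed

print(f"r(A,B) = {r_ab:.2f}, r(A,B | third) = {r_ab_given_third:.2f}")
```

      In this toy setting the marginal correlation is substantial while the partial correlation is close to zero. With real data the candidate third region would have to be measured, which is the difficulty the response points out.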

      It is not clear how changes in correlation patterns among visual areas would produce the connectivity between visual areas and prefrontal areas reported in the current study. Activity in visual areas drives correlations both among visual areas and between visual and prefrontal areas, and the same is true of prefrontal cortices.

      The findings from this study should be more closely linked to the extensive literature surrounding the debate on whether experience plays an instructive or permissive role in visual development (e.g., Crair 1999 Current Opin Neurobiol; Sur et al. 1999 J Neurobiol; Kiorpes 2016 J Neurosci; Stellwagen & Shatz 2002 Neuron; Roy et al. 2020 Nature Communications).

      Response #4: The instructive role suggests that specific experiences or patterns of neural activity directly shape and organize neural circuitry, while the permissive role indicates that such experiences or activity merely enable other factors, such as molecular signals, to influence neural circuit formation (Crair, 1999; Sur et al., 1999). To distinguish whether experience plays an instructive or permissive role, it is essential to manipulate the pattern or information content of neural activity while maintaining a constant overall activity level (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002). However, both the sighted and blind adult groups have had extensive experience and neural activity in the visual cortices. For the sighted group, activity in the visual cortex is partly driven by bottom-up input from the external environment, through the retina, LGN, and ultimately to the cortex. In contrast, the blind group’s visual cortex activity is partially driven by top-down input from non-visual networks. The precise role of this activity in shaping the observed connectivity patterns remains unclear. Although our study cannot speak to this issue directly, we now link to the relevant literature on page 12, line 320 of the manuscript in the Discussion section as follows:

      “The current findings reveal both effects of vision and effects of blindness on the functional connectivity patterns of the visual cortex. A further open question is whether visual experience plays an instructive or permissive role in shaping neural connectivity patterns. An instructive role suggests that specific sensory experiences or patterns of neural activity directly shape and organize neural circuitry. In contrast, a permissive role implies that sensory experience or neural activity merely facilitates the influence of other factors—such as molecular signals—on the formation and organization of neural circuits (Crair, 1999; Sur et al., 1999). Studies with animals that manipulate the pattern or informational content of neural activity while keeping overall activity levels constant could distinguish between these hypotheses (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002).”

      The assertion that a few weeks of vision after birth is insufficient to influence connectivity is provocative. Though supported by the study's results, it would benefit from integration with research in animal models showing considerable malleability of networks from early experience (e.g., Akerman et al. 2002 Neuron; Li et al. 2006 Nature Neuroscience; Stacy et al. 2023 J Neuroscience).

      Response #5: We thank the reviewer for their suggestion. The present study found that several weeks of postnatal visual experience is insufficient to significantly alter the long-term connectivity patterns of the visual cortices. Animal studies, by contrast, have shown that acute visual experience, or even exposure to visual stimuli through unopened eyelids, can robustly influence visual system development (Akerman et al., 2002; Li et al., 2008; Van Hooser et al., 2012). We think this discrepancy may be attributed to the substantial differences in developmental timelines between species. The human lifespan is much longer, and so is the human critical period, making it unclear how to map duration from one species to another. We briefly touched upon the time course issue on page 11, line 289 in the Discussion section as follows:

      “The present results reveal the effects of experience on development of functional connectivity between infancy and adulthood, but do not speak to the precise time course of these effects. Infants in the current sample had between 0 and 20 weeks of visual experience. Comparisons across these infants suggest that several weeks of postnatal visual experience is insufficient to produce a sighted-adult connectivity profile. The time course of development could be anywhere between a few months and years and could be tested by examining data from children of different ages.”

      Substantial differences between the groups are evident in several key aspects of the study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To clarify how these differences might have impacted correlation differences between groups, it would be essential to include information on the noise ceilings for each correlation analysis within each group.

      Response #6: We thank the reviewer for their suggestion. We now report the split-half noise ceiling for adult and infant groups. For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (one-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, we believe that overall signal quality is unlikely to impact our results. We have also added the relevant context to the Methods section on page 16, line 447 as follows:

      “Substantial differences between the groups exist in this study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To address this concern, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults). For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (one-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, overall signal quality is unlikely to impact our results.”
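
      The split-half procedure described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are simulated, the function names are hypothetical, and the Spearman-Brown step is one common way to turn a half-length reliability into a full-length estimate. The exact estimator in Lage-Castellanos et al. (2019) may differ.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def fc_pattern(ts):
    """Upper triangle of the ROI-by-ROI correlation matrix, as a flat list.

    `ts` holds one list of samples per ROI.
    """
    k = len(ts)
    return [pearson(ts[i], ts[j]) for i in range(k) for j in range(i + 1, k)]


def split_half_noise_ceiling(ts):
    """Correlate the FC patterns of the two halves of a run.

    The Spearman-Brown correction applied at the end is an assumption of
    this sketch, not something the response specifies.
    """
    half = len(ts[0]) // 2
    first = [roi[:half] for roi in ts]
    second = [roi[half:2 * half] for roi in ts]
    r_half = pearson(fc_pattern(first), fc_pattern(second))
    return 2 * r_half / (1 + r_half)


# Simulated participant: six ROIs in two toy networks (made-up data).
rng = random.Random(42)
n_t = 400
net1 = [rng.gauss(0, 1) for _ in range(n_t)]
net2 = [rng.gauss(0, 1) for _ in range(n_t)]
rois = [[s + 0.5 * rng.gauss(0, 1) for s in net]
        for net in (net1, net1, net1, net2, net2, net2)]

ceiling = split_half_noise_ceiling(rois)
print(f"split-half noise ceiling: {ceiling:.3f}")
```

      Computing one such value per participant yields the per-group distributions that a one-way ANOVA can then compare.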

      In general, it appears that the infant correlations are stronger compared to the other groups. While this could reflect increased coherence or lack of differentiation, it is also possible that it is simply due to the presence of a non-neuronal global signal. Such a signal has the potential to substantially limit the effective range of functional correlations and comparisons with adults. To address this, it is advisable to conduct control analyses aimed at assessing and potentially removing global signals.

      Response #7: We agree with the reviewer that global signal regression (GSR) may help reduce non-neuronal artifacts, such as motion, cardiac, and respiratory signals, which are known to correlate with the global signal. However, the global signal also contains neural signals from gray matter, and removing it can introduce unwanted artifacts, especially for the current study. First, GSR can reduce the physiological accuracy of functional connectivity (FC); second, GSR may have differential effects across groups, potentially introducing additional artifacts in between-group comparisons, as noted by Murphy and Fox (2017). The CompCor method (Behzadi et al., 2007; Whitfield-Gabrieli & Nieto-Castanon, 2012) can estimate global non-neuronal artifacts much as GSR does. Because it estimates these artifacts from signals within the white matter (WM) and cerebrospinal fluid (CSF) masks, but not the gray matter (GM), CompCor should introduce minimal unwanted bias to the GM signal.

      Was there a difference in correlations for preterm vs term neonates? Recent research has suggested that preterm births can have an impact on functional networks, particularly in frontal cortices (e.g., Tokariev et al. 2019; Li et al. 2021 eLife; Zhang et al. 2022 Frontiers in Neuroscience).

      Response #8: We have compared preterm and term neonates for all the main results, including the connectivity from the secondary visual cortex/V1 to non-visual sensory cortices versus prefrontal cortices, the laterality of occipito-frontal connectivity, and the specialization across different fronto-occipital networks. This information is reported in Page 6 line 169 and Supplementary Figure S7. The connectivities of full-term infants are generally higher than those of preterm infants. However, the connectivity patterns of term and preterm infants are very similar.

      The consistency between the current results and prior work (e.g., Burton et al. 2014) is notable, particularly in the observed greater correlations in prefrontal regions and weaker correlations in somato-motor regions for early blind individuals compared to sighted. However, almost all visual-frontal correlations in both groups were negative in that prior study. Some discussion on why positive correlations were found in the current study could help to clarify.

      Response #9: Many other papers have reported positive correlations similar to those found in our study (e.g., Deen et al., 2015; Kanjlia et al., 2021). In contrast, Burton's study identified predominantly negative visual-frontal correlations; we think this is likely because the global signal was regressed out during preprocessing. This methodological choice can lead to an increase in negative connections (Murphy & Fox, 2017).
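
      The link between global signal regression and negative correlations can be seen in a small simulation. The sketch below is illustrative only and uses made-up data: every ROI shares one positive common signal, the global signal is taken as the plain average over ROIs, and all names are hypothetical. After regression the residuals sum to zero at every time point, which pushes the average pairwise correlation below zero.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def demean(x):
    m = sum(x) / len(x)
    return [v - m for v in x]


def regress_out(y, g):
    """Residual of y after least-squares regression on g (both demeaned)."""
    beta = sum(a * b for a, b in zip(y, g)) / sum(b * b for b in g)
    return [a - beta * b for a, b in zip(y, g)]


def mean_offdiag_corr(data):
    """Average correlation over all distinct ROI pairs."""
    k = len(data)
    vals = [pearson(data[i], data[j]) for i in range(k) for j in range(i + 1, k)]
    return sum(vals) / len(vals)


# Simulated (not real) data: 8 ROIs that all share one common signal.
rng = random.Random(1)
n_roi, n_t = 8, 300
shared = [rng.gauss(0, 1) for _ in range(n_t)]
series = [demean([s + rng.gauss(0, 1) for s in shared]) for _ in range(n_roi)]

# Global signal taken here as the plain average across ROIs (an assumption).
global_signal = [sum(col) / n_roi for col in zip(*series)]
residuals = [regress_out(y, global_signal) for y in series]

mean_before = mean_offdiag_corr(series)
mean_after = mean_offdiag_corr(residuals)
max_abs_residual_sum = max(abs(sum(col)) for col in zip(*residuals))

print(f"mean r before GSR: {mean_before:.2f}, after GSR: {mean_after:.2f}")
```

      The toy case is deliberately extreme, since all shared variance is removed. It shows why the sign of correlations is hard to compare between pipelines with and without global signal regression.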

      The term "secondary visual areas" used throughout the paper lacks specificity, and its usage in terms of underlying anatomical and functional areas has been inconsistent in the literature. It would be advisable to adopt a more precise characterization based on functional and/or anatomical criteria.

      Response #10: We specified in the article that the occipital ROIs defined in the current study are functional areas in people born blind, identified in prior studies as regions that respond to three non-visual tasks (language, math, or executive function) and show functional connectivity changes in blind adults (Kanjlia et al., 2016, 2021; Lane et al., 2015). These regions respond to language, math, and executive function in the congenitally blind population (see Figure 1). They are referred to collectively as ‘secondary visual areas’ to distinguish them from V1. Anatomically, these three regions cover the majority of the lateral occipital cortex and part of the ventral occipital cortex, providing a good sample of the connectivity profile of higher-order visual areas. Thus, we are using the term "secondary visual areas" to refer to these regions. In blind individuals, although these regions respond to non-visual tasks, their exact functions are unknown.

      The inclusion of the ventral temporal cortex in the visual ROIs is currently only depicted in Supplementary Figure S7. To enhance the clarity of the areas of interest analyzed, it would be advisable to illustrate the ventral temporal areas in the main text. Were there notable differences in the frontal correlations between the lateral occipital visual areas and ventral temporal areas?

      Response #11: We thank the reviewer for pointing out this issue. We added a statement about the ventral visual cortex when describing the location of the ROIs and added the ventral view of the ROIs to Figure 1. The language-responsive and math-responsive ROIs cover both the lateral and ventral visual cortex, whereas executive function (response-conflict) regions cover only the lateral visual cortex. We compared the connectivity patterns of these three regions and found no differences (see Supplementary Figure S2).

      The blind group results are characterized as reflecting a reorganization in comparison to sighted adults while the results for sighted adults compared to infants are discussed more as a maturation ("adult pattern isn't default but requires experience to establish"). Both the sighted and blind adult groups showed differences from the infant group, and these differences are attributed to the role of experience. Why use "reorganization" for one result and maturation for another?

      Response #12: We agree with the reviewer that both of the adult groups should be thought of as equal in relation to the infants. In other words, the brain develops under one set of experiential conditions or another. We do not think that the adult sighted pattern reflects maturation. Rather, the sighted adult pattern reflects the combined influence of maturation and visual experience. The adult blind pattern reflects the combined influence of maturation and blindness. We use the term ‘reorganization’ to label differences in the blind adults relative to sighted infants. We do so for the purpose of clarity and to remain consistent with terminology in prior literature. However, we agree with the reviewer that the blind group does not reflect ‘reorganization’ intrinsically any more than the sighted adult group.

      The statement that "visual experience is required to set up long-range functional connectivity" is unclear, especially since the infant and blind groups showed stronger long-range functional correlations with PFC.

      Response #13: We revised this sentence to state specifically that “visual experience establishes elements of the sighted-adult long-range connectivity” in the Abstract, line 17.

      The statement that the visual ROIs roughly correspond to "the anatomical location of areas such as V5/MT+, LO, V3a, and V4v" appears imprecise. From Supplementary Figure S7, these areas cover anterior portions of ventral temporal cortex (do these span the anatomical location of putative category-selective areas?) and into the intraparietal sulcus.

      Response #14: Thanks to the reviewer for the clarification. The ventral ROIs cover the middle and part of the anterior portion of the ventral temporal lobe, including the putative category-selective areas. Additionally, the dorsal ROIs extend beyond the occipital lobe to the intraparietal sulcus and superior parietal lobule. We have added a more detailed description of the anatomical location of the ROI in the Methods section Page 17 line 489 as follows:

      “Each functional ROI spans multiple anatomical regions and together the secondary visual ROIs tile large portions of lateral occipital, occipito-temporal, dorsal occipital and occipito-parietal cortices. In sighted people, the secondary visual occipital ROIs include the anatomical locations of functional regions such as motion area V5/MT+, the lateral occipital complex (LO), category specific ventral occipitotemporal cortices and dorsally, V3a and V4v.  The occipital ROI also covers the middle of the ventral temporal lobe. Dorsally, it extended to the intraparietal sulcus and superior parietal lobule.”

      The motivation for assessing correlations with motor and frontal regions was briefly discussed in the introduction. It would be helpful to reiterate this motivation when first introducing the analyses in the results.

      Response #15: Thank you for the thoughtful suggestion. Upon reflection, we chose to substantially revise the Introduction to more clearly and comprehensively explain the rationale for examining the couplings with motor and frontal regions, rather than reiterating it in the Results section. We believe this revised framing provides a stronger foundation for the analyses that follow, while avoiding redundancy across sections. We hope this addresses the reviewer’s concern.

      Reviewer #2 (Recommendations for the authors):

      Congratulations on a well-written paper and an interesting set of results.

      Reviewer #3 (Recommendations for the authors):

      Abstract:

      Mentioning "sighted infants" does not seem adequate.

      Response #16: In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      In sentences after "Specifically...", it was not clear whether the authors referred to V1 connectivity.

      Response #17: We thank the reviewer for this comment. In the revised abstract, we have removed the original "Specifically..." phrasing and clarified the results.

      Introduction

      Talking about the "instructive effects" of vision might be confusing or misleading. Visual experiences like exposure to oral language are part of the normal/spontaneous environment that allows the infant behavioral acquisitions (contrarily with learnings that occur later during development with instruction like for reading).

      Response #18: We appreciate the reviewer’s concern and would like to clarify that the term “instructive effect” as used here is derived from neurodevelopmental studies (Crair, 1999; Sur et al., 1999). In this context, “instructive” refers to activity-dependent mechanisms where patterns of neural activity actively guide the organization of synaptic connectivity, emphasizing that spontaneous or sensory-driven activity (e.g., retinal waves, visual experience) can directly shape circuit refinement, as seen in ocular dominance column formation. In the context of our study, we emphasize that vision plays an instructive role in setting up the balance of connectivity between occipital cortex and non-visual networks.

      For references on the development of connectivity, I would advise citing MRI studies but also studies based on histological approaches (see for example the detailed review by Kostovic et al, NeuroImage 2019).

      Response #19: We thank the reviewer for this suggestion. We have incorporated a discussion on the long-range anatomical connections that emerge as early as infancy, referencing studies that employed diffusion MR imaging and histological methods, as detailed below.

      “Many long-range anatomical connections between brain regions are already established in infants, even before birth, although they are not yet mature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017).” (Page 12, line 303 in the manuscript)

      Results

      P7 l170: It might be helpful to be precise that this is "compared with inter-hemispheric connectivity".

      Response #20: We thank the reviewer for this suggestion. To align with our established terminology, we have revised the statement to explicitly contrast within-hemisphere connectivity with between-hemisphere connectivity. The modified text now reads (page 7, line 183 in the manuscript):

      “Compared to sighted adults, blind adults exhibited a stronger dominance of within-hemisphere connectivity over between-hemisphere connectivity. That is, in people born blind, left visual networks are more strongly connected to left PFC, whereas right visual networks are more strongly connected to right PFC.”

      L176-181: It was not clear to me what was the difference between "across" and "between hemisphere connectivity". Would it be informative to test the difference between blind and sighted adults?

      Response #21: We clarify that there is no distinction between the terms “across” and “between hemisphere connectivity”—they refer to the same concept. To ensure consistency, we have revised the text to exclusively use “between hemisphere connectivity” throughout the manuscript. Regarding the comparison between blind and sighted adults, we conducted statistical comparisons between these groups in our analysis, and the results have been incorporated into the revised version (Page 7, line 187 in the manuscript).

      Adding statistics on Figure 3, but also on Figures 1 and 2 might help the reading.

      Response #22: We have added the statistics to Figures 1-4.

      Adding the third comparison in Figure 4 would be possible in my view.

      Response #23: We explored integrating the response-conflict region into Figure 4, but this would require a 3x3 bar chart with pairwise statistical significance markers, which introduced excessive visual complexity that hindered readers’ ability to grasp our intended message. To ensure clarity, we retained the original Figure 4 while providing the complete three-region analysis (including all statistical comparisons) in Supplementary Figure S8 to ensure completeness.

      Methods

      The authors might have to specify ages at birth, and ages at scan (median + range?).

      Response #24: We have added that information in the Methods section as follows:

      “The average age from birth at scan = 2.79 weeks (SD = 3.77, median = 1.57, range = 0 – 19.71); average gestational age at scan = 41.23 weeks (SD = 1.77, median = 41.29, range = 37 – 45.14); average gestational age at birth = 38.43 weeks (SD = 3.73, median = 39.71, range = 23 – 42.71).” (Page 14, line 379 in the manuscript)

      It might be relevant to comment on the range of available fMRI volumes, and the fact that connectivity measures might then be less robust in infants.

      Response #25: We report the range of fMRI volumes in the Methods section (Page 16, Line 449). Adult participants (blind and sighted) underwent 1–4 scanning sessions, each containing 240 volumes (mean scan duration: 710.4 seconds per participant). For infants, all subjects had 2300 fMRI volumes, and we retained a subset of 1600 continuous volumes per subject with the minimum number of motion outliers. While infant connectivity measures may inherently exhibit lower robustness due to developmental and motion-related factors, our infant cohort’s large sample size (n=475) and stringent motion censoring criteria enhance the reliability of group-level inferences. We have integrated this clarification into the Methods section (Page 16, Line 444) as follows:

      "While infant connectivity estimates may be less robust at the individual level compared to adults due to shorter scan durations and higher motion, our cohort’s large sample size (n=475) and rigorous motion censoring mitigate these limitations for group-level analyses. "

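      The volume-selection step in Response #25 (keeping 1600 continuous volumes out of 2300 with the fewest motion outliers) amounts to a sliding-window minimum. The sketch below is one hypothetical way to do it: the function name, the tie-breaking rule (earliest window wins), and the toy outlier flags are assumptions, and how a volume comes to be flagged as an outlier is not covered here.

```python
def best_window(outlier_flags, window):
    """Find the contiguous window containing the fewest flagged volumes.

    `outlier_flags` is a list of 0/1 values, one per volume.
    Returns (start_index, n_outliers); the earliest window wins ties.
    """
    if window > len(outlier_flags):
        raise ValueError("window is longer than the run")
    count = sum(outlier_flags[:window])
    best_start, best_count = 0, count
    for start in range(1, len(outlier_flags) - window + 1):
        # Slide by one volume: add the entering flag, drop the leaving one.
        count += outlier_flags[start + window - 1] - outlier_flags[start - 1]
        if count < best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Toy run of 2300 volumes with made-up motion outliers.
flags = [0] * 2300
for i in list(range(100, 150)) + [900] + list(range(2000, 2020)):
    flags[i] = 1

start, n_out = best_window(flags, 1600)
print(f"keep volumes {start}..{start + 1599}, containing {n_out} outlier(s)")
```

      The running count keeps the search linear in the number of volumes, which matters little for one run but adds up across several hundred infants.
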
      The mention of dHCP 2nd release should be removed from the paragraph on data availability.

      Response #26: We have removed it.

    1. eLife Assessment

      This important study highlights the novel role of RSPO mimetic SZN-043 in the activation of hepatic WNT signaling and promoting hepatocyte regeneration. The authors provide convincing evidence of SZN-043 increasing hepatocyte proliferation in various mouse models, including a humanized mouse liver model, an ALD model and a CCl4 fibrosis model. This study will be of interest to researchers in liver regeneration and repair mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      The work by Fisher et al describes the role of novel RSPO mimetics in the activation of WNT signaling and hepatocyte regeneration. However, the results of the experiments and weaknesses of the methods used do not support the conclusions of the authors that the new therapy can promote liver regeneration in alcohol-induced liver cirrhosis.

      Strengths:

      Similarly to its precursor, αASGR1-RSPO2-RA-IgG, SZN-043 can upregulate Wnt target genes and promote hepatocyte proliferation in the liver.

      Comments on revisions:

      The authors responded to all my comments and concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Fisher et al investigates SZN-043, a hepatocyte-targeted R-spondin mimetic, for its potential therapeutic role in restoring Wnt signaling and promoting liver regeneration in alcohol-associated liver disease (ALD). Using multiple preclinical models, the compound was shown to promote hepatocyte proliferation and reduce fibrosis. This study highlights the efficacy in promoting liver regeneration while maintaining controlled signaling. Limitations include a need for further exploration of off-target effects and fibrosis mechanisms. The findings support SZN-043 as a promising candidate for ALD therapy, warranting further clinical evaluation. This is a well-designed study with thorough investigation using multiple disease models.

      Strengths:

      (1) Well-written manuscript with clear design, robust methods, and discussion.

      (2) Using multiple models strengthens the findings and expands beyond ALD.

      (3) Identification of SZN-043 as a novel potent drug for liver regeneration.

    4. Author response:

      Response to Comments from reviewer #1

      Many thanks for appreciating that SZN-043 can promote hepatocyte proliferation via the Wnt-signaling pathway.

      (1) The reviewer is concerned with using only CYP1A2 expression as an endpoint to make a conclusion about the effect of SZN-043 on Wnt activity in human ALD samples. The reviewer raises a good point, as the more commonly used Wnt target gene, AXIN2, is not consistently changed in both cohorts. We were at first also surprised by this finding. However, upon closer analysis we found that hepatocyte-specific target genes such as CYP1A2 (Figure 2), CYP2E1, OAT, LGR5, GLUL (Table 1) and ZNRF3, which are mostly expressed in hepatocytes and ductal cells, were all down-regulated in ALD samples. Other Wnt target genes expressed in epithelial and mesenchymal liver cell populations, such as AXIN2, CCND1 and NOTUM, are indeed not consistently and significantly changed. Given that SZN-043 is not active on mesenchymal cells, this discrepancy could be best explained by the large increase in mesenchymal cells in ALD tissue samples, thereby confounding the results. We have now clarified this in the discussion. Another method to assess Wnt activity is to measure β-catenin phosphorylation and nuclear transfer. In our hands, this method was found to be better suited for tissue culture than histological sections from in vivo studies. We have also amended the manuscript title to refer to expression of Wnt target genes, rather than Wnt activity.

      (2) We have now added a supplemental figure to show the lack of Ki-67+ human hepatocytes in the cirrhotic tissue samples to confirm the absence of hepatocyte proliferation (Figure S1).

      (3) The differences in amino acid sequence between SZN-043 and its precursor, αASGR1-RSPO2-RA-IgG, can be found in the Materials and Methods section. These changes in amino acid sequences improved the biophysical properties of the final clinical candidate, such as oxidation and nonspecific binding. The biochemical analysis of those differences exceeds the scope of the current manuscript. We present here the pharmacokinetic properties of SZN-043 only, as this was the only molecule advanced to clinical trial and used in the studies presented here.

      (4) The reviewer suggests assessing the effect of SZN-043 in Ctnnb1-KO mice to confirm that SZN-043 acts via a canonical Wnt pathway. Indeed, there were several reports on the ability of R-spondin to act on other pathways besides the Wnt signaling pathway (for a recent review, see Niehrs et al, 2024, Bioessays). However, while an interesting suggestion, this line of investigation belongs to MOA studies and exceeds the scope of the current manuscript. An additional manuscript presenting MOA studies for SZN-043 was recently submitted elsewhere. Still, we have added this possibility in the discussion section.

      (5) The reviewer is asking how SZN-043 is affecting liver functions in general. Indeed, we have observed a consistent reduction in the international normalized ratio (INR) of prothrombin time using the thioacetamide (TAA)-induced fibrosis model and previously published those findings (Zhang, 2020). In our hands, TAA is the only liver injury model that significantly increases INR. This increase is modest compared to that observed in clinical patients. Therefore, we do not report INR findings for other models. We have not seen any effects of SZN-043 on hepatocyte differentiation markers such as HNF4A (data not shown) and the hepatocyte-specific ASGR1/2, as shown in Figure 5. Rather, we focused on proliferation as the main potentially beneficial endpoint, to restore the parenchymal mass in injured livers. Finally, consistent with what was reported in the literature, we have observed a transient and reciprocal effect on albumin and alpha-fetoprotein expression during the proliferative phase of liver regeneration. These results are detailed in an additional manuscript presenting MOA studies for SZN-043, which was recently submitted elsewhere.

      (6) We have used females only in the ethanol-induced injury models because there are numerous reports in the literature stating that males are not as susceptible to those injuries.  

      (7) The reviewer questions the relevance of the ethanol-induced injury model used to evaluate SZN-043 efficacy. Indeed, none of the disease models developed to date reproduces the severity and complexity of alcohol-associated liver diseases, although some, such as the ethanol-supplemented Lieber-DeCarli diet, are more commonly used than others, which is the reason why this model was selected.

      (8) The reviewer questions the relevance of the fibrosis model used to evaluate SZN-043 efficacy. Indeed, none of the fibrosis models developed to date reproduce the severity and complexity of cirrhosis in human livers. While combining ethanol with CCl4 would lead to more severe fibrotic livers, CCl4 itself is not involved in ALD in humans. Both models are likely to result in similar pericentral fibrosis with central-to-central bridging. In this study, we were mostly interested in addressing the effects of SZN-043 in a tissue affected by fibrotic scars.  

      (9) The sex of CCl4-treated mice is male. We added this information in the methods section.

      (10) A summary of histology and fibrosis assessment data for alcohol-fed mice was added in supplemental Table S3. In our hands, the use of aging mice did not induce the presence of fibrosis, in contrast to published results.  

      (11) The rationale for using 13.5-month-old mice in the alcohol studies and scid mice in the CCl4 studies has been clarified in the results and discussion sections. 

      a. Briefly, aging mice were reported to be more susceptible to ethanol-induced injury than young mice and to include induction of fibrosis. However, we were unable to reproduce the presence of fibrosis reported in the literature.  

      b. Scid mice were used in the CCl4 studies to test whether a stronger response could be observed in the absence of a potential anti-drug antibodies response. While a modest reduction in fibrosis was observed in both B6 and scid mice following the SZN-043 treatment, the effect size did not seem affected by the mouse strain. 

      Response to Comments from reviewer #2

      Many thanks for appreciating the use of multiple disease models to identify SZN-043 as a potential novel drug for liver regeneration.

      (1) The importance of restoring liver regeneration capacity to reduce the need for liver transplantation has been emphasized in the introduction.

      (2) There is continuous damage to the mouse hepatocytes in the FRG mice, due to the Fah mutation. They undergo repair mechanisms favoring the proliferation of human hepatocytes during the production period. Injury models that affect the human hepatocyte population have been developed in these mice. However, the primary goal of this study was to confirm that SZN-043 was efficacious in inducing human hepatocyte proliferation, a feature difficult to reproduce in primary hepatocyte cultures. Given the artefactual nature of the chimeric liver in FRG mice and the high cost of these mice, further studies were not judged to be necessary.

      (3) Corrected

      (4) A figure including DAPI staining has now been included in supplemental Figure S2.

      (5) We have clarified that the 8-week alcohol feeding used in our study design is a modification of the NIAAA model. While some ASGR1 has been reported on the surface of macrophages, additional data from MOA studies strongly suggest that the effect of SZN-043 is mediated via a hepatocyte-specific mechanism (submitted manuscript).

      (6) The reviewer inquired about the potential role of macrophages in promoting an anti-inflammatory state in response to SZN-043. While a direct effect is unlikely, a potential effect of macrophages in response to SZN-043 is plausible. Wnt activation is known to induce the secretion of hepatokines, such as LECT2, which in turn can influence macrophage activity. This possibility is discussed in the discussion section.

      (7) The potential off-target effects of SZN-043, such as stellate cell activation, are discussed in the discussion section.

      (8) The discussion of the limitations of current models has been included in the discussion section of the manuscript.

      (9) We have now included a discussion of prior RSPO-based therapies, such as OMP-131R10. We explain why the hepatocyte-targeting of RSPO activity minimizes undesired effects.

    1. eLife Assessment

      This study presents a valuable finding that the blood-brain barrier (BBB) may be modulated through specific modes of electroacupuncture stimulation. The data were collected and analyzed using a solid and validated methodology, and can be used as a starting point for functional studies of the BBB for drug delivery across healthy and diseased states. The work will be of broad interest to scientists working in the field of drug delivery and drug development.

    2. Joint Public Review:

      This study employs single-cell RNA sequencing to investigate how electroacupuncture (EA) stimulation alters the transcriptional profiles of central nervous system cell types following blood-brain barrier (BBB) opening. The authors seek to characterize changes in gene expression and pathway activities across diverse neural cells in response to electroacupuncture (EA) stimulation using high-resolution transcriptomics. This approach has the potential to elucidate the cellular mechanisms underlying EA stimulation and their implications for therapeutic intervention. The work engages with a timely and biologically significant question regarding noninvasive stimulation methods to manipulate BBB permeability. However, no in vivo/in vitro functional assays are provided to validate the changes in BBB permeability or cytokine release in the tested models. The experimental rationale remains inadequately explained, and key details regarding the magnitude, duration, and spatial distribution of BBB opening in this system are still lacking.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The work in this paper successfully mapped the transcriptional landscape and identified EA-responsive cell types (endothelial, microglia). The data suggest that EA modulates the BBB via immune pathways and cell communication. However, claims of "BBB opening" are not directly proven (no permeability data).

      (1) No in vivo/in vitro assays confirm BBB permeability changes (e.g., Evans blue leakage, TEER).

      (2) Only male rats were used, ignoring sex-specific BBB differences.

      (3) Pericytes and neurons, critical for the BBB, were not captured, likely due to dissociation artifacts.

      (4) Protein-level validation (Western blot, IHC) absent for key genes (e.g., LY6E, HSP90).

      (5) Fixed stimulation protocol (2/100 Hz, 40 min); no dose-response or temporal analysis.

      We sincerely apologize for the oversight regarding the description of changes in blood-brain barrier permeability. In fact, our team conducted a series of preliminary studies that verified this aspect, and we have provided a more detailed description in the introduction section, in lines 60-71 of the manuscript.

      We are very grateful to the reviewers for pointing out the important and meaningful issue of "gender-specific BBB differences." We will make this a focal point in our future research.

      As for pericytes and neurons, we acknowledge their importance in the function of the blood-brain barrier. However, neurons are absent because our sample processing method involves dissociation. During the dissociation procedure, neuronal axons, which are relatively long, are filtered out during the frequent cell suspension steps and cannot enter the downstream microfluidic system for analysis, so they are not present in our data. Since this experiment is primarily focused on non-neuronal cells, we did not choose to use nucleus extraction for sample processing. As for pericytes, we believe they are not captured because their proportion in our samples is extremely low, which is why they are not present in the data. Further research may require single-nucleus transcriptomics or the separate isolation of these two cell types for study. Of course, in our current mechanistic studies, we are also fully considering the important roles these two cell types play in BBB function.

      In addition, to validate the results at the protein level, we have recently conducted some experiments. However, as several proteins are currently at a critical stage of further experimental validation, it is not appropriate to present them in the manuscript at this time. Instead, we have uploaded the relevant data as an appendix for your review. This includes a figure of several protein markers we examined, as well as a table of the antibodies used.

      This section is also further elaborated in the introduction and its references.

      Reviewer #2 (Public review):

      Summary:

      This study uses single-cell RNA sequencing to explore how electroacupuncture (EA) stimulation alters the brain's cellular and molecular landscape after blood-brain barrier (BBB) opening. The authors aim to identify changes in gene expression and signaling pathways across brain cell types in response to EA stimulation using single-cell RNA sequencing. This direction holds promise for understanding the consequences of noninvasive methods of BBB opening for therapeutic drug delivery across the BBB.

      (1) The work falls short in its current form. The experimental design lacks a clear justification, and readers are not provided with sufficient background information on the extent, timing, or regional specificity of BBB opening in this EA model. These details, established in prior work, are critical to understanding the rationale behind the current transcriptomic analyses.

      (2) Further, the results are often presented with minimal context or interpretation. There is no model of intercellular or molecular coordination to explain the BBB-opening process, despite the stated goal of identifying such mechanisms. The statement that EA induces a "unique frontal cortex-specific transcriptome signature" is not supported, as no data from other brain regions are presented. Biological interpretation is at times unclear or inaccurate - for instance, attributing astrocyte migration effects to endothelial cell clusters or suggesting microglial tight junction changes without connecting them meaningfully to endothelial function.

      (3) The study does include analyses of receptor-ligand signaling and cell-cell communication, which could be among its most biologically rich outputs. However, these are relegated to supplementary material and not shown in the leading figures. This choice limits the utility of the manuscript as a hypothesis-generating resource.

      (4) Overall, while the dataset may be of interest to BBB researchers and those developing technologies for drug delivery across the BBB, the manuscript in its current form does not yet fulfill its interpretive goals. A more integrated and biologically grounded analysis would be beneficial.

      This section is also further elaborated in the introduction and its references.

      Our current study is actually based on previous findings that electroacupuncture can open the BBB, with a more pronounced effect observed in the frontal lobe (this aspect should be further described in the research background). Building on this foundation, our aim is to delineate the potential biological mechanisms involved. Therefore, we selected frontal lobe tissue as our primary choice for sequencing and have not yet investigated differences across other brain regions, although this may become a focus of future research. Additionally, we recognize that the mechanism underlying BBB opening is complex, and at present, we cannot determine whether it is driven by a single direct factor or by coordinated actions between cells or molecules. As such, our results are presented only briefly for now, and we will carefully consider whether to supplement our findings by incorporating insights from other studies.

      Considering the overall data layout and the length of the article, we ultimately decided not to make any changes to the presentation of the article's data. The images included in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view any data they are interested in.

      Indeed, our current dataset and analysis tend to present objective data results. We are also conducting a series of validations that may be related to the biology of the blood-brain barrier, and we look forward to sharing and discussing any future research findings with you and everyone.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures 3-7: Label treatment groups (CON vs. EA) consistently in legends.

      (2) Methods: Specify rat strain (Sprague-Dawley) in the abstract.

      (3) Clarify Limitations: Explicitly state that BBB opening is inferred, not proven.

      This section has been revised at lines 743-733, 748, 949, 754-755, and 759-760 of the manuscript.

      Revised at line 31 of the manuscript.

      Thank you for your feedback. Background information on the evidence for BBB opening has been added to the introduction.

      Reviewer #2 (Recommendations for the authors):

      (1) Abstract and Introduction

      • Include specific key findings in the abstract to improve clarity and reader engagement.

      • Expand the introduction to situate this work in the context of other BBB-opening methods (e.g., ultrasound) and the known consequences of BBB disruption.

      • Clarify the rationale for choosing electroacupuncture.

      • Include information (perhaps summarized from previous studies) about the extent, timeline, and functional assessment of BBB opening in this model to help justify the single-cell RNA-seq design.

      (2) Experimental Rationale and Context

      • Reiterate experimental design and rationale in each results section, rather than relying exclusively on the Methods section.

      • Specify the time point of tissue collection relative to the EA intervention.

      • Describe the anatomical sites of acupuncture stimulation and their physiological relevance.

      (3) Data Presentation

      • Replace the human brain cartoon in Figure 1 with an anatomically appropriate rat brain schematic.

      • Reevaluate which data are presented in the main versus supplementary figures. Highlight biologically meaningful results, such as cell-cell communication and ligand-receptor interactions, in the main figures rather than supplementary data.

      (4) Interpretation and Modeling

      • More carefully link transcriptional changes (e.g., Wnt signaling in microglia) to biologically plausible mechanisms of BBB regulation, e.g., microglial signaling to endothelial cells.

      • Clarify whether the presence of granulocytes and T cells might result from a lack of perfusion prior to brain dissection.

      • Consider proposing a model (even speculative) of how EA leads to BBB opening based on observed transcriptional changes.

      First, for the sake of brevity in the abstract, we did not present specific results in this section. Second, since BBB opening via EA is a unique strategy, our previous studies have examined the opening time window and the recovery of the BBB after EA intervention (as mentioned in the introduction). We believe its characteristics differ from those of ultrasound-induced BBB opening and BBB disruption, so we did not conduct comparative discussions, but objectively presented our research findings. In further functional validation experiments, we may consider integrating other opening strategies in our studies. Additionally, the choice of electroacupuncture was based on our previous series of studies, which have already been outlined in the research background. Finally, we did indeed determine the experimental design of this study based on prior research, as described in the background section of the introduction.

      We decided not to make changes to this section in the manuscript after careful consideration. The setup of electroacupuncture intervention and controls has been thoroughly discussed in our previous studies (as referenced in the introduction), so we have not repeated it in this manuscript. Overall, building on all our previous findings, this study focuses primarily on the potential mechanisms of EA intervention. The anatomical sites of acupuncture stimulation and their physiological relevance are another key area of our research, and we are currently conducting a series of related studies. We look forward to sharing these findings with you in the future.

      We have already changed the human brain diagram in Figure 1 to a rat brain diagram, and have replaced Figure 1 in the files with the revised version. However, considering the overall data layout and the length of the article, we ultimately decided not to make changes to the data presentation in the manuscript. The images in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view the data they are interested in.

      This section has provided us with excellent suggestions for further exploration, although no changes have been made to the manuscript at this time. In the future, we may conduct more detailed transcriptomic studies focusing on sex differences and different brain regions, which will allow for a more comprehensive analysis of the biological mechanisms involved in BBB regulation.

    1. eLife Assessment

      This valuable study explores the role of the chromatin regulator ATAD2 in mouse spermatogenesis. It convincingly demonstrates that ATAD2 is essential for proper chromatin remodeling in haploid spermatids, influencing gene accessibility, H3.3-mediated transcription, and histone eviction. Using Atad2 knockout (KO) mice, the authors link ATAD2 to the DNA-replication-independent incorporation of sperm-specific proteins like protamines and histone H3.3. Although the findings highlight chromatin abnormalities and impaired in vitro fertilization in KO mice, natural fertility remains unaffected, suggesting possible in vivo compensatory mechanisms. However, in its current form, the study lacks mechanistic insight and provides only partial evidence for ATAD2's molecular role, limiting its functional conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as ATAC sequencing, the study showed increased levels of the HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility, and delayed deposition of protamines. Sperm from the Atad2 KO mice show reduced success in in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

      Strengths:

      The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis.

      Weaknesses:

      (1) Some results lack quantification.

      (2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-to-protamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility.

      Strengths:

      The MS provides the first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation and connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo.

      Weaknesses: The MS is robust and there are no major weaknesses.

    4. Reviewer #3 (Public review):

      Summary:

      The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally.

      Strengths:

      The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function.

      Weaknesses:

      While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis.

      (1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.

      (2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.

      (3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim.

      (4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype?

      (5) The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as ATAC sequencing, the study showed increased levels of the HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility, and delayed deposition of protamines. Sperm from the Atad2 KO mice show reduced success in in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

      We would like to take this opportunity to highlight that the present study builds on our previously published work, which examined the function of ATAD2 in both yeast S. pombe and mouse embryonic stem (ES) cells (Wang et al., 2021). In yeast, using genetic analysis we showed that inactivation of HIRA rescues defective cell growth caused by the absence of ATAD2. This rescue could also be achieved by reducing histone dosage, indicating that the toxicity depends on histone over-dosage, and that HIRA toxicity, in the absence of ATAD2, is linked to this imbalance.

      Furthermore, HIRA ChIP-seq performed in mouse ES cells revealed increased nucleosome-bound HIRA, particularly around transcription start sites (TSS) of active genes, along with the appearance of HIRA-bound nucleosomes within normally nucleosome-free regions (NFRs). These findings pointed to ATAD2 as a major factor responsible for unloading HIRA from nucleosomes. This unloading function may also apply to other histone chaperones, such as FACT (see Wang et al., 2021, Fig. 4C).

      In the present study, our investigations converge on the same ATAD2 function in the context of a physiologically integrated mammalian system: spermatogenesis. Indeed, in the absence of ATAD2, we observed H3.3 accumulation and enhanced H3.3-mediated gene expression. Consistent with this functional model of ATAD2 (unloading chaperones from histone- and non-histone-bound chromatin), we also observed defects in histone-to-protamine replacement.

      Together, the results presented here and in Wang et al. (2021) reveal an underappreciated regulatory layer of histone chaperone activity. Previously, histone chaperones were primarily understood as factors that load histones. Our findings demonstrate that we must also consider a previously unrecognized regulatory mechanism that controls assembled histone-bound chaperones. This key point was clearly captured and emphasized by Reviewer #2 (see below).

      Strengths: 

      The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis. 

      Weaknesses: 

      (1) Some results lack quantification. 

      We will consider all the data and add appropriate quantifications where necessary.

      (2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin. 

      Please see our comments above.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-to-protamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility.

      Strengths: 

      The MS provides the first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation and connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo.

      Weaknesses:

      The MS is robust and there are no major weaknesses.

      Reviewer #3 (Public review): 

      Summary: 

      The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally. 

      Strengths: 

      The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function. 

      Weaknesses:

      While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis. 

      We respectfully disagree with the statement that our findings are largely superficial. Based on our investigations of this factor over the years, it has become evident that ATAD2 functions as an auxiliary factor that facilitates mechanisms controlling chromatin dynamics (see, for example, Morozumi et al., 2015). These mechanisms can still occur in the absence of ATAD2, but with reduced efficiency, which explains the mild phenotype we observed.

      This function, while not essential, is nonetheless an integral part of the cell’s molecular biology and should be studied and brought to the attention of the broader biological community, just as we study essential factors. Unfortunately, the field has tended to focus primarily on core functional actors, often overlooking auxiliary factors. As a result, our decade-long investigations into the subtle yet important roles of ATAD2 have repeatedly been met with skepticism regarding its functional significance, which has in turn influenced editorial decisions.

      We chose eLife as the venue for this work specifically to avoid such editorial barriers and to emphasize that facilitators of essential functions do exist. They deserve to be investigated, and the underlying molecular regulatory mechanisms must be understood.

      (1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.  

      (1) In the revised version, we will include Venn diagrams to illustrate the overlap in significantly differentially expressed genes between this study and previous work. However, we believe that the GSEAs presented here provide stronger evidence, as they indicate the statistical significance of this overlap (p-values). In our case, we observed p-value < 0.01 (**) and p < 0.001 (***).

      (2) Sex chromosome gene expression was analyzed and is presented in Fig. 5C.

      (3) The effect of ATAD2 loss on gene expression is shown in Fig. 4A, B, and C as histograms, with statistical significance indicated in the middle panels.

      (4) Although mapping H3.3 incorporation across the genome in wild-type and Atad2 KO cells would have been informative, the available anti-H3.3 antibody did not work for ChIP-seq, at least in our hands. The authors of Fontaine et al., 2022, who studied H3.3 during spermatogenesis in mice, must have encountered the same problem, since they tagged the endogenous H3.3 gene to perform their ChIP experiments.
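
      As a complement to the Venn-diagram discussion in point (1), the significance of an overlap between two lists of differentially expressed genes can be sketched with a one-sided hypergeometric test. This is not the GSEA procedure used in the study; it is a simpler, swapped-in illustration, and the function name and all numbers below are hypothetical.

```python
from math import comb

def overlap_pvalue(n_universe, n_a, n_b, n_overlap):
    """One-sided hypergeometric p-value: probability of observing at least
    n_overlap shared genes when lists of size n_a and n_b are drawn at
    random from a universe of n_universe genes."""
    upper = min(n_a, n_b)
    total = comb(n_universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20,000 expressed genes, two DEG lists of 300 and 250
# genes sharing 40 genes (illustrative values, not taken from the manuscript).
p_example = overlap_pvalue(20000, 300, 250, 40)
print(f"overlap p-value: {p_example:.3g}")
```

      Unlike a Venn diagram, which only reports the size of the overlap, a test of this kind attaches a p-value to it, which is the property the response attributes to GSEA.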

      (2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.  

      Based on our understanding of ATAD2’s function—specifically its role in releasing chromatin-bound HIRA—in the absence of ATAD2 the residence time of both HIRA and H3.3 on chromatin increases. This results in the detection of H3.3 not only on sex chromosomes but across the genome. Our data provide clear evidence of this phenomenon. The reviewer is correct in suggesting that the accumulated H3.3 would carry H3.3-associated histone PTMs; however, we are unsure what additional insights could be gained by further demonstrating this point.

      (3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim. 

      Figure 7 does not suggest that pre-PRM2 processing is affected in Atad2 KO; rather, this figure—particularly Fig. 7B—specifically demonstrates that pre-PRM2 processing is impaired, as shown using an antibody that recognizes the processed portion of pre-PRM2. ELISA was used to provide a more quantitative assessment; however, in the revised manuscript we will also include a western blot image.

      (4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype? 

      These are interesting experiments that require the creation of appropriate mouse models, which are not currently available.

      (5) The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation.

      The Reviewer is absolutely correct. In addition to the points addressed in response to Reviewer #1’s general comments (see above), it would indeed have been very interesting to test the segregase activity of ATAD2 (likely driven by its AAA ATPase activity) through in vitro experiments using the Xenopus egg extract system described by Tagami et al., 2004. This system can be applied both in the presence and absence (via immunodepletion) of ATAD2 and would also allow the use of ATAD2 mutants, particularly those with inactive AAA ATPase or bromodomains. However, such experiments go well beyond the scope of this study, which focuses on the role of ATAD2 in chromatin dynamics during spermatogenesis.

      References

      Wang T, Perazza D, Boussouar F, Cattaneo M, Bougdour A, Chuffart F, Barral S, Vargas A, Liakopoulou A, Puthier D, Bargier L, Morozumi Y, Jamshidikia M, Garcia-Saez I, Petosa C, Rousseaux S, Verdel A, Khochbin S. ATAD2 controls chromatin-bound HIRA turnover. Life Sci Alliance. 2021 Sep 27;4(12):e202101151. doi: 10.26508/lsa.202101151. PMID: 34580178; PMCID: PMC8500222.

      Morozumi Y, Boussouar F, Tan M, Chaikuad A, Jamshidikia M, Colak G, He H, Nie L, Petosa C, de Dieuleveult M, Curtet S, Vitte AL, Rabatel C, Debernardi A, Cosset FL, Verhoeyen E, Emadali A, Schweifer N, Gianni D, Gut M, Guardiola P, Rousseaux S, Gérard M, Knapp S, Zhao Y, Khochbin S. Atad2 is a generalist facilitator of chromatin dynamics in embryonic stem cells. J Mol Cell Biol. 2016 Aug;8(4):349-62. doi: 10.1093/jmcb/mjv060. Epub 2015 Oct 12. PMID: 26459632; PMCID: PMC4991664.

      Fontaine E, Papin C, Martinez G, Le Gras S, Nahed RA, Héry P, Buchou T, Ouararhni K, Favier B, Gautier T, Sabir JSM, Gerard M, Bednar J, Arnoult C, Dimitrov S, Hamiche A. Dual role of histone variant H3.3B in spermatogenesis: positive regulation of piRNA transcription and implication in X-chromosome inactivation. Nucleic Acids Res. 2022 Jul 22;50(13):7350-7366. doi: 10.1093/nar/gkac541. PMID: 35766398; PMCID: PMC9303386.

      Tagami H, Ray-Gallet D, Almouzni G, Nakatani Y. Histone H3.1 and H3.3 complexes mediate nucleosome assembly pathways dependent or independent of DNA synthesis. Cell. 2004 Jan 9;116(1):51-61. doi:10.1016/s0092-8674(03)01064-x. PMID: 14718166.

    1. eLife Assessment

      This useful work identifies new monoclonal antibodies produced by cystic fibrosis patients against the Pseudomonas aeruginosa type three secretion system. The evidence supporting the authors' claims is solid. Nonetheless, the manuscript may benefit from a more in-depth description of what the authors learned from their structure-based analyses of antibodies targeting PcrV.

    2. Reviewer #1 (Public review):

      Summary:

      Desveaux et al. describe human mAbs targeting proteins from the Pseudomonas aeruginosa T3SS, discovered by employing single cell B cell sorting from cystic fibrosis patients. The mAbs were directed at the proteins PscF and PcrV. They particularly focused on two mAbs binding the T3SS with the potential of blocking activity. The biochemical analysis was supplemented with crystal structures of the P3D6 Fab complex. They also compared the blocking activity with mAbs that were described in previous studies, using an assay that evaluated the toxin injection. They conducted mechanistic structure analysis and found that these mAbs might act through different mechanisms by preventing PcrV oligomerization and disrupting PcrV's scaffolding function.

      The antibiotic resistance crisis requires the development of new solutions to treat infections caused by MDR bacteria. The development of antibacterial mAbs holds great potential. In that context, this report is important as it paves the way for the development of additional mAbs targeting various pathogens that harbor the T3SS. In this report, the authors present a comparative study of their discovered mAbs vs. a commercial mAb currently in clinical testing, resulting in valuable data with applicative implications. The authors investigated the mechanism of action of the mAbs using advanced methods and assays for the characterization of antibody and antigen interaction, underlining the effort to determine the discovered mAbs' suitability for downstream application.

    3. Reviewer #2 (Public review):

      Summary:

      Desveaux et al. performed ELISA and translocation assays to identify which of 34 cystic fibrosis patients produced antibodies against the P. aeruginosa type three secretion system (T3SS). The authors were especially interested in antibodies against PcrV and PscF, two key components of the T3SS. The authors leveraged their binding assays and flow cytometry to isolate individual B cells from the two most promising sera, and then obtained monoclonal antibodies for the proteins of interest. Among the tested monoclonal antibodies, P3D6 and P5B3 emerged as the best candidates due to their inhibitory effect on the ExoS-Bla translocation marker (with 24% and 94% inhibition, respectively). The authors then showed that P5B3 binds to the five most common variants of PcrV, while P3D6 seems to recognize only one variant. Furthermore, the authors showed that P3D6 inhibits translocon formation, measured as cell death of J774 macrophages. To get insights into the P3D6-PcrV interaction, the authors defined the crystal structure of the P3D6-PcrV complex. Finally, the authors compared their new antibodies with two previous ones (i.e., MEDI3902 and 30-B8).

      Strengths:

      • Article is well written.

      • Authors used complementary assays to evaluate protective effect of candidate monoclonal antibodies.

      • Authors offered crystal structure with insights into the P3D6 antibody-T3SS interaction (e.g., interactions with monomer vs pentamers).

      • Authors put their results in context by comparing their antibodies with respect to previous ones.

      Weaknesses:

      • Results shown in Fig. 6 should be initially described in the Results section and not in the Discussion section.

      • The authors should describe, in the Discussion (and also in L146-147), in more detail the insights gained into how anti-PcrV antibodies work. This is especially important given previous reports of more potent antibodies (e.g., Simonis et al.), which significantly reduce the novelty of their work. Hence, the authors could explicitly highlight how their study differs from previous work, and what unique insights were gained (in the current version this is not completely obvious).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Desveaux et al. describe human mAbs targeting proteins from the Pseudomonas aeruginosa T3SS, discovered by employing single cell B cell sorting from cystic fibrosis patients. The mAbs were directed at the proteins PscF and PcrV. They particularly focused on two mAbs binding the T3SS with the potential of blocking activity. The biochemical analysis was supplemented with crystal structures of the P3D6 Fab complex. They also compared the blocking activity with mAbs that were described in previous studies, using an assay that evaluated the toxin injection. They conducted mechanistic structure analysis and found that these mAbs might act through different mechanisms by preventing PcrV oligomerization and disrupting PcrV's scaffolding function.

      Strengths:

      The antibiotic resistance crisis requires the development of new solutions to treat infections caused by MDR bacteria. The development of antibacterial mAbs holds great potential. In that context, this report is important as it paves the way for the development of additional mAbs targeting various pathogens that harbor the T3SS. In this report, the authors present a comparative study of their discovered mAbs vs. a commercial mAb currently in clinical testing resulting in valuable data with applicative implications. The authors investigated the mechanism of action of the mAbs using advanced methods and assays for the characterization of antibody and antigen interaction, underlining the effort to determine the discovered mAbs suitability for downstream application.

      Weaknesses:

      Although the information presented in this manuscript is important, previous reports regarding other T3SS structures complexed with antibodies, reduce the novelty of this report. Nevertheless, we provide several comments that may help to improve the report. The structural analysis of the presented mAbs is incomplete and unfortunately, the authors did not address any developability assessment. With such vital information missing, it is unclear if the proposed antibodies are suited for diagnostic or therapeutic usage. This vastly reduces the importance of the possibly great potential of the authors' findings. Moreover, the structural information does not include the interacting regions on the mAb which may impede the optimization of the mAb if it is required to improve its affinity.

      As described in the manuscript (Fig. 6), our mAbs are markedly less effective in every in vitro T3SS inhibition assay than the mAbs recently described by Simonis et al. They are therefore very unlikely to outperform these mAbs in in vivo animal models of P. aeruginosa infection. Considering the high cost of animal experiments and ethical concerns, and in accordance with the Reduction principle of the 3Rs guidelines, we chose not to pursue in vivo experiments. Instead, we focused on leveraging the newly isolated mAbs to investigate the mechanisms of action and structural features of anti-PcrV mAbs.

      Following the reviewer's suggestion, we have now added mAb interaction features into the structural data presented in the manuscript. However, based on the efficiency data, the structural analysis and the mechanistic insights presented, we do not consider further therapeutic use and optimization of our mAbs to be warranted.

      Reviewer #2 (Public review):

      Summary:

      Desveaux et al. performed ELISA and translocation assays to identify which of 34 cystic fibrosis patients produced antibodies against the P. aeruginosa type three secretion system (T3SS). The authors were especially interested in antibodies against PcrV and PscF, two key components of the T3SS. The authors leveraged their binding assays and flow cytometry to isolate individual B cells from the two most promising sera, and then obtained monoclonal antibodies for the proteins of interest. Among the tested monoclonal antibodies, P3D6 and P5B3 emerged as the best candidates due to their inhibitory effect on the ExoS-Bla translocation marker (with 24% and 94% inhibition, respectively). The authors then showed that P5B3 binds to the five most common variants of PcrV, while P3D6 seems to recognize only one variant. Furthermore, the authors showed that P3D6 inhibits translocon formation, measured as cell death of J774 macrophages. To get insights into the P3D6-PcrV interaction, the authors defined the crystal structure of the P3D6-PcrV complex. Finally, the authors compared their new antibodies with two previous ones (i.e., MEDI3902 and 30-B8).

      Strengths:

      (1) The article is well written.

      (2) The authors used complementary assays to evaluate the protective effect of candidate monoclonal antibodies.

      (3) The authors offered crystal structure with insights into the P3D6 antibody-T3SS interaction (e.g., interactions with monomer vs pentamers).

      (4) The authors put their results in context by comparing their antibodies with respect to previous ones.

      Weaknesses:

      (1) The authors used a similar workflow to the one previously reported in Simonis et al. 2023 (antibodies from cystic fibrosis patients that included B cell isolation, antibody-PcrV interaction modeling, etc.) but the authors do not clearly explain how their work and findings differ from previous work.

      We employed a similar mAb isolation pipeline to that used by Simonis et al., beginning with the screening of a cohort of cystic fibrosis patients chronically infected with P. aeruginosa. As in Simonis et al., we isolated specific B cells using a recombinant PcrV bait, followed by single-cell PCR amplification of immunoglobulin genes. The main differences in methodology between the two studies are as follows: i) the use of individuals from different cohorts, who therefore have different Ab repertoires; ii) the nature of the screening assays, although in both cases the screening was focused on the inhibition of T3SS function; iii) the PcrV labeling strategy, with Simonis et al. employing direct labeling, whereas we used a biotinylated tag combined with streptavidin.

      The number of specific mAbs obtained and produced was higher in Simonis et al. (47 versus 9 in our study). They sorted B cells from three individuals compared to two in our work and possibly started with a larger amount of PBMCs per donor, which may account for the higher number of specific B cells and mAbs isolated. Considering that the strategies were overall very similar, the greater number of mAbs isolated in Simonis et al. likely explains, to a large extent, why they identified mAbs targeting different epitopes compared to ours, including highly potent mAbs that we did not recover. 

      Our modeling study, unlike that of Simonis et al., which relied on an AlphaFold prediction of the multimeric structure of P. aeruginosa PcrV, was based on the experimentally determined structure of the homologous Salmonella SipD pentamer, as described in the manuscript. Furthermore, we compared our mAb P3D6 not only with 30-B8 from Simonis et al., but also with MEDI3902. Finally, in contrast to the approach of Simonis et al., we used functional assays to investigate the differences in mechanisms of action among these mAbs, which target three distinct epitopes.

      (2) Although new antibodies against P. aeruginosa T3SS expand the potential space of antibody-based therapies, it is unclear if P3D6 or P5B3 are better than previous antibodies. In fact, in the discussion section the authors suggested that the 30-B8 antibody seems to be the most effective of the tested antibodies.

      As explained above and shown in the Results section (Figure 6), the 30-B8 mAb is markedly more effective at inhibiting T3SS activity in both in vitro assays used.

      (3) The authors should explain better which of the two antibodies they have discovered would be better suited for follow-up studies. It is confusing that the authors focused the last sections of the manuscript on P3D6 despite P3D6 having a much lower ExoS-Bla inhibition effect than P5B3 and the limitation in the PcrV variant that P3D6 seems to recognize. A better description of this comparison and the criteria to select among candidate antibodies would help readers identify the main messages of the paper. 

      The P3D6 mAb shows stronger inhibitory activity than P5B3 in the two assays used, as shown in Supplementary Figure 1. An error in the table in Figure 2B was corrected and this table now reflects the results presented in Supplementary Figure 1. 

      The final sections of the manuscript focus on P3D6, which is more potent than P5B3, and for which we successfully determined a co-crystal structure with PcrV*. All parallel attempts to obtain a structure of P5B3 in complex with PcrV* failed. The P3D6-PcrV* structure was used to analyze epitope recognition and mechanisms of action in comparison to previously described mAbs. As previously mentioned, we do not consider further studies aimed at therapeutic development and optimization of our mAbs to be justified given the current data. Therefore, we believe that the main message of the paper is adequately captured in the title.

      (4) This work could strongly benefit from two additional experiments:

      (a) In vivo experiments: experiments in animal models could offer a more comprehensive picture of the potential of the identified monoclonal antibodies. Additionally, this could help to answer a naïve question: why do the patients that have the antibodies still have chronic P. aeruginosa infections? 

      As explained above, the mAbs we isolated are significantly less potent than those described by Simonis et al., and are therefore unlikely to outperform the best anti-PcrV candidates in vivo. In light of the data, and considering ethical concerns related to animal use in research and budgetary constraints, we decided not to proceed with in vivo experiments.

      There are a number of reasons that may explain why patients with anti-PcrV Abs blocking the T3SS can still be chronically infected with Pa. First, these Abs may be at limiting concentration, particularly in sites where Pa replicates, and thus unable to clear infection. In addition, it has been described that the T3SS is downregulated in chronic infection in cystic fibrosis patients. This suggests that a therapeutic intervention with T3SS-inhibiting Abs may be more efficient if done early in cystic fibrosis patients to prevent colonization when Pa possesses an active T3SS. Finally, T3SS is not the only virulence mechanism employed by P. aeruginosa during infection. Indeed, multiple protein adhesins and polysaccharides are important factors facilitating the formation of bacterial biofilms that are crucial for establishing chronic persistent infection. In this regard, a combination of Abs targeting different factors on the P. aeruginosa surface may be needed to treat chronic infections.

      (b) Multi-antibody T3SS assays (i.e., a combination of two or more monoclonal antibodies evaluated with the same assays used for characterization of single ones). This could explore the synergistic effects of combinatorial therapies that could address some of the limitations of individual antibodies. 

      Given the high potency of the Simonis mAbs and the mechanisms of action highlighted by our analysis, it is unlikely that our mAbs would synergize with those described by Simonis. Additionally, since our two mAbs cross-compete for binding, synergy between them is also improbable.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 166: How was the serum-IgG purified? (e.g., protein A, protein G).

      Protein A purification was used, as now mentioned in the manuscript. Purified Igs were thus predominantly IgG1, IgG2 and IgG4, as indicated.

      (2) Line 196: When mentioning affinities, it is preferable to present in molar units. 

      To facilitate comparisons, Ab concentrations were presented in µg/mL as in Simonis et al.

      (3) Line 206: The author states that P3D6 displays significantly reduced ExoS-Bla injection (Figure 2B), but according to the presented table, ExoS-Bla inhibition was higher for P5B3. Additionally, when using "significantly", what was the statistical test that was used to evaluate the significance? Please clarify.

      We thank the reviewer for pointing out this inconsistency. Indeed, the names of P3D6 and P5B3 were exchanged when building the table related to Figure 2B. The corrected version of this figure is now presented in the new version of the manuscript. An ANOVA was performed to evaluate the significance of the observed difference (adjusted p-values < 0.001) and it is now mentioned in the figure caption.  

      (4) Line 215: "P3B3" typo.

      This was corrected.

      (5) Figure 3B: Could the author explain the higher level of ExoS-Bla injection when using VRCO1 antibody compared to no antibody.  

      A slightly higher level of the median is observed in the case of three variants out of five. However, this difference is not statistically significant (p-value > 0.05).

      (6) Supplement Figure 1: the presented grey area is not clear (is it the 95%CI?) and how was the IC50 calculated? With what model was it projected? Are the values for IC50 beyond the 100µg/mL mark a projection? It seems that projecting such greater values (such as the IC50 of over 400µg/mL for variant 5) is prone to high error probability.

      The grey area represents the 95% confidence interval (95% CI) and it is now mentioned in the figure caption. The IC50 and 95% CI were both inferred with the dose-response R package drc based on a three-parameter log-logistic model, and this is now explained in the Materials & Methods section. The p-values for IC50 values beyond 100 µg/mL were below 0.05, but we agree that such extrapolation should be considered with caution (see below our response to comment number 7).
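
      To make the model named above concrete, the following is a minimal sketch of a three-parameter log-logistic dose-response fit. It assumes the common parameterization with the lower limit fixed at 0, upper limit d, slope b, and midpoint e (the IC50); it does not reproduce the drc package or its confidence intervals. The grid-search fit, the function names, and the synthetic data are all illustrative assumptions, not values from the manuscript.

```python
import math

def ll3(x, b, d, e):
    """Three-parameter log-logistic curve: lower limit 0, upper limit d,
    slope b, and e the concentration at which the response equals d/2."""
    return d / (1.0 + math.exp(b * (math.log(x) - math.log(e))))

def fit_ll3(conc, resp, b_grid, e_grid):
    """Least-squares fit by grid search over (b, e). For fixed b and e the
    model is linear in d, so d is solved in closed form."""
    best = None
    for b in b_grid:
        for e in e_grid:
            g = [ll3(x, b, 1.0, e) for x in conc]
            d = sum(y * gi for y, gi in zip(resp, g)) / sum(gi * gi for gi in g)
            sse = sum((y - d * gi) ** 2 for y, gi in zip(resp, g))
            if best is None or sse < best[0]:
                best = (sse, b, d, e)
    return best[1:]

# Synthetic, noise-free data (hypothetical; concentrations in µg/mL).
conc = [0.1, 0.3, 1, 3, 10, 30, 100]
true_b, true_d, true_e = 1.5, 100.0, 8.0
resp = [ll3(x, true_b, true_d, true_e) for x in conc]

b_grid = [0.5 + 0.1 * i for i in range(26)]            # slopes 0.5 to 3.0
e_grid = [10 ** (-1 + 0.01 * i) for i in range(401)]   # 0.1 to 1000 µg/mL
b_hat, d_hat, e_hat = fit_ll3(conc, resp, b_grid, e_grid)
print(f"estimated IC50: {e_hat:.2f} µg/mL")
```

      Note that the search grid for e extends past the highest tested concentration (100 µg/mL). Any estimate landing in that region is an extrapolation supported by few or no data points on the lower part of the curve, which is why such IC50 values warrant the caution expressed in the response.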

      (7) Line 227: The author describes that P5B3 has similar IC50 values towards variants 1-4, but the IC50 towards variant 5 is substantially higher at 400 µg/mL, although the only difference between variants 4 and 5 is the switch at position 225, Arg -> Lys, which are very similar in their properties. Please provide an explanation.

      As explained in our response to comment number 6, we agree that the comparison of IC50 values that are estimated to be close to or higher than the highest experimental concentration is somewhat speculative. Indeed, we performed further statistical analysis that showed no significant difference between the IC50 values of mAb P5B3 toward the five PcrV variants. In contrast, the difference between the IC50 values of mAbs P5B3 and P3D6 toward variant 1 is statistically significant. This is now explained in the manuscript.

      (8) Line 233: Pore assembly: It is not clear how the data was normalized. The authors mention the methods normalization against the wildtype strain in the absence of antibodies, but did not elaborate clearly if the mutant strain has the same base cytotoxicity as the wild type. It would be helpful to show the level of cytotoxicity of the wild type compared to the mutant in the absence of antibodies to understand the baseline of cytotoxicity of both strains.  

      In these experiments we did not use the wild-type strain. As explained, the only strain that allows the measurement of pore formation by translocators PopB/PopD is the one lacking all effectors. All the experiments were done with this strain, and all the measurements were normalized accordingly. 

      (9) Figure 4: The explanation is redundant as it is clearly stated in the results. It would be better for the caption to describe the figure and leave interpretation to the results section. Overall, this comment is relevant to all figure captions, as it will reduce redundancy. My suggestion is to keep the figure caption as a road map to understand what is shown in the figure. For example, the Figure 4 caption should include that the concentration is presented in logarithmic scale, what is the dashed line, what is the grey area (what interval does it represent?), what each circle represents, and what is the regression model used? 

      Figure captions have been improved as suggested. 

      (10) Line 432: The authors apparently misquoted the original article describing the chimeric form PcrV* by describing the fusion of amino acids 1-17 and 136-249. I quote the original article by Tabor et al. "[...] we generated a truncated PcrV fragment (PcrVfrag) comprising PcrV amino acids 1-17 fused to amino acids 149-236 [...]". Additionally, how does the absence of amino acid 21 in the variant affect the conclusion? 

      Our construct was inspired by the one described in Tabor et al. but was not identical. We have therefore replaced "was constructed based on a construct by Tabor et al." with "whose design was inspired by the construct described in Tabor et al."

      Amino acid 21 is only absent in the construct used for crystallization experiments; all other experiments looking at Ab activity were performed with bacteria bearing full-length PcrV. The difference in P3D6 activity between variants V1 and V2 appears to be explained by the nature of the residue at position 225, according to the structural data, as explained now in more detail in the manuscript. Accordingly, the difference in efficiency of P3D6 against the V1 and V2 variants is explained by the residue at position 225, as both variants have the same residue at position 21. However, while the nature of the residue at position 225 appears to explain the absence of efficiency of the Ab for the variants studied, an impact of residue 21 could not be totally ruled out in putative variants with a Ser at 225 but different amino acids at 21.

      (11) Line 569: Missing word - ESRF stands for European Synchrotron Radiation Facility. 

      This has been corrected.

      (12) Line 268-269 (Figure 5A): The description of the alpha helices in relation to the figure is incomplete. Helices 2,3 and 5 are not indicated. 

      Indeed, since the structure is well-known and in the interest of visibility and simplicity, we only included the most relevant secondary structure features.

      (13) Line 271-272: It would be good to elaborate on the exact binding platform between LC and HC of the Fab and the residues on the PcrV side. For example, the author could apply the structure to PDBePISA (EMBL-EBI) which will provide details about the interface between the PcrV and the antibody. It is very interesting to learn what regions of the antibody are in charge of the binding, such as: is the H-CDR3 the major contributor of the binding or are other CDRs more involved? Additionally, in line 275 they state that the substitution of Ser 225 with Arg or Lys is consistent with the P3D6 insufficient binding. What contributed to this result on the antibodies side? 

      In order to address this question, we are now providing a LigPlot figure (supplementary Figure 3) in which specific interactions between PcrV* and the Fab are shown.

      (14) Line 291: It is unclear from what data the authors concluded that anti-PscF targets 3 distinct regions of PscF. 

      The data are shown in Supplementary Table 2, as mentioned in the manuscript. We have now modified the order of the anti-PcrV mAbs in the table to better illustrate the three identified epitope clusters (Supplementary Table 2). Similarly, the anti-PscF mAbs appear to group into three clusters, as P3G9 and P5E10 only compete with themselves, while mAbs P3D6 and P5B3 compete with themselves and each other.

      (15) Line 315: It is preferable to introduce results in the results section instead of the discussion. 

      While preparing the manuscript, we initially included these results as a separate paragraph in the Results section, but ultimately chose the current format to improve flow and avoid redundancy.

      (16) Supplement Figure 2: What was the regression model used to evaluate IC50, and what is presented in the graph? What is the dashed line (see comment for Figure 4 above)? 

      The regression is based on a three-parameter log-logistic model and the light-colored areas correspond to the 95% CI. The dashed line visually represents 100% of ExoS-Bla injection. This information is now mentioned in the figure caption.

      (17) Figure 6B: It would be better to show an additional rotation of the PcrV bound by Fab 30-B8 that corresponds to the same orientation as the one shown for Fab MEDI3902. This would clear up the differences in binding regions. Same for Fab P3D6.

      Figure 6 already depicts two orientations. Although we agree that additional orientations could be of interest, we believe that this would add unnecessary complexity to the figure, and would prefer to maintain the figure as is, if possible.

      (18) Line 356-358: The author proposes an experiment to support the suggested mechanism of P3D6, it would follow up with a bio-chemical analysis showing the prevention of PcrV oligomerization in its presence. 

      We understand the reviewers’ comment regarding the potential use of biochemical approaches to test our hypothesis. However, this is not currently feasible, as we have been unable to achieve in vitro oligomerization of PcrV alone, possibly due to the absence of other T3SS components, such as the polymerized PscF needle.

      (19) Line 456: Missing details about how the ELISA was conducted including temperature, how the antigen was absorbed, plate type, etc. 

      Experimental details have been added.

      (20) Line 460: Missing substrate used for alkaline phosphatase. 

      The nature of the substrate was added to the methods.

    1. eLife Assessment

      This important paper presents a new behavioral assay for Drosophila aggression and demonstrates that social experience influences fighting strategies, with group-housed males favoring high-intensity but low-frequency tussling over the aggressive lunging observed in isolated males. The experiments are solid, and the conclusions are important for researchers studying the impact of social isolation on aggression.

    2. Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al. develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays. This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Weakness:

      All prior concerns have been addressed in the revised manuscript. The added 'Limitations of the study' section is a welcome and important clarification. Despite these limitations, the study provides valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

    3. Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its possible biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study discusses an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based on the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that, in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors suggested that an altered fighting strategy has effects with respect to these behaviors.

      Weaknesses:

      The new experimental paradigm in Fig. 6 is quite useful, but, as the authors mention, future investigations are still needed to reveal a direct relationship between aggression strategies and reproductive success.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays using a food cup (Chen et al., 2002). This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Thank you for the precise summary of the manuscript and acknowledgment of the novelty and significance of the study.

      Weakness:

      Although most concerns have been addressed, the manuscript still lacks a rigorous, objective method for quantifying lunging and tussling. Because scoring appears to have been done manually and a single lunge in a 30 fps video spans only 2-3 frames, the 0.2 s cutoff seems arbitrary, and there are no objective criteria distinguishing reciprocal lunging from tussling. Despite this, the study offers valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

      Thank you for this comment. The duration of each lunge was measured by analyzing the videos frame by frame, from the frame before the initiation of the lunge to the frame after its completion, resulting in an average span of 3–5 frames. Given a frame rate of 30 fps, this corresponds to approximately 0.1–0.17 seconds. We acknowledge that manually quantifying the two types of aggressive behaviors has certain limitations, which are now stated in the newly added “Limitations of the Study” section of the revised manuscript.
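
      The frame-to-duration conversion quoted in this response can be checked with a short sketch. The frame counts and frame rate are those given above; the helper name is hypothetical.

```python
def frames_to_seconds(n_frames, fps=30.0):
    """Convert a span of video frames to a duration in seconds."""
    return n_frames / fps

shortest = frames_to_seconds(3)  # 3 frames at 30 fps
longest = frames_to_seconds(5)   # 5 frames at 30 fps
print(f"{shortest:.2f}-{longest:.2f} s")  # prints "0.10-0.17 s"
```

      Both ends of the range fall below the 0.2 s cutoff questioned by the reviewer.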

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) and the associated change in aggression strategy are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based on the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors experimentally demonstrated that altered fighting strategy has effects with respect to these behaviors.

      Thank you for your precise summary of our study and for your positive assessment of its novelty and significance.

      Reviewer #3 (Public review):

      In this revised manuscript, Gao et al. presented a series of well-controlled behavioral data showing that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) is enhanced specifically among socially experienced and relatively old males. Moreover, results of behavioral assays led authors to suggest that increased tussling among socially experienced males may increase mating success. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discoveries that 1) older (14-day-old) flies tend to tussle more often than younger (2- to 7-day-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting) are the result of their creativity in looking outside of conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Newly presented data have made several conclusions convincing. Detailed descriptions of methods to quantify behaviors help understand the basis of their claims by improving transparency. However, I remain concerned about the authors' persistent attempt to link the high intensity aggression to reproductive success. The authors' effort to "tone down" the link between the two phenomena remains insufficient. The two are purely correlational. I reiterate this issue because the overall value of the manuscript would not change with or without this claim.

      Thank you for acknowledging the novelty and significance of the study. Regarding the relationship you mentioned between high-intensity aggression and reproductive success, we have further toned down the statements linking them throughout the revised manuscript. We have also modified the title to “Social Experience Shapes Fighting Strategies in Drosophila”. In addition, we have added a ‘Limitations of the Study’ section that clearly describes the relationship between tussling and reproductive success as correlational.

      Reviewer #1 (Recommendations for the authors):

      If possible, mention the EM-connectome data showing the minimal interneuronal path from Or47b ORNs to pC1SS2 neurons (even if derived from the female connectome), which can strengthen the model of parallel sensory-central pathways.

      Thank you for this comment. According to data from the EM connectome, connecting Or47b ORNs to pC1d neurons requires at least two intermediate neurons. An example minimal pathway is: ORN_VA1v (L) → AL-AST1 (L) → PLP245 (L) → pC1d (R). We have added this point in the Discussion section of the revised manuscript.
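
      The count of "at least two intermediate neurons" can be illustrated with a breadth-first search over a toy graph. This is a minimal sketch: the graph below contains only the four neurons and three connections named in the response, whereas an actual connectome query would involve many more nodes and edges, and the function and variable names are hypothetical.

```python
from collections import deque

# Toy adjacency list holding only the connections named in the response.
EDGES = {
    "ORN_VA1v (L)": ["AL-AST1 (L)"],
    "AL-AST1 (L)": ["PLP245 (L)"],
    "PLP245 (L)": ["pC1d (R)"],
    "pC1d (R)": [],
}

def shortest_path(graph, start, goal):
    """Return one shortest directed path from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

path = shortest_path(EDGES, "ORN_VA1v (L)", "pC1d (R)")
n_intermediate = len(path) - 2  # exclude the source and target neurons
```

      On this toy graph the search returns the four-neuron path listed above, so `n_intermediate` is 2.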

      I'm not convinced that labeling lunges as "gentle" combat behavior works, either in the abstract or elsewhere. While lunging is indeed a lower-intensity form of aggression compared to tussling, applying anthropomorphic descriptors risks misleading readers.

      Thank you for this comment. We now use “low-intensity” instead of “gentle” to describe lunging.

      In Materials & Methods, please cross-check all figure-panel references after the recent re-numbering (e.g. "Figure 5A6A" etc.).

      Thank you for this comment. We have thoroughly verified the figure panel references in the Materials & Methods section.

      Ensure that Table S1 is clearly cited in the main text where you first describe fly genotypes.

      Thank you for this comment. We have now cited Table S1 in the main text.

      There are multiple grammatical errors and typos throughout the manuscript. Please correct them. Some examples are below, but this is not an exhaustive list:

      Line 98-102 requires rephrasing as the results are already published and not being observed by the authors.

      Thank you for this comment. We have revised the manuscript to “we occasionally observed the high-intensity boxing and tussling behavior in male flies as previously reported (Chen et al., 2002; Nilsen et al., 2004), which….”

      line 116- lower not 'lowed'.

      Corrected.

      line 942 & 945- knock-down males not 'knocking down males'.

      Corrected. Thank you very much for these comments.

      Reviewer #2 (Recommendations for the authors):

      The authors have almost completely answered the major comments I have noted on the ver.1 manuscript: (1) They clearly show changes in fighting strategy in the territory control behavior experiment in Fig. 6-figure supplements. (2) A detailed description of how aggressive behavior is measured. Thus, I am convinced by this revision.

      Thank you for these comments that make the manuscript a better version.

      Furthermore, Fig. 5, which examines the relationship between pC1[SS2] characteristics and the function of dsx, presents novel and very interesting data. I look forward to further developments.

      Thank you. We will continue to explore this part in our future study.

      However, one point still concerns me.

      Line 192: Although the authors describe it as "usage-dependent," the trans-Tango technique is essentially a postsynaptic cell-labeling technique. It is possible that the labeling intensity in postsynaptic cells increases from the change in expression levels of the Or47b gene due to GH. However, there is no difference in the expression level of the Or47b gene labeled by GFP between SH and GH. Therefore, we cannot conclude that the expression of the Or47b gene is increased by rearing conditions.

      The original paper on trans-TANGO (Talay et al., 2017) does not discuss the usage-dependency. A review of trans-synaptic labeling techniques (Ni, Front Neural Circuits. 2021) discusses that the increase in trans-TANGO signaling with aging may be related to synaptic strength, but there is no experimental evidence for this. In my opinion, the results in Figure 3-figure supplement 2 only weakly suggest that the increase in trans-TANGO signaling may be explained by an increase in synaptic strength due to group rearing.

      We appreciate the reviewer’s insightful comment regarding the interpretation of the trans-Tango signal. Indeed, the original trans-Tango study (Talay et al., 2017) does not claim that the method is usage-dependent. The observed increase in trans-Tango labeling with age, as reported in their supplemental figures, may reflect accumulation over time, potentially influenced by synaptic maturation or increased component expression. To avoid overstating our results, we have revised the relevant statement in the manuscript to remove the term "usage-dependent" and now describe the change in trans-Tango signal more cautiously.  

      Reviewer #3 (Recommendations for the authors):

      Below are the cases where their professed attempts to "tone down the statement" appear ignored:

      Lines 27-29:

      "Our findings... suggest how social experience shapes fighting strategies to optimize reproductive success".

      We have now revised the manuscript to “Our findings… suggest that social experience may shape fighting strategies to optimize reproductive success.”

      Lines 85-86:

      "... discover that this infrequent yet intense form of combat is... crucial for territory dominance and mating competition".

      We have now revised the manuscript to “…discover that this infrequent yet intense form of combat is enhanced by social enrichment, while the low-intensity lunging is suppressed by social enrichment.” 

      Lines 335-339:

      "Here, we found that... GH males tend to... increase the high-intensity tussling, which enhances their territorial and mating competition."

      We have removed “which enhances their territorial and mating competition” in the revised manuscript.

      Lines 343-344:

      "... presenting a paradox between social experience, aggression and reproductive success. Our result resolved this paradox..."

      We have now revised the manuscript to “...Our results provide an explanation for this paradox…”

      Lines 355-358:

      "Interestingly, we found that the mating advantage gained through social enrichment can even offset the mating disadvantage associated with aging, further supporting the vital role of shifting fighting strategies in experienced, aged males."

      We have removed “further supporting the vital role of shifting fighting strategies in experienced, aged males” in the revised manuscript.

      Lines 361-362:

      "These results separate the function of the two fighting forms and rectify out understanding of how social experiences regulate aggression and reproductive success."

      We have removed this sentence in the revised manuscript.

      Some may say that a speculative statement is harmless, but I think it indeed is harmful unless it is clearly indicated as a speculation. It is regrettable that authors remain reluctant to change their claim without providing any new supporting evidence. All three reviewers raised the same concern in the first round of review.

      We apologize for not making the speculative nature of the statement clearer in the previous version. In the revised manuscript, we have now explicitly rephrased sentences to only suggest a correlation but not a causal link between tussling and reproductive success.

      I have no choice but to keep my evaluation of the manuscript as "Incomplete" unless the authors thoroughly eliminate any attempt to link these two. This must go beyond changing a few words in the lines listed above.

      Thank you for this comment. In addition to the lines listed above, we carefully checked all statements regarding the correlation between fighting strategies and reproductive success throughout the full text. Furthermore, we have also added a “Limitations of the Study” section to address the shortcomings of this study in the revised manuscript.

      I do not have the same level of concern over the interpretation of Fig. 6A-C, because this is directly linked to aggressive interactions. Even if the socially isolated males do not engage in tussling, it is not a leap to assume that a different fighting tactic of socially experienced males can give them an advantage in defending a territory. To me, this is a sufficient ethological link with the observed behavioral change.

      Thank you for this insightful comment.

      The following are relatively minor, although important, concerns.

      I beg to differ over the authors' definition of "tussling". Supplemental movies S1 and S2 appear to include "tussling" bouts in which two flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases suggest that the definition of "tussling" as opposed to "lunging" has a subjective element. However, I will not dwell on this matter further because it is impossible to be completely objective over behavioral classification, even by using a computational method. An important point is that the definition is applied consistently within the publication. I have no reason to doubt that this was the case.

      Thank you for this comment. Since the analysis of tussling behavior was conducted manually, it is challenging to achieve complete objectivity. However, we made every effort to apply consistent criteria throughout the analysis. We have added a “Limitations of the Study” section in the revised manuscript to clearly state this caveat. We appreciate your understanding.

      The authors now state that "all tester flies were loaded by cold anesthesia" (lines 432-433). I would like to draw attention to the well-known fact that anesthesia, whether by ice or by CO2, has long been known to affect flies' subsequent behaviors (for aggression, see Trannoy S. et al., Learn. Mem. 2015. 22: 64-68). It would be prudent to acknowledge the possibility that this handling method could have contributed to the unusually high levels of spontaneous tussling, which have not been reported elsewhere before.

      Thank you for this comment. The increased tussling behavior observed in our study is unlikely to be due to cold anesthesia because, as noted by Trannoy et al. (2015), cold anesthesia profoundly reduces locomotion and general aggressiveness in flies. Nevertheless, we acknowledge that the use of cold anesthesia in behavioral experiments may affect aggression. To minimize this influence, we allowed the flies to recover and adapt for at least 30 minutes before behavioral recording. Moreover, both control and experimental groups were treated in exactly the same manner to ensure consistency.

      It is intriguing that pC1SS2 neurons are dsx+ but fru-. Authors convincingly demonstrated that these neurons are clearly distinct from the P1a neurons, a well-characterized hub for male social behaviors. It is possible that pC1SS2 neurons overlap with previously characterized dsx+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020, a point authors could have explicitly raised.

      Thank you for this comment. We have added this point into the Discussion section of the revised manuscript, as follows: “That tussling-promoting… aggression (Koganezawa et al., 2016). Moreover, the anatomical features of pC1SS2 neurons are highly similar to the male-specific aggression-promoting (MAP) neurons identified by another previous study (Chiu et al., 2021).”

      I acknowledge the authors' courage to initiate an investigation into a less characterized, high intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. The concern I raised above is about the interpretation of the data, not about the quality of data.

      Thank you for your constructive comments to make this manuscript better.

    1. eLife Assessment

      This study makes the valuable claim that people track, specifically, the elasticity of control (that is, the degree to which outcome depends on how many resources - such as money - are invested), and that control elasticity is impaired in certain types of psychopathologies. A novel task is introduced that provides solid evidence that this learning process occurs and that human behavior is sensitive to changes in the elasticity of control. Evidence that elasticity inference is distinct from more general learning mechanisms and is related to psychopathology remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the elasticity of controllability by developing a task that manipulates the probability of achieving a goal with a baseline investment (which they refer to as inelastic controllability) and the probability that additional investment would increase the probability of achieving a goal (which they refer to as elastic controllability). They found that a computational model representing the controllability and elasticity of the environment accounted better for the data than a model representing only the controllability. They also found that prior biases about the controllability and elasticity of the environment were associated with a composite psychopathology score. The authors conclude that elasticity inference and bias guide resource allocation.

      Strengths:

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      Weaknesses:

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperforms theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings, and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This also pertains to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors test whether controllability beliefs and associated actions/resource allocation are modulated by things like time, effort, and monetary costs (what they call "elastic" as opposed to "inelastic" controllability). Using a novel behavioral task and computational modeling, they find that participants do indeed modulate their resources depending on whether they are in an "elastic," "inelastic," or "low controllability" environment. The authors also find evidence that psychopathology is related to specific biases in controllability.

      Strengths:

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output, and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability.

      Weaknesses:

      The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations nor were there results of any regression analyses. That said, the authors did preregister the CCA analysis, so while perhaps not the best method, it was justified to complete it. Regardless of method, the psychopathology results are not particularly convincing, but provide an interesting jumping-off point for further exploration in future work.

    4. Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way, and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a *very good idea*.

      The more concrete contributions, however, are not as strong. In particular, evidence for the paper's most striking claims is weak. Quoting the abstract, these claims are (1) "the elasticity of control [is] a distinct cognitive construct guiding adaptive behavior" and (2) "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control."

      Main issues

      I'll highlight the key points.

      - The task cannot distinguish elasticity inference from general learning processes

      - Participants were explicitly instructed about elasticity, with labeled examples

      - The psychopathology claims rely on an invalid interpretation of CCA, and are contradicted by simple correlations (elasticity bias and the sense of agency scale is r=0.03)

      Distinct construct

      Starting with claim 1, there are three subclaims here. (1A) People's behavior is sensitive to differences in elasticity; (1B) there are mental processes specific to elasticity inference, i.e., not falling out of general learning mechanisms; and, implicitly, (1C) people infer elasticity naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not well supported.

      (1B) The data cannot support the "distinct cognitive construct" claim because the task is too simple to dissociate elasticity inference from more general learning processes (also raised by Reviewer 1). The key behavioral signature for elasticity inference (vs. generic controllability inference) is the transfer across ticket numbers, illustrated in Fig 4. However, this pattern is also predicted by a standard Bayesian learner equipped with an intuitive causal model of the task. Each ticket gives you another chance to board and the agent infers the probability that each attempt succeeds. Crucially, this logic is not at all specific to elasticity or even control. An identical model could be applied to inferring the bias of a coin from observations of whether any of N tosses were heads, a task that is formally identical to this one (at least, the intuitive model of the task; see first minor comment).
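
      To make the coin analogy concrete, here is a minimal sketch of such a learner. It is not the authors' model, nor any model fitted in the paper: the function names, the uniform prior, the grid approximation, and the example observations are all illustrative assumptions. The learner sees only whether at least one of n attempts succeeded and infers a single per-attempt success probability p.

```python
def posterior_over_p(observations, grid_size=101):
    """Grid posterior over a per-attempt success probability p.

    observations: list of (n_attempts, success) pairs, where success is True
    if at least one of the n attempts succeeded.
    Assumes a uniform prior and independent attempts (illustrative only).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [1.0] * grid_size  # uniform prior over p
    for n, success in observations:
        for i, p in enumerate(grid):
            p_any = 1.0 - (1.0 - p) ** n  # P(at least one success in n tries)
            weights[i] *= p_any if success else (1.0 - p_any)
    total = sum(weights)
    return grid, [w / total for w in weights]


def predict_success(grid, post, n):
    """Posterior predictive probability that at least one of n attempts succeeds."""
    return sum(w * (1.0 - (1.0 - p) ** n) for p, w in zip(grid, post))


# Hypothetical observations: outcomes seen only with 1 or 3 attempts.
data = [(3, True), (3, True), (1, False), (3, True), (1, False), (1, True)]
grid, post = posterior_over_p(data)

# The learner still makes a prediction for the never-observed 2-attempt case.
p1, p2, p3 = (predict_success(grid, post, n) for n in (1, 2, 3))
```

      Because all three predictions derive from one parameter, evidence gathered at one ticket count transfers to the others without any elasticity-specific machinery, which is the transfer pattern at issue.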

      Importantly, this point cannot be addressed by showing that the authors' model fits data better than this or any other specific Bayesian model. It is not a question of whether one particular updating rule explains data better than another. Rather, it is a question of whether the task can distinguish between biases in *elasticity* inference versus biases in probabilistic inference more generally. The present task cannot make this distinction because it does not make separate measurements of the two types of inference. To provide compelling evidence that elasticity inference is a "distinct cognitive construct", one would need to show that there are reliable individual differences in elasticity inference that generalize across contexts but do not generalize to computationally similar types of probabilistic inference (e.g. the coin flipping example).

      (1C) The implicit claim that people infer elasticity outside of the experimental task is undermined by the experimental design. The authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips."

      In the revisions, the authors seem to go back and forth on whether they are claiming that people infer elasticity without instruction (I won't quote it here). I'll just note that the examples they provide in the most recent rebuttal are all cases in which one never receives explicit labels about elasticity. If people only infer elasticity when it is explicitly labeled, I struggle to see its relevance for understanding human cognition and behavior.

      Psychopathology

      Finally, I turn to claim 2, that "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control." The CCA analysis is in principle unable to support this claim. As the authors correctly note in their latest rebuttal, the CCA does show that "there is a relationship between psychopathology traits and task parameters". The lesion analysis further shows that "elasticity bias specifically contributes to this relationship" (and similarly for the Sense of Agency scale). Crucially, however, this does *not* imply that there is a relationship between those two variables. The most direct test of that relationship is the simple correlation, which the authors report only in a supplemental figure: there is no relationship (r=0.03). Although it is of course possible that there is a relationship that is obscured by confounding variables, the paper provides no evidence, statistical or otherwise, that such a relationship exists.

      Minor comments

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, the researcher could infer "biases" in elasticity inference that are probably better characterized as effective use of prior information (encoded in the causal model).
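
      The gap between the two schemes can be checked numerically. The sketch below is illustrative only: the function names and the value p = 0.3 are assumptions, and the cap at 1 in the linear scheme is added here simply to keep the result a valid probability.

```python
def success_independent_attempts(p, n_attempts):
    """P(at least one success) when each attempt succeeds independently with probability p."""
    return 1.0 - (1.0 - p) ** n_attempts


def success_linear(p, n_attempts):
    """Linear scheme: success probability scales with the number of attempts (capped at 1)."""
    return min(1.0, p * n_attempts)


p = 0.3  # hypothetical per-attempt success probability

intuitive_two = success_independent_attempts(p, 2)  # equals 2p - p^2
task_two = success_linear(p, 2)                     # equals 2p
gap = task_two - intuitive_two                      # equals p^2
```

      Under the intuitive causal model the second ticket adds p(1 - p) rather than p, so the two schemes diverge by p^2, and the divergence grows as p increases.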

      The model is heuristically defined and does not reflect Bayesian updating. For example, it over-estimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

      We thank the Reviewer for this thoughtful suggestion. We acknowledge that more flexible function learning approaches could provide a stronger test in favor of a more general account. Our Bayesian model implemented a basis function approach where the weights of three archetypal functions (flat, step, linear) are learned from experience. Testing models with more flexible basis functions would likely require a task with more than three levels of resource investment (1, 2, or 3 tickets). This would make an interesting direction for future work expanding on our current findings. We now incorporate this suggestion in more detail in our updated manuscript (335-341):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions drawn from human function learning [30] or experimental designs with continuous action spaces may offer a better test of this idea.”

      Reviewer #2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability. The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. 

      We thank the Reviewer for their constructive feedback throughout the review process, which has substantially strengthened our manuscript and clarified our theoretical framework.

      One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

      We note that the existence of bivariate relationships is not a prerequisite for the existence of multivariate relationships. Conditioning the latter on the former, therefore, would risk missing out on important relationships existing in the data. Ultimately, correlations between pairs of variables do not offer a sensitive test for the general hypothesis that there is a relationship between two sets of variables. As an illustration, consider that elasticity bias correlated in our data (r = .17, p<.001) with the difference between SOA (sense of agency) and SDS (self-rating depression). Notably, SOA and SDS were positively correlated (r = .47, p<.001), and neither of them was correlated with elasticity bias (SOA: r=.04, p=.43; SDS: r=-.06, p=.16). It was a dimension that ran between them that mapped onto elasticity bias. This specific finding is incidental and uncorrected for multiple comparisons, hence we do not report it in the manuscript, but it illustrates the kinds of relationships that cannot be accounted for by looking at bivariate relationships alone.
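
      The statistical point can be reproduced with purely synthetic data. The sketch below does not use, or attempt to match, the study's data: the variable names, coefficients, noise levels, and seed are assumptions chosen only to show how a third variable can relate weakly to each of two positively correlated measures yet strongly to their difference.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

shared = [rng.gauss(0, 1) for _ in range(n)]  # variance common to both scales
z = [rng.gauss(0, 1) for _ in range(n)]       # stand-in for a task parameter

# Two synthetic scales: mostly shared variance, pushed apart slightly by z.
x = [g + 0.1 * zi + rng.gauss(0, 0.15) for g, zi in zip(shared, z)]
y = [g - 0.1 * zi + rng.gauss(0, 0.15) for g, zi in zip(shared, z)]
diff = [xi - yi for xi, yi in zip(x, y)]

r_xy = pearson_r(x, y)         # strong positive correlation between scales
r_zx = pearson_r(z, x)         # weak
r_zy = pearson_r(z, y)         # weak, opposite sign in expectation
r_zdiff = pearson_r(z, diff)   # much stronger: shared variance cancels out
```

      Note that the pairwise correlations with z are small rather than exactly zero: subtracting the two scales removes the shared variance that otherwise swamps z's contribution.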

      Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome.

      In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      We thank the Reviewer for their thoughtful engagement with our manuscript. We appreciate their recognition of elasticity as a key dimension of control that has the potential to advance our understanding of psychopathology and healthy decision-making.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific.

      Our formal definition of elasticity, detailed in Supplementary Note 1, naturally extends the reward-based and information-theoretic definitions of controllability by Huys & Dayan (2009) and Ligneul (2021). We now further clarify how the model implements this formalized definition (lines 156-159).

      “Conversely, in the ‘elastic controllability model’, the beta distributions represent a belief about the maximum achievable level of control (𝑎<sub>Control</sub>, 𝑏<sub>Control</sub>) coupled with two elasticity estimates that specify the degree to which successful boarding requires purchasing at least one (𝑎<sub>elastic≥1</sub>, 𝑏<sub>elastic≥1</sub>) or specifically two (𝑎<sub>elastic2</sub>, 𝑏<sub>elastic2</sub>) extra tickets. As such, these elasticity estimates quantify how resource investment affects control. The higher they are, the more controllability estimates can be made more precise by knowing how much resources the agent is willing and able to invest (Supplementary Note 1).”

      Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      We respectfully disagree that we present elasticity as outside of, or different in kind from, controllability. Throughout the manuscript, we explicitly describe elasticity as a dimension of controllability (e.g., lines 70-72, among many other examples). This is also expressed in our formal definition of elasticity (Supplementary Note 1).

      The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me; if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success, e.g. you're more likely to get a heads in two flips than one. I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      We appreciate the Reviewer’s concerns but feel that some of the more subjective comments might not benefit from further discussion. We only note that controllability and its elasticity are features of environmental structure, so in principle any controllability-related inference is a form of model-based learning. The interesting question is whether people account in their world model for that particular feature of the environment.   

      The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      In real-world contexts, it is often trivial that sometimes further investment enhances control and sometimes it does not. For example, students know that if they prepare more extensively for their exams they will likely be able to achieve better grades, but they also know that there is uncertainty in this regard – their grades could improve significantly, modestly, or in some cases, they might not improve at all, depending on the type of exams their study program administers and the knowledge or skills being tested. Our research question was whether in such contexts people learn from experience the degree to which controllability is elastic to invested resources and adapt their resource investment accordingly. Our findings show that they do. 

      The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether, with the help of all the parameters and all the surveys, one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      We agree with the Reviewer on the relationship between elasticity and any particular dimension of psychopathology. The CCA asks a different question, namely, whether there is a relationship between psychopathology traits and task parameters, and whether elasticity bias specifically contributes to this relationship. 

      I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity, eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one, so the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical; this will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.
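
      The distinction between recovery correlation and recovery bias can be illustrated with a toy simulation. Nothing below reflects the authors' fitting code or results: the "recovery" step is a made-up stand-in that adds a constant offset and noise, and the offset (0.2), noise level, sample size, and seed are assumptions.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Simulated "true" parameter values drawn across the full plausible range.
true_vals = [rng.uniform(0.0, 1.0) for _ in range(500)]

# Hypothetical recovery that systematically overestimates by 0.2.
# Values are left unclipped; this is an illustration, not a fitted model.
recovered = [t + 0.2 + rng.gauss(0, 0.05) for t in true_vals]

recovery_r = pearson_r(true_vals, recovered)
mean_bias = sum(r - t for r, t in zip(recovered, true_vals)) / len(true_vals)
```

      Here the recovery correlation is close to 1 even though every estimate is shifted upward, so reporting the correlation alone would not reveal the bias; the mean signed difference does.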

      The logic the Reviewer describes breaks down when one considers the dynamics of participants’ resource investment choices. A low elasticity bias in a participant’s prior belief would make them persist for longer in purchasing a single ticket despite failure, as compared to a person without such a bias. Indeed, the ability of the experimental design to demonstrate low elasticity biases is evidenced by the fact that the majority of participants were fitted with a low elasticity bias (μ = .16 ± .14, where .5 is unbiased). 

      Originally, the Reviewer was concerned that elasticity bias was being confounded with a general deficit in learning. The weak inter-parameter correlations in the parameter recovery test resolved this concern, especially given that, as we now noted, the simulated parameter space encompassed both low and high elasticity biases (range=[.02,.76]). Furthermore, regarding the Reviewer's concern about bias in the parameter recovery, we found no such significant bias with respect to the elasticity bias parameter (Δ(Simulated, Recovered)= -.03, p=.25), showing that our experiment could accurately identify low and high elasticity biases.

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.

      We thank the Reviewer for this comment, and agree that participants may have employed the intuitive understanding the Reviewer describes. This is consistent with our model comparison results, which showed that participants did not assume that control increases linearly with resource investment (lines 677-692). Consequently, this is also not assumed by our model, except perhaps by how the prior is implemented (a property that was supported by model comparison). In the text, we acknowledge that this aspect of the model and participants’ behavior deviates from the task's true structure, and it would be worthwhile to address this deviation in future studies.

      That said, there is no reason that this will make participants appear to be generally underestimating elasticity. Following exposure to outcomes for one and three tickets, any nonlinear understanding of probabilities would only affect the controllability estimate for two tickets. This would have contrasting effects on the elasticity estimated for the second and third tickets, but on average, it would not change the overall elasticity estimated. On the other hand, if such a participant is only exposed to outcomes for two and three tickets, they would come to judge the difference between the first and second tickets too highly, thereby overestimating elasticity.

      The model is heuristically defined and does not reflect Bayesian updating. For example, it overestimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

      Note that we have tested a fully Bayesian model (lines 676-691), but found that this model fitted participants’ choices worse. 

      You're right; saying these analyses provide "no information" was unfair. I agree that this is a useful way to link model parameters with behavior, and they should remain in the paper. However, my key objection still holds: these analyses do not tell us anything about how *people's* prior assumptions influence behavior. Instead, they tell us about how *fitted model parameters* depend on observed behavior. You can easily avoid this misreading by adding a small parenthetical, e.g.

      Thus, a prior assumption that control is likely available **(operationalized by \gamma_controllability)** was reflected in a futile investment of resources in uncontrollable environments.

      We thank the Reviewer for the suggestion and have added this parenthetical (lines 219, 225).

    1. eLife Assessment

      This study provides valuable insights with solid evidence into altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. However, while the experiments are well executed, the reported effects are subtle and sometimes non-significant. The interpretation of results may be over-extended given the nature of the data (solely behavioral) and the absence of mechanistic, causal, or computational approaches limits the strength of the broader conclusions. The work will be relevant to those interested in autism, cognition, and/or sensory processing.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

      Weaknesses:

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure).

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

    4. Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

      Weaknesses:

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.  

      We appreciate the reviewer’s statement highlighting the importance of our study. 

      Strengths: 

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism. 

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses: 

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure). 

      We thank the reviewer for the helpful comments. We understand that the analyses were difficult to follow, and we will work on the clarity of the Results section. However, we would like to emphasize that every d′ measure is accompanied by analyses of response rates (i.e., correct and incorrect choice rates). In addition, we applied standard psychometric analyses whenever possible. Specifically, psychometric functions were fitted to the data using logistic regression. We will rework the text to clarify these points.

      During training, only two stimulus amplitudes were presented, which precluded the construction of psychometric curves. For the categorization task, however, psychometric analyses were feasible and conducted (Figure 2). These analyses revealed no evidence of categorization bias (as measured by threshold) or accuracy (as measured by the slope) across stimulus strengths.
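
      As an illustration of the kind of psychometric analysis described here, the following is a minimal sketch of fitting a two-parameter logistic function to choice data and reading off a threshold and slope. The amplitudes, trial counts, and function names are hypothetical and are not taken from the manuscript; the fitting routine (Newton's method on the logistic likelihood) is one common option and is not necessarily the one the authors used.

```python
import math

def fit_logistic(x, y, iters=50):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += yi - p
            g1 += (yi - p) * xi
            # Negative Hessian (X^T W X) entries
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # ill-conditioned, e.g. perfectly separable data
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

# Hypothetical data: stimulus amplitude (arbitrary units) and number of
# "high category" choices out of 20 trials per amplitude.
amplitudes = [1, 2, 3, 4, 5, 6]
high_choices = [2, 4, 7, 13, 16, 18]
trials_per_amplitude = 20

# Expand the counts into one binary outcome per trial.
x, y = [], []
for amp, k in zip(amplitudes, high_choices):
    x += [amp] * trials_per_amplitude
    y += [1] * k + [0] * (trials_per_amplitude - k)

b0, b1 = fit_logistic(x, y)
threshold = -b0 / b1          # amplitude at which P("high") = 0.5
slope_at_threshold = b1 / 4   # derivative of the logistic at the threshold
```

      In this sketch a shift in `threshold` would correspond to a categorization bias, and a shallower `slope_at_threshold` to lower accuracy across stimulus strengths.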

      The calculation of d’ is included in the Methods, but we will also report and explain its use in each part of the Results section where it has been included.
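
      For readers unfamiliar with the measure, d′ is conventionally computed as the difference between the z-transformed hit rate and false-alarm rate. The sketch below shows that standard signal-detection formula; the log-linear correction (adding 0.5 to each count) is an assumption made here to avoid infinite values and may differ from the procedure given in the manuscript's Methods.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Standard signal-detection sensitivity: z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption of this sketch): keeps rates
    # strictly between 0 and 1 so the inverse normal CDF stays finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one animal in one session.
chance_level = d_prime(50, 50, 50, 50)   # equal hit and false-alarm rates
above_chance = d_prime(80, 20, 20, 80)   # more hits than false alarms
```

      Because d′ combines two response rates into one number, reporting the underlying correct and incorrect choice rates alongside it, as the authors describe, shows which component drives a given difference.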

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task. 

      Strengths: 

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands. 

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses: 

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. 

      We thank the reviewer for the careful reading of our manuscript and for the constructive feedback. The reviewer raises a valid point. We agree that our study is primarily descriptive and focused on behavioral data, and we appreciate the opportunity to clarify the scope and interpretation of our findings. Our primary goal was to characterize behavioral patterns during tactile discrimination and categorization, and the psychometric analyses were intended to provide a detailed description of these patterns. We do not claim to provide direct neural, causal, or computational evidence. 

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. 

      Alternative explanations of our findings, such as differences in motivation, fatigue, satiety, stereotyped licking, and reward valuation have indeed been considered. We will revise the manuscript to present these points more clearly. 

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not done intentionally. We do not claim to have tested the Load Theory; rather, inspired by it, we assessed behavioral patterns in our tactile categorization task. We agree that referring to the Adaptive Resonance Theory, which is based on artificial neural network models, might be misleading since we focus on behavioral results, and we will revise the text accordingly. However, our task allowed us to examine the impact of categorization on discrimination, confirming that categorization can amplify perceptual differences between stimuli belonging to different categories and reduce perceived differences within a category in WT mice but not in Fmr1<sup>-/y</sup> mice when low-salience stimuli were experienced. Finally, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced use of categories in low-salience tactile discrimination.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations. 

      We agree with the reviewer that our current experiments are behavioral in nature and do not provide direct mechanistic evidence for top-down pathway dysfunction. Our goal was to carefully characterize tactile responses and behavioral patterns in Fmr1<sup>-/y</sup> mice. The notion of “top-down” is used at the behavioral level, referring to the influence of higher-level cognitive processes (e.g., categorization, attention) on perception, rather than to underlying neural circuits. We will revise the manuscript to more clearly emphasize that our conclusions are based on behavioral observations, and we will frame mechanistic inferences as hypotheses rather than established findings. We will also explicitly note that future work using neural recordings or causal manipulations will be required to directly test these hypotheses.

      We also note that identifying the precise top-down circuits involved will require extensive additional experimentation. For example, one would first need to pinpoint the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself. After such a circuit is identified, further work would then be needed to rescue or manipulate this pathway in the Fmr1<sup>-/y</sup> model. These steps represent a substantial program of mechanistic research that, while important, goes well beyond the scope of the present study.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited. 

      We recognize that “reduced top-down categorization influence” and “choice consistency bias” are based on behavioral observations. However, we respectfully disagree that this makes these constructs inherently speculative. Similar behavioral inferences have been applied in previous clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021). The translational impact of our work lies in the highly translational platform we have developed – and in highlighting the complexity of tactile measures and additional analyses that can be conducted in clinical studies.

      We agree with the reviewer that the neural-based experiments would indeed provide valuable mechanistic insight into our observed behavioral alterations, and we believe future studies should therefore focus on their underlying neurobiological substrate.

      We will revise the language throughout the manuscript to clarify that all conclusions are based on behavioral measures.  

      (3) Statistical analysis: 

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      Several trends are evident in complex measures, such as d’ analyses on task sensitivity or responses pooled across different amplitudes. Additional analyses revealed which component of these measures showed a statistically significant difference across genotypes, namely the low-salience incorrect choices accounting for low task sensitivity. We chose to present all analyses to be transparent and to highlight that commonly used complex measures (like d’ analyses) may mask important findings. In the text, we described p-values between 0.05 and 0.1 as observed trends without over-interpreting their significance. 

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations. 

      The number of mice used in each genotype group is consistent with standard practices in behavioral studies using mice and sensory tasks. We have performed effect size measures (e.g., Cohen’s d) alongside some of the statistical comparisons, showing a medium effect size (>0.5). 
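
      The effect size mentioned here can be illustrated with a short sketch of Cohen's d for two independent groups, using the pooled standard deviation. The response does not state which variant was computed, so the pooled-SD form and the example values below are assumptions for illustration only.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical per-animal scores for two genotypes.
wt_scores = [2.0, 3.0, 4.0]
ko_scores = [1.0, 2.0, 3.0]
effect = cohens_d(wt_scores, ko_scores)
```

      By the usual convention, values around 0.5 are described as medium and values around 0.8 as large, although with small groups the estimate itself carries considerable uncertainty.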

      As the reviewer correctly noted, no mice were excluded based on outlier analyses, since the observed variability reflects true biological differences rather than experimental or technical errors. We will reexamine our dataset for potential outliers. If any are identified, we will perform analyses both with and without the outlier and report any effects that are sensitive to single animals. These procedures and results will be explicitly described in the Methods and Results sections.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.  

      We thank the reviewer for raising this important point and we will include a clear statement on multiple comparisons in the Methods section. 
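
      One widely used way to control the family-wise error rate across many comparisons is the Holm-Bonferroni step-down procedure, sketched below. The response does not say which correction will be used, so this is only an example of the general approach, and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return reject

# Hypothetical p-values from four genotype comparisons.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

      Here only the two smallest p-values survive correction, even though all four fall below an uncorrected threshold of 0.05.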

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      We thank the reviewer for raising this point. This was not done intentionally. A repeated-measures ANOVA on miss rates for low-salience stimuli during categorization confirmed that there are statistically significant differences both across stimulus amplitudes and between genotypes. Additional correction for multiple comparisons will be performed and explained in the Methods section.  

      (4) Emphasis on theoretical models: The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed. 

      As mentioned above, our goal was not to directly test these theories but rather to apply them within our translational framework. The Discussion section will be reframed to highlight that our findings are consistent with predictions from certain cognitive theories rather than implying that these frameworks were directly tested.

      Reviewer #3 (Public review): 

      Summary: 

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths: 

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD. 

      We appreciate the reviewer’s positive assessment of our study’s translational value and the importance of our behavioral findings.

      Weaknesses: 

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.  

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, could provide additional insights into learning dynamics. This analysis will be conducted and added into the revised manuscript.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While it is an interesting and important question based on previous findings in preclinical and clinical studies, it falls outside the scope of the current manuscript.    

      Feigin H, Shalom-Sperber S, Zachor DA, Zaidel A (2021) Increased influence of prior choices on perceptual decisions in autism. eLife 10.

      Soulières I, Mottron L, Saumier D, Larochelle S (2007) Atypical categorical perception in autism: Autonomy of discrimination? J Autism Dev Disord 37:481–490.

    1. eLife Assessment

      Marshall et al describe the effects of altering metabotropic glutamate receptor 5 activity on activity of D1 receptor expressing spiny projection neurons in dorsolateral striatum focusing on two states - locomotion and rest. The authors examine effects of dSPN-specific constitutive mGlu5 deletion in several motor tests to arrive at this finding. Effects of inhibiting the degradation of the endocannabinoid 2-arachidonoyl glycerol are also examined. Overall, this is a valuable study that provides solid new information of relevance to movement disorders and possibly psychosis.

    2. Public Review:

      Marshall et al describe the effects of altering metabotropic glutamate receptor 5 activity on activity of D1 receptor expressing spiny projection neurons in dorsolateral striatum focusing on two states - locomotion and rest. The authors examine effects of dSPN-specific constitutive mGlu5 deletion in several motor tests to arrive at this finding. Effects of inhibiting the degradation of the endocannabinoid 2-arachidonoyl glycerol are also examined. Overall, this is a valuable study that provides solid new information of relevance to movement disorders and possibly psychosis.

      The combination of in vivo cellular calcium imaging, pharmacology, receptor knockout and movement analysis is effectively used. The main findings do not involve gross firing rates or numbers of active neurons, but rather are revealed by specialized measures involving Jaccard coefficient and an assessment of coactivity. The authors conclude that mGlu5 expressed in dSPNs contributes to movement through effects on clustered spatial coactivity of dSPNs. More specifically, reduced mGluR5 increases coactivity during rest (defined as low velocity periods) but not during locomotion periods. The authors observe a role for mGlu5 expression in dSPNs in modulating the frequency of mEPSCs, suggesting a role in presynaptic neurotransmitter release. Some data suggesting the story may be different in the other major SPN subpopulation (iSPNs) are also presented but these studies are relatively underdeveloped leaving some ambiguity as to how cell-selective the findings are. In addition, an occlusion experiment in which the pharmacological mGluR5 agents are delivered to the dSPN mGluR5 KO to clarify if other sites of action are involved beyond the proposed D1-expressing neurons is missing. Finally, the authors present a working model that sets the stage for future experimentation. Overall, this study provides an important and detailed assessment of mGluR5 contributions to striatal circuit function and behavior.
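
      For context, the Jaccard coefficient mentioned above measures the overlap between two sets of active events. The sketch below computes it for a pair of binarized activity traces; the binarization, the pairwise definition, and the example traces are assumptions for illustration, and the manuscript's exact coactivity measure may be defined differently.

```python
def jaccard(trace_a, trace_b):
    """Jaccard coefficient of two binary activity traces of equal length.

    Counts frames where both neurons are active, divided by frames where
    at least one is active. Returns 0.0 if neither neuron is ever active.
    """
    both = sum(1 for a, b in zip(trace_a, trace_b) if a and b)
    either = sum(1 for a, b in zip(trace_a, trace_b) if a or b)
    return both / either if either else 0.0

# Hypothetical binarized calcium traces (1 = active frame) for two neurons.
neuron_1 = [1, 1, 0, 0]
neuron_2 = [1, 0, 1, 0]
coactivity = jaccard(neuron_1, neuron_2)
```

      Unlike a raw count of co-active frames, this ratio is normalized by how often either neuron is active, which is one reason such a measure can change even when gross event rates do not.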

      Remaining concerns include:

      (1) To clarify that dSPNs are the sole site of action, it is necessary to examine effects of the mGlu5 NAM in the dSPN mGlu5 cKO mice. If the effects of the two manipulations occluded one another this would certainly support the hypothesis that the drug effects are mediated by receptors expressed in dSPNs. A similar argument can be made for examining effects of the JNJ PAM in the cKO mice.

      (2) There is a concern that the D1 Cre line used (Ey262), which may also target cortical neurons, expands the interpretation of the study beyond the striatal populations. Further discussion of this point, particularly in the interpretation of the mGluR5 cKO experiments, would provide a better understanding of the contribution of the paper.

      (3) The use of CsF-based whole-cell internal solutions has caused concern in some past studies due to possible interference with G-protein, phosphatase and channel function (https://www.sciencedirect.com/science/article/abs/pii/S1044743104000296, https://www.jneurosci.org/content/jneuro/6/10/2915.full.pdf). It is reassuring that the DHPG-induced LTD was still observable with this solution. However, it might be worth examining this plasticity with a different internal to ensure that the magnitude of the agonist effect is not altered by this manipulation.

      (4) Behavioral resolution of actions at low velocity that are termed "rest" is not explored in this study. Thus, a remaining ambiguity is whether the activities in rest include only periods of immobility or other low-velocity activities such as grooming or rearing.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Can the authors offer a hypothesis as to how decreased coactivity promotes increased movement velocity?”

      In our revision we have added an additional metric measuring how spatial coactivity changes during movement onset, the spatial correlation index, which replicates a previous finding that co-activity among proximal neurons is statistically greater surrounding movement onset. We did not find, as outlined in the revision, that mGluR5 manipulations significantly altered this relationship. Our data therefore show, consistent with previous findings, that ensembles of dSPNs that are co-active during movement onset, in particular ambulatory movement, are more likely to contain neurons that are closer together and highly active. In contrast, rest ensembles contain neurons that are less active but have more highly correlated activity, across all pairwise distances. Additionally, mGluR5 inhibition, genetic or pharmacological, promotes the activation of rest ensembles but does not affect the properties of movement ensembles. Previous studies (e.g. Klaus A. et al., 2017) have shown that neurons in rest ensembles are, in general, unlikely to also be members of movement ensembles. We therefore hypothesize that corticostriatal synapses onto SPNs of rest ensembles are more likely, during spontaneous behavior, to have reduced synaptic weight due to mGluR5 signaling, potentially due to eCB-mediated inhibition of neurotransmitter release. Therefore, when we inhibit mGluR5 at these synapses, we increase synaptic weight and increase the probability of activation of this coordinated rest ensemble, which suppresses movement. If, on the other hand, the synapses that govern activation of neurons in movement ensembles have a higher weight, they may be unaffected by mGluR5 inhibition.

      The use of the Jaccard similarity index in this study is not intuitive and not fully explained by the methods or the diagram in Figure 1. 

      We have added more detail to the paper to explain the methodology of the Jaccard similarity measure. The advantage of this method is that it specifically captures cells that are jointly active, as opposed to jointly inactive, and is therefore useful for capturing co-activity in our sparsely active Ca<sup>2+</sup> imaging data.
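
      As a minimal sketch of why this matters for sparse data (not the authors' analysis code): assuming each neuron's activity has been reduced to a binary event vector over time bins, the Jaccard index counts jointly active bins relative to bins in which either cell is active, whereas a simple matching score is inflated by the many jointly silent bins. The function name `jaccard_coactivity` and the toy event vectors are illustrative assumptions.

```python
import numpy as np

def jaccard_coactivity(a, b):
    """Jaccard index of two binary event vectors: |A and B| / |A or B|."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Neither cell is ever active; define co-activity as 0 (an assumption).
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

# Hypothetical, sparsely active event vectors (1 = Ca2+ event in that time bin).
cell_a = [1, 0, 0, 1, 0, 0, 0, 0]
cell_b = [1, 0, 0, 0, 0, 0, 1, 0]

jaccard = jaccard_coactivity(cell_a, cell_b)
# Simple matching also rewards bins in which both cells are silent.
matching = float(np.mean(np.asarray(cell_a) == np.asarray(cell_b)))

print(f"Jaccard = {jaccard:.3f}, simple matching = {matching:.3f}")
```

      In this toy case the two cells share one of three active bins, so the Jaccard index is low, while the matching score is high only because both cells are silent most of the time.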

      The analysis of a possible 2-AG role in the mGlu5 mediated processes is incomplete. 

      We agree that, as an experiment to outline which endocannabinoids are involved in modulating synaptic strength through mGluR5, this experiment alone is not sufficient.

      However, our main focus in this paper is how manipulations of mGluR5 affect the spatiotemporal dynamics of dSPNs and we chose not to focus on specific mechanisms of endocannabinoid signaling, though these would certainly be interesting to investigate further in vivo.

      It would seem to be a simple experiment to examine effects of the mGlu5 NAM in the dSPN mGlu5 cKO mice. If effects of the two manipulations occluded one another this would certainly support the hypothesis that the drug effects are mediated by receptors expressed in dSPNs. A similar argument can be made for examining effects of the JNJ PAM in the cKO mice. 

      We agree that this experiment would be valuable and extend our findings presented in the paper, however, it has practically been outside the scope of the current work. 

      Reviewer #2 (Public review):

      Pharmacological and genetic manipulations of mGluR5 do not differentially/preferentially modulate the activity of proximal vs distal dSPNs, therefore, it could also be interpreted that mGluR5 is blanketly boosting/suppressing all dSPN activity as opposed to differential proximal/distal spatial relationships. 

      As in the response to reviewer 1 above, we have added additional clarification to the text explaining that our manipulations do not differentially affect the co-activity of proximal vs distal dSPNs; this is also quantified throughout the text using the spatial coordination index. However, we disagree that “it could also be interpreted that mGluR5 is blanketly boosting/suppressing all dSPN activity” as we do not observe statistically significant changes in the event rate following either pharmacological or genetic manipulations of mGluR5. Rather, we consistently observe statistically significant changes in co-activity among neurons, that is, the extent to which the activity of active neurons during either rest or movement is correlated. This is the central finding of our manuscript: inhibiting or potentiating mGluR5 signaling alters behavior not by blanket suppression or enhancement of dSPN activity, as measured using the event rate, but by affecting the ensemble dynamic properties of dSPNs. Co-activity during rest versus ambulatory movement is statistically greater in both proximal and distal cells, and inhibiting mGluR5 increases this co-activity and decreases movement.

      For these analyses of prox vs distal and all others, please include the detail of how many proximal vs distal cells were involved and per subject. 

      We have added a supplemental table that details the number of cells included per subject in all analyses.

      Ln. 151-152: Please provide data concerning how volumes of infectivity differ between injecting AAV vs. coating the lens? If these numbers are very different, this could impact the number of Jaccard pairings and bias results. 

      While viral injection may lead to a larger volume of expression, with this one-photon imaging method only those cells within ~200 microns of the edge of the lens can be resolved. In practice, therefore, any additional volume of infected tissue outside the field of view of the lens would not affect the results, as these neurons will not be resolved by the endoscope camera. Accordingly, the average number of cells detected per session is very similar following each approach (mean number of cells per session with coating: 90.93 ± 23.69; with viral injection: 90.03 ± 29.29).

      Is mGluR5 affecting dSPN activity in other measures beyond co-activity and rate? Does the amplitude of events change?

      We have added supplemental data for figures 2, 3, and 5 demonstrating that manipulations of mGluR5 do not affect the amplitude or length of Ca<sup>2+</sup> events included in the analysis. 

      What is the model of mGluR5 signaling in a resting state vs. movement? What other behaviors are occurring when the mouse is in a low velocity "resting state" (0-0.5 cm/s). If this includes other forms of movement (i.e. rearing, grooming) then the animal really isn't in a resting state. This is not mentioned in the open field behavior section of the methods and should be described (Ln. 486) in addition to greater explanation of what behavior measures were obtained from the video tracking software (only locomotion?)

      It would be very interesting to determine whether, during “rest,” when the animal is not engaged in ambulatory behavior, it may be engaged in some fine motor behavior. However, the resolution of the cameras used to measure locomotor activity in this dataset does not allow us to do this.

      There is large variability in co-activity in proximal dSPNs when animals are "resting" (2j). Could this be explained by different behavior states within your definition of "rest"?

      We agree that if the animal is engaging in fine motor behavior that we cannot resolve with our behavior setup, this could produce some variability in coactivity. However, as shown previously (e.g. Klaus A. et al., 2017), ensembles active when the animal is not moving (our definition of “resting”), regardless of additional fine motor behaviors the animal may be engaged in when not moving, are substantially different from those ensembles that are active when the animal is moving. We therefore expect that this may limit, although potentially not eliminate, variability due to different behavioral states we may have grouped into our “resting” category. Unfortunately, as mentioned above, we are not able to resolve variations in fine motor output in this behavioral data.

      Have you performed IHC, ISH or another measure to validate D1 cell specific cKO?

      The mGluR5<sup>loxP/loxP</sup> mice used in this study were characterized previously by our lab (Xu et al., 2009); we used the same mice here with a different, but also published and characterized, Cre-driver line, Drd1a-Cre Ey262 (Gerfen et al., 2013).

      Why are the "Mean Norm Co-activity" values in 5e so high in this experiment relative to figures 2-4?  

      In experiments where we treated the same animal with vehicle and a drug (i.e., experiments in Figure 2 and 3), we normalized the values for each animal in the drug treatment group to the distal bin of that animal following vehicle treatment. This allowed us to more clearly resolve the changes within each animal due to drug treatment. As comparisons in the data in figure 5 d–f are between different animals (rather than different treatments of the same animal) we could not perform this normalization procedure.  
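
      The within-animal normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: co-activity is assumed to be stored as an (animals × distance bins) array with the distal bin last, the vehicle values are assumed to be scaled by the same reference, and the function name and numbers are hypothetical.

```python
import numpy as np

def normalize_to_vehicle_distal(vehicle, drug):
    """Scale each animal's co-activity by its own vehicle distal-bin value.

    Both inputs have shape (n_animals, n_distance_bins), with the distal
    bin assumed to be the last column.
    """
    vehicle = np.asarray(vehicle, dtype=float)
    drug = np.asarray(drug, dtype=float)
    # One reference value per animal, kept as a column for broadcasting.
    reference = vehicle[:, -1:]
    return vehicle / reference, drug / reference

# Hypothetical co-activity values: rows = animals, columns = [proximal, distal].
vehicle = [[0.10, 0.05],
           [0.20, 0.10]]
drug = [[0.15, 0.10],
        [0.30, 0.15]]

vehicle_norm, drug_norm = normalize_to_vehicle_distal(vehicle, drug)
print(vehicle_norm)
print(drug_norm)
```

      Because every animal is scaled to its own vehicle baseline, normalized values sit near 1 by construction, which is one reason raw between-animal values, such as those in Figure 5, can appear on a different scale.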

      Reviewer #3 (Public review):

      Some D1 Cre lines have expression in the cortex. Which specific Cre line was used in this study? 

      We used Drd1a-Cre Ey262. This is included in the methods.

      The text says JNJ treatment .... increased locomotor speed (Figure 3b) and increased the duration but not frequency of movement bouts (Figure 3c, d). However, the statistics of the figure legends say: however the change in mean velocity (3b) is not significant (p=0.060, U=3, Mann-Whitney U test), nor is the mean bout length during vehicle and JNJ (p=0.060, U=3, Mann-Whitney U test) (3d) Comparison of mean number of bouts of each animal during vehicle and JNJ (p=0.403, U=8, Mann-Whitney U test). 

      This has been corrected to indicate that only the change in time spent at rest is statistically significant.

      This effect was most pronounced during periods of rest (Figure 3i, j). The decrease was only in rest? Are the colors in Figure 3J inverted? Therefore, JNJ treatment had effects that were qualitatively the inverse to the effects of fenobam on locomotion and dSPN activity. 

      We have corrected the text to state that, overall, and during periods of rest but not movement, JNJ had effects that were qualitatively the opposite of fenobam.

    1. eLife Assessment

      This important study reports an endometrial organoid culture system mimicking the window of implantation. The evidence supporting the conclusion drawn is convincing. The data will be of interest to embryologists and investigators working on reproductive biology and medicine.

    2. Reviewer #1 (Public review):

      Summary:

      This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state. Although many bioinformatic analyses point in this direction, there are major concerns that must be addressed.

      Strengths:

      The addition of 3 hormones to enhance the WOI state (although not clearly supported in comparison to the secretory state).

      Comments on revisions:

      The authors did their best to revise their study according to the Reviewers' comments. However, the study remains unconvincing, incomplete and at the same time still too dense and not focused enough.

    3. Reviewer #2 (Public review):

      Zhang et al. have developed an advanced three-dimensional culture system of human endometrial cells, termed a receptive endometrial assembloid, that models the uterine lining during the crucial window of implantation (WOI). During this mid-secretory phase of the menstrual cycle, the endometrium becomes receptive to an embryo, undergoing distinctive changes. In this work, endometrial cells (epithelial glands, stromal cells, and immune cells from patient samples) were grown into spheroid assembloids and treated with a sequence of hormones to mimic the natural cycle. Notably, the authors added pregnancy-related factors (such as hCG and placental lactogen) on top of estrogen and progesterone, pushing the tissue construct into a highly differentiated, receptive state. The resulting WOI assembloid closely resembles a natural receptive endometrium in both structure and function. The cultures form characteristic surface structures like pinopodes and exhibit abundant motile cilia on the epithelial cells, both known hallmarks of the mid-secretory phase. The assembloids also show signs of stromal cell decidualization and an epithelial-mesenchymal transition-like process at the implantation interface, reflecting how real endometrial cells prepare for possible embryo invasion.

      Although the WOI assembloid represents an important step forward, it still has limitations: the supportive stromal and immune cell populations decrease over time in culture, so only early-passage assembloids retain full complexity. Additionally, the differences between the WOI assembloid and a conventional secretory-phase organoid are more quantitative than absolute; both respond to hormones and develop secretory features, but the WOI assembloid achieves a higher degree of differentiation due to the addition of "pregnancy" signals. Overall, while it's a reinforced model (not an exact replica of the natural endometrium), it provides a valuable in vitro system for implantation studies and testing potential interventions, with opportunities to improve its long-term stability and biological fidelity in the future.

    4. Author response:

      The following is the authors’ response to the previous reviews

      We have thoroughly addressed all the reviewers’ comments and meticulously revised the manuscript. Key modifications include the following:

      (a) Organizing the Logic and Highlighting Key Findings: We have revised the manuscript to emphasize key findings (especially the distinctions between the SEC and WOI groups) according to the following logic: constructing a receptive endometrial organoid, comparing its molecular characteristics with those of the receptive endometrium, highlighting its main features (hormone response, enhanced energy metabolism, ciliary assembly and motility, epithelial-mesenchymal transition), and exploring the function involved in embryo interaction.

      (b) Clarity and Better Description of Bioinformatic Analyses: We have revised the sections involving bioinformatic analyses to provide a more streamlined and comprehensible explanation. Instead of overwhelming the reader with excessive details, we focused on the most important findings, and performed additional experimental validation.

      (c) Rationale for Gene Selection: We have clarified the rationale for selecting certain genes and pathways for inclusion in the analysis and manuscript. The associated gene expression data for all figures have been provided in the attached Dataset.

      (d) In the response letter, we have provided a detailed presentation of the methodological optimization for constructing these endometrial assembloids, along with optimization and comparison of endometrial organoid culture media. Furthermore, in the Limitations section, we have explicitly stated that stromal cells and immune cells gradually diminish with increasing passage numbers. Therefore, this study primarily utilized endometrial assembloids within the first three passages for all investigations.

      Below, we provide a point-by-point response to each comment, with all modifications highlighted in the revised manuscript. We respectfully hope that these revisions effectively address the concerns raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state. The authors did their best to revise their study according to the reviewers' comments. However, the study remains unconvincing and at the same time too dense and not focused enough.

      (1) The use of the term organoids is still confusing and should be avoided. Organoids are epithelial tissue-resembling structures. Hence, the multiple-cell aggregates developed here are rather "coculture models" (or "assembloids"). It is still unexpected (unlikely) that these structures containing epithelial, stromal and immune cells can be robustly passaged in the epithelial growth conditions used. All other research groups developing real organoids from endometrium have shown that only the epithelial compartment remains in culture at passaging (while the stromal compartment is lost). If authors keep to their idea, they should perform scRNA-seq on both early and late (passage 6-10) "organoids". And they should provide details of culturing/passaging/plating etc that are different with other groups and might explain why they keep stromal and immune cells in their culture for such a long time. In other words, they should then in detail compare their method to the standard method of all other researchers in the field, and show the differences in survival and growth of the stromal and immune cells.

      (1) We appreciate your feedback and have revised the term 'organoids' to 'assembloids'.

      I. Due to budget constraints, this study did not perform scRNA-seq on both early and late passages (P6-P10). Instead, immunofluorescence staining confirmed the persistence of stromal cells at passage 6 (as shown below).

      Author response image 1.

      Whole-mount immunofluorescence showed that Vimentin+ F-actin+ cells (stromal cells) were arranged around the glandular spheres that were only F-actin+(passage 6).

      II. Improvements in this study include the following.

      a. Optimization of endometrial tissue processing: The procedures for tissue collection, pretreatment, digestion, and culture were refined to maximize the retention of endometrial epithelial cells, stromal cells, and immune cells (detailed optimizations are provided in Response Table 1).

      b. Enhanced culture medium formulation: Based on previous protocols, WNT3A was added to promote organoid development and differentiation (PMID: 27315476), while FGF2 was supplemented to improve stromal cell survival (PMID: 35224622) (see Response Table 2 for medium comparisons). Representative culture outcomes are shown in the figure below.

      We acknowledge that the stromal and immune cells in this system still exhibit differences compared to their in vivo counterparts. In this study, we utilized the first three passages, which offer optimal cell diversity and viability, to meet experimental needs. However, replicating and maintaining the full complexity of endometrial cell types in vitro remains a major challenge in the field—one that we are actively working to address.

      Author response table 1.

      Methodological Optimization of Endometrial Organoids (Construction, Passaging, and Cryopreservation)

      Author response table 2.

      Optimization and comparison of endometrial organoid culture media

      Author response image 2.

      Bright-field microscopy captures the expansion of glands and surrounding stromal cells across passages 0 to 2 (scale bar = 200 μm) (Yellow arrows: stromal cells; White arrows: glands).

      (2) The paper is still much too dense, touching upon all kind of conclusions from the manifold bioinformatic analyses. The latter should be much clearer and better described, and then some interesting findings (pathways/genes) should be highlighted without mentioning every single aspect that is observed. The paper needs a lot of editing to better focus and extract take-home messages, not bombing the reader with a mass of pathways, genes etc which makes the manuscript just not readable or 'digest-able'. There is no explanation whatever and no clear rationale why certain genes are included in a list while others are not. There is the impression that mass bioinformatics is applied without enough focus.

      Thanks for your suggestions. We have made improvements and revisions in the following areas:

      (a) Clarity and Better Description of Bioinformatic Analyses: We have revised the sections involving bioinformatic analyses to provide a more streamlined and comprehensible explanation. Instead of overwhelming the reader with excessive details, we focused on the most important findings.

      (b) Organizing the Logic and Highlighting Key Findings: We have revised the manuscript to emphasize key findings according to the following logic: constructing a receptive endometrial organoid, comparing its molecular characteristics with those of the receptive endometrium, highlighting its main features (hormone response, enhanced energy metabolism, ciliary assembly and motility, epithelial-mesenchymal transition), and exploring the function involved in embryo interaction.

      (c) Rationale for Gene Selection: We have clarified the rationale for selecting certain genes and pathways for inclusion in the analysis and manuscript.

      We hope these revisions address your concerns and improve the overall quality and clarity of the manuscript. Thank you once again for your valuable input.

      (3) The study is much too descriptive and does not show functional validation or exploration (except glycogen production). Some interesting findings extracted from the bioinformatics must be functionally tested.

      Thanks for your suggestions. We have restructured the logic and revised the manuscript, incorporating functional validation. The focus is on the following points: highlighting its main features (hormone response, enhanced energy metabolism, ciliary assembly and motility, epithelial-mesenchymal transition), and exploring the functions involved in embryo interaction.

      (4) In contrast to what was found in vivo (Wang et al. 2020), no abrupt change in gene expression pattern is mentioned here from the (early-) secretory to the WoI phase. Should be discussed. Although the bioinformatic analyses point into this direction, there are major concerns which must be solved before the study can provide the needed reliability and credibility for revision.

      To further investigate the abrupt change, the Mfuzz algorithm was utilized to analyze gene expression across the three groups, focusing on gene clusters that were progressively upregulated or downregulated. It was observed that mitochondrial and cilia-related genes exhibited the highest expression levels in WOI endometrial organoids, whereas genes related to cell junctions and negative regulation of cell differentiation were downregulated (Figure 4A).

      (5) All data should be benchmarked to the Wang et al 2020 and Garcia-Alonso et al. 2021 papers reporting very detailed scRNA-seq data, and not only the Stephen R. Quake 2020 paper.

      We appreciate your suggestion. By integrating data from Garcia-Alonso et al. (2021) (shown in the figure below), we observed that both WOI organoids and SEC organoids exhibit increased glandular secretory epithelium and developed ciliated epithelium, mirroring features of mid-secretory endometrium. The findings exhibit parallels when contrasting these two papers.

      Author response image 3.

      UMAP visualization of integrated scRNA-seq data (our dataset and Garcia-Alonso et al. 2021) showing: (A) cell types, (B) WOI-org, (C) CTRL-org, (D) SEC-org versus published mid-secretory samples.

      (6) Fig. 2B: Vimentin staining is not at all clear. F-actin could be used to show the typical morphology of the stromal cells?

      We appreciate your suggestion. We performed additional staining for F-actin alongside Vimentin, and found that Vimentin+ F-actin+ cells (stromal cells) were arranged around the glandular spheres that were only F-actin+.

      (7) Where does the term "EMT-derived stromal cells" come from? On what basis has this term been coined?

      Within endometrial biology, stromal cells in the transition from epithelial to mesenchymal phenotype are specifically referred to as 'stromal EMT transition cells' (PMID: 39775038, PMID: 39968688).

      In certain cancers or fibrotic diseases, epithelial cells can transition into a mesenchymal phenotype, contributing to the stromal environment that supports tumor growth or tissue remodeling (PMID: 20572012).

      (8) CD44 is shown in Fig. 2D but the text mentions CD45 (line 159)?

      In Fig 2D, T cells are defined as a cluster of CD45+CD3+ cells, further subdivided into CD4+ and CD8+ T cells based on their expression of CD4 and CD8. This figure does not include data on CD44.

      (9) All quantification experiments (of stainings etc) should be in detail described how this was done. It looks very difficult (almost not feasible) when looking at the provided pictures to count the stained cells.

      a. Manual Measurement:

      For TEM-observed pinopodes, glycogen particles, microvilli, and cilia, manual region-of-interest (ROI) selection was performed using ImageJ software for quantitative analysis of counts, area, and length. Twenty randomly selected images per experimental group were analyzed for each morphological parameter.

      b. Automated Measurement:

      We quantified the fluorescence images using ImageJ. First, the images were preprocessed by adjusting brightness and contrast and removing background noise with the “Subtract Background” feature. Second, a threshold was set to highlight the cells, and the regions of interest (ROI) were selected using the selection tools. Third, cells were counted via Analyze > Analyze Particles. To measure fluorescence intensity and area, the “Measurement” options were set to mean gray value. Parameters were adjusted as needed, results were viewed in the “Results” window, and the data were saved for further analysis; consistent settings were used throughout the measurements to keep the results reliable.

      For 3D fluorescence quantification, ZEN software (Carl Zeiss) was exclusively used, with 11 images analyzed per experimental group. This part has been incorporated into “Supporting Information” Line 94-100.

      c. Normalization Method:

      For fluorescence quantification, DAPI was used as an internal reference for normalization, where both DAPI and target fluorescence channel intensities were quantified simultaneously. The normalized target signal intensity (target/DAPI ratio) was then compared across experimental groups. A minimum of 15 images were analyzed for each parameter per group. This part has been incorporated into “Supporting Information” Line 101-104.
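
      The DAPI normalization can be illustrated with a short sketch. It assumes each image is available as two 2D intensity arrays (target channel and DAPI channel) and that the mean gray value is simply the mean pixel intensity; the function name `relative_intensity` and the synthetic images are hypothetical and do not reproduce the ImageJ workflow itself.

```python
import numpy as np

def relative_intensity(target_img, dapi_img):
    """Mean gray value of the target channel divided by that of the DAPI channel."""
    target_mean = float(np.mean(target_img))
    dapi_mean = float(np.mean(dapi_img))
    if dapi_mean == 0:
        raise ValueError("DAPI channel has zero mean intensity; cannot normalize.")
    return target_mean / dapi_mean

# Synthetic stand-ins for one image per group (real data would be loaded from files).
ctrl_ratio = relative_intensity(np.full((4, 4), 50.0), np.full((4, 4), 100.0))
woi_ratio = relative_intensity(np.full((4, 4), 90.0), np.full((4, 4), 100.0))

print(f"CTRL target/DAPI = {ctrl_ratio:.2f}, WOI target/DAPI = {woi_ratio:.2f}")
```

      In practice one ratio would be computed per image (at least 15 per group, as stated above) and the per-group distributions of ratios compared statistically.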

      (10) Fig. 3C: it is unclear how quantification can be reliably done. Moreover, OLFM4 looks positive in all cells of Ctrl, but authors still see an increase?

      (a) Fluorescence images were quantitatively analyzed using ImageJ by measuring the mean gray values. For normalization, DAPI staining served as an internal reference, with simultaneous measurement of mean gray values in both the target fluorescence channel and the DAPI channel. The relative fluorescence intensity was then calculated as the ratio of target channel to DAPI signal for inter-group quantitative comparisons.

      (b) OLFM4 is an E2-responsive gene. Its expression in endometrial organoids of the CTRL group is physiologically normal (PMID: 31666317). However, its fluorescence intensity (quantified as mean gray value) was significantly stronger in both the SEC and WOI groups compared to the CTRL group (quantitative method as described above).

      (11) Fig. 3F: Met is downregulated which is not in accordance with the mentioned activation of the PI3K-AKT pathway.

      We appreciate your careful review. Our initial description was imprecise. In the revised manuscript, this statement has been removed entirely.

      (12) Lines 222-226: transcriptome and proteome differences are not significant; so, how meaningful are the results then? Then, it is very hard to conclude an evolution from secretory phase to WoI.

      We appreciate your feedback. The manuscript has been comprehensively revised, and the aforementioned content has been removed.

      (13) WoI organoids show an increased number of cilia. However, some literature shows the opposite, i.e. less ciliated cells in the endometrial lining at WoI (to keep the embryo in place). How to reconcile?

      Thank you for raising this question. We conducted a statistical analysis of the proportion of ciliated cells across endometrial phases.

      (a) Based on the 2020 study by Stephen R. Quake and Carlos Simon’s team published in Nature Medicine (PMID: 32929266), the mid-secretory phase (Days 19–23) exhibited a higher proportion of ciliated cells compared to the early-secretory (Days 15–18) and late-secretory phases (Days 24–28) (Fig. R13 A).

      (b) According to the 2021 study by Roser Vento-Tormo’s team in Nature Genetics, ciliated cell abundance peaked in the early-to-mid-secretory endometrium across all phases (Fig. R13 B-C).

      Data were sourced from the Reproductive Cell Atlas.

      (14) How are pinopodes distinguished from microvilli? Moreover, Fig. 3 does not show the typical EM structure of cilia.

      Thank you for this insightful question.

      (a) Pinopodes are large, bulbous protrusions with a smooth apical membrane. Under transmission electron microscopy (TEM), it can be observed that the pinopodes contain various small particles, which are typically extracellular fluid and dissolved substances.

      Microvilli are elongated, finger-like projections that typically exhibit a uniform and orderly arrangement, forming a "brush border" structure. Under transmission electron microscopy, dense components of the cytoskeleton, such as microfilaments and microtubules, can be seen at the base of the microvilli.

      (b) You may refer to the ciliated TEM structure shown in the current manuscript's Fig. 2E (originally labeled as Fig. 2H in the draft). The cilium is composed of microtubules. The cross-section shows that the periphery of the cilium is surrounded by nine pairs of microtubules arranged in a ring. The longitudinal section shows that the cilium has a long cylindrical structure, with the two central microtubules being quite prominent and located at the center of the cilium.

      (15) There is a recently published paper demonstrating another model for implantation. This paper should be referenced as well (Shibata et al. Science Advances, 2024).

      Thanks for your valuable comments. We have cited this reference in the manuscript at Line 77-78.

      (16) Line 78: two groups were the first here (Turco and Borreto) and should both be mentioned.

      Thanks for your valuable comments. We have cited this reference in the manuscript at Line 74-76.

      (17) Line 554: "as an alternative platform" - alternative to what? Authors answer reviewers' comments by just changing one word, but this makes the text odd.

      Thank you for your review. Here, we propose that this WOI organoid serves as an alternative research platform for studying endometrial receptivity and maternal-fetal interactions, compared to current secretory-phase organoids. In the revised manuscript, we have supplemented the data by co-culturing this WOI organoid with blastoid, demonstrating its robust embryo implantation potential.

      Reviewer #2 (Public Review):

      In this research, Zhang et al. have pioneered the creation of an advanced organoid culture designed to emulate the intricate characteristics of endometrial tissue during the crucial Window of Implantation (WOI) phase. Their method involves the incorporation of three distinct hormones into the organoid culture, coupled with additives that replicate the dynamics of the menstrual cycle. Through a series of assays, they underscore the striking parallels between the endometrial tissue present during the WOI and their crafted organoids. Through a comparative analysis involving historical endometrial tissue data and control organoids, they establish a system that exhibits a capacity to simulate the intricate nuances of the WOI. The authors made a commendable effort to address the majority of the statements. Developing an endometrial organoid culture methodology that mimics the window of implantation is a game-changer for studying the implantation process. However, the authors should strive to enhance the results to demonstrate how different WOI organoids are from SEC organoids, ensuring whether they are worth using in implantation studies, or a proper demonstration using implantation experiments.

      Thank you for your valuable suggestions. The WOI organoids differ from SEC organoids in the following aspects.

      (1) Structurally, WOI endometrial organoids exhibit subcellular features characteristic of the implantation window: densely packed pinopodes on the luminal side of epithelial cells, abundant glycogen granules, elongated and tightly arranged microvilli, and increased cilia (Figure 2F).

      (2) At the molecular level, WOI organoids show enlarged and functionally active mitochondria, enhanced ciliary assembly and motility, and single-cell signatures resembling mid-secretory endometrium.

      Specifically, mitochondrial- and cilia-related genes/proteins are most highly expressed in WOI organoids (Figure 4A,B). TEM analysis revealed that WOI organoids have the largest average mitochondrial area (Figure 4C). Mitochondrial-related genes display an increasing trend across the three organoid groups, and WOI organoids produce more ATP and IL-8 (Figure 4D,E).

      For cilia, WOI organoids upregulate genes/proteins involved in ciliary assembly, basal bodies, and motile cilia, while downregulating non-motile cilia markers (Figure 5A-C).

      Single-cell analysis further confirms that WOI organoids recapitulate mid-secretory endometrial traits in mitochondrial metabolism and cell adhesion (Figure 2G).

      (3) Functionally, WOI organoids demonstrate superior embryo implantation potential. Given the scarcity and ethical constraints of human embryos, we used blastoids for implantation assays (Figure 6A). These blastoids successfully grew within endometrial organoids, established interactions (Figure 6B), and exhibited normal trilineage differentiation (epiblast: OCT4; hypoblast: GATA6; trophoblast: KRT18) (Figure 6C). WOI organoids achieved significantly higher blastoid survival (66% vs. 19% in CTRL and 28% in SEC) and interaction rates (90% vs. 47% in CTRL and 53% in SEC), confirming their robust embryo-receptive capacity (Figure 6D,E).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In conclusion, it is needed to first meet all the concerns of the reviewers and then submit an appropriately adapted and comprehensive paper (also showing the robustness of the "organoids" and functionality of the findings) instead of this still fully descriptive paper. Further comments are included in the rebuttal document of the authors and will be provided by the editor as PDF.

      Reviewer #2 (Recommendations For The Authors):

      The authors made a good effort to reply to all the statements. However, there are some points that the authors need to address.

      • There is an inconsistency in the manuscript regarding the number of passages in which the organoids are used; in the response to the reviewers, it mentions 5 passages, while in the Materials and Methods section, it states 3 passages.

      We sincerely appreciate your thorough review. In this study, organoids within the first three passages were used. To address the reviewer's question comprehensively, we have now provided a detailed account of the organoid passage history in our response.

      • We agree that the difference between SEC and WOI organoids may be subtle, but in response to this, the authors should explain what they mean by "the most notable differences lie in the more comprehensive differentiation and varied cellular functions exhibited by WOI organoids..."

      In the original manuscript, this statement indicated that, at the single-cell level, WOI endometrial organoids exhibited more functionally mature and thoroughly differentiated characteristics compared to SEC endometrial organoids (See details below).

      In the revised version, we have restructured this section to focus on the following aspects: hormone response, energy metabolism, ciliary assembly and motility, epithelial-mesenchymal transition, and embryo implantation potential. Consequently, the statement "the most notable differences lie in the more comprehensive differentiation and varied cellular functions exhibited by WOI organoids..." has been removed.

      (1) Varied cellular functions:

      a. Secretory Epithelium: Compared to SEC organoids, WOI organoids exhibit enhanced peptide metabolism and mitochondrial energy metabolism in their secretory epithelium, supporting endometrial decidualization and embryo implantation (Figure 3F).

      b. Proliferative Epithelium: Compared to SEC organoids, WOI organoids demonstrate enhanced GTPase activity, angiogenesis, cytoskeletal assembly, cell differentiation, and RAS protein signaling in their proliferative epithelium (Figure S2G).

      c. Ciliated Epithelium: The ciliated epithelium of WOI endometrial organoids is associated with the regulation of vascular development and exhibits higher transcriptional activity compared to SEC organoids (Figure 5E).

      d. Stromal Cells: Compared to SEC organoids, WOI organoids exhibit enhanced cell junctions, cell migration, and cytoskeletal regulation in EMT-derived stromal cells (Figure S4A right panel). Similarly, cell junctions are also strengthened in stromal cells (Figure S4A left panel).

      (2) Comprehensive differentiation:

      a. Compared to SEC organoids, WOI organoids exhibit more complete differentiation from proliferative epithelium to secretory epithelium (Figure 3G).

      b. The WOI organoids demonstrate more robust ciliary differentiation compared to SEC organoids (Figure 5F).

      c. The proliferative epithelium progressively differentiates into EMT-derived cells. Compared to SEC organoids, WOI organoids are predominantly localized at the terminal end of the differentiation trajectory, indicating more complete differentiation (Figure S4B).

      • What do the authors mean by "average intensity" when referring to the extra reagents added to the WOI? The results that the authors show in response to Reviewer 2's Q1 must be included as part of the results and explain how it was done in the materials and methods section.

      This parameter indicates the growth status of organoids. It measures the gray value of organoids through long-term live-cell tracking. When organoids undergo apoptosis, they progressively condense into denser solid spheres, leading to an increase in gray value (average intensity). This content has been incorporated into the Results section (Line 129) and is further explained in the Supporting Information "Materials and Methods" (Lines 70-77).

      • In panel 1C, it is not possible to see the stromal cells around because they are brightfield images.

      You are partly right. Bright-field images alone indeed make it difficult to distinguish stromal cells. However, by combining whole-mount immunofluorescence staining with the characteristic elongated spindle-shaped morphology of stromal cells, we were able to roughly determine their distribution in the bright-field images.

      • Responding to Reviewer 2's question Q7, the authors indicate how they establish the cluster. However, they do not specify whether they extrapolate the data from a database or create the cluster themselves based on the literature. It should be stated from which classification list (or classification database) the extrapolation has been made.

      Within endometrial biology, stromal cells in the transition from epithelial to mesenchymal phenotype are specifically referred to as 'stromal EMT transition cells' (PMID: 39775038, PMID: 39968688).

      In certain cancers or fibrotic diseases, epithelial cells can transition into a mesenchymal phenotype, contributing to the stromal environment that supports tumor growth or tissue remodeling (PMID: 20572012).

      • Regarding Reviewer 2's question Q8, if the authors have not been able to make comparisons with, at least, SEC organoids, unfortunately, the ERT loses much of its strength and should not serve as support.

      We agree with you at this point. These results have been moved to the supplementary figures.

      • If the differences in the transcriptome and proteome between SEC and WOI organoids are not significant, the result does not support the authors' model. If there are barely any differences at the proteome and transcriptome level between SEC and WOI organoids, why would anyone choose to use their model over SEC organoids?

      We sincerely appreciate your valuable feedback. In this revised manuscript, we have further integrated transcriptomic and proteomic analyses, revealing that WOI organoids exhibit enlarged and functionally active mitochondria, along with enhanced cilia assembly and motility compared to SEC organoids. Additionally, using a blastoid model, we demonstrated that WOI organoids possess superior embryo implantation potential, significantly outperforming SEC organoids. Our research group aims to develop an embryo co-culture model. Through systematic comparisons of structural, molecular, and co-culture characteristics between SEC and WOI organoids, we ultimately confirmed the superior performance of WOI organoids.

      • SEC and WOI organoids must be different enough to establish a new model, and the authors do not demonstrate that they are.

      Thank you for your valuable feedback. In the revised manuscript, we have emphasized the distinctions between SEC and WOI organoids in terms of structure, molecular characteristics, and functionality (co-culture with blastoid), as detailed below.

      (1) Structurally, WOI endometrial organoids exhibit subcellular features characteristic of the implantation window: densely packed pinopodes on the luminal side of epithelial cells, abundant glycogen granules, elongated and tightly arranged microvilli, and increased cilia (Figure 2F).

      (2) At the molecular level, WOI organoids show enlarged and functionally active mitochondria, enhanced ciliary assembly and motility, and single-cell signatures resembling mid-secretory endometrium.

      Specifically, mitochondrial- and cilia-related genes/proteins are most highly expressed in WOI organoids (Figure 4A,B). TEM analysis revealed that WOI organoids have the largest average mitochondrial area (Figure 4C). Mitochondrial-related genes display an increasing trend across the three organoid groups, and WOI organoids produce more ATP and IL-8 (Figure 4D,E).

      For cilia, WOI organoids upregulate genes/proteins involved in ciliary assembly, basal bodies, and motile cilia, while downregulating non-motile cilia markers (Figure 5A-C).

      Single-cell analysis further confirms that WOI organoids recapitulate mid-secretory endometrial traits in mitochondrial metabolism and cell adhesion (Figure 2G).

      (3) Functionally, WOI organoids demonstrate superior embryo implantation potential. Given the scarcity and ethical constraints of human embryos, we used blastoids for implantation assays (Figure 6A). These blastoids successfully grew within endometrial organoids, established interactions (Figure 6B), and exhibited normal trilineage differentiation (epiblast: OCT4; hypoblast: GATA6; trophoblast: KRT18) (Figure 6C). WOI organoids achieved significantly higher blastoid survival (66% vs. 19% in CTRL and 28% in SEC) and interaction rates (90% vs. 47% in CTRL and 53% in SEC), confirming their robust embryo-receptive capacity (Figure 6D,E).

      • Regarding Q16, Boretto et al. 2017 and Turco et al. 2017 also manage to isolate stromal cells, but they lose them between passages. It's not a matter of isolating them from the tissue or not, but rather how they justify their maintenance in culture. In the images added by the authors, it can be seen that the majority of stromal cells are lost from P0 to P1 after thawing. I still believe that the epithelial part can be passed and maintained, but the rest cannot, and that should be mentioned in the paper as a limitation. However, the authors can demonstrate the maintenance of stromal cells by performing immunostaining with vimentin from passages 4, 5, and 6.

      Thank you for your valuable comments. We have added the statement 'Stromal cells and immune cells are difficult to pass down stably and their proportion is lower than that in the in vivo endometrium' to the Limitations section (Lines 364-365). Additionally, we performed immunostaining with vimentin starting from passage 6 and confirmed the presence of Vimentin+ F-actin+ stromal cells (as shown in Author response image 1).

    1. eLife Assessment

      This work demonstrates an objective way to select parameter values for a quadratic integrate-and-fire model so that its bifurcation diagram matches a specific target diagram, generated from the Wang-Buzsaki model. The method is useful for the field and is presented with convincing evidence. The method is currently limited in its ability to be applied to data, but improves our mathematical tools to treat a rarely studied type of bifurcation.

    2. Reviewer #1 (Public review):

      Summary:

      From a big picture viewpoint, this work aims to provide a method to fit parameters of reduced models for neural dynamics so that the resulting tuned model has a bifurcation diagram that matches that of a more complex, computationally expensive model. The matching of bifurcation diagrams ensures that the model dynamics agree on a region of parameter space, rather than just at specially tuned values, and that the models share properties such as qualitative features of their phase response curves, as the authors demonstrate. A notable point is the inclusion of extracellular potassium concentration dynamics into the reduced model - here, the quadratic integrate-and-fire model; this is straightforward but nonetheless useful for studying certain phenomena.

      Strengths:

      The paper demonstrates the method specifically on the fitting of the quadratic integrate-and-fire model, with potassium concentration dynamics included, to the Wang-Buzsaki model extended to include the potassium component. The method works very well overall in this instance. The resulting model is thoroughly compared with the original, in terms of bifurcation diagrams, production of various activity patterns, phase response curves, and associated phase-locking and synchronization properties.
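      For readers unfamiliar with the reduced model discussed here, the following is a minimal sketch of the standard quadratic integrate-and-fire neuron, dv/dt = v^2 + I with a reset. It is an illustration only, not the authors' fitted model: it omits the potassium concentration dynamics, and the reset value, peak value, step size, and function names are assumptions chosen for the example. For I > 0 the interspike interval has a closed form, which the sketch uses as a check on the simulation; for I < 0 the voltage settles at a stable fixed point and no spikes occur.

```python
import math


def simulate_qif(current, v_reset=-10.0, v_peak=10.0, dt=1e-4, t_max=20.0):
    """Forward-Euler integration of dv/dt = v**2 + current with a hard reset.

    Returns the list of spike times. All parameter values are illustrative
    assumptions, not values taken from the manuscript under review.
    """
    v = v_reset
    spikes = []
    steps = int(t_max / dt)
    for k in range(steps):
        v += dt * (v * v + current)
        if v >= v_peak:
            # Record the spike and reset the voltage.
            spikes.append((k + 1) * dt)
            v = v_reset
    return spikes


def analytic_period(current, v_reset=-10.0, v_peak=10.0):
    """Closed-form interspike interval of the same model, valid for current > 0."""
    root = math.sqrt(current)
    return (math.atan(v_peak / root) - math.atan(v_reset / root)) / root


if __name__ == "__main__":
    spikes = simulate_qif(1.0)
    print("simulated interspike interval:", spikes[-1] - spikes[-2])
    print("analytic interspike interval: ", analytic_period(1.0))
    print("spikes for negative input:", simulate_qif(-1.0))
```

      Sweeping `current` through zero in this sketch shows the transition from rest to repetitive firing, which is the kind of qualitative feature a bifurcation diagram summarises.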

      Weaknesses:

      It is important to note that the proposed method requires that a target bifurcation diagram be known. In practical terms, this means that the method may be well suited to fitting a reduced model to another, more complicated model, but is not likely to be useful for fitting the model to data. Certainly, the authors did not illustrate any such application. Secondly, the authors do not provide any sort of general algorithm but rather give a demonstration of a single example of fitting one specific reduced model to one specific conductance-based model. Finally, the main idea of the paper seems to me to be a natural descendant of the chain of reasoning, starting from Rinzel - continuing through Bertram; Golubitsky/Kaper/Josic; Izhikevich; and others - that a fundamental way to think about neuronal models, especially those involving bursting dynamics, is in terms of their bifurcation structure. According to this line of reasoning, two models are "the same" if they have the same bifurcation structure. Thus, it becomes natural to fit a reduced model to a more complicated model based on the bifurcation structure. The authors deserve credit for recognizing and implementing this step, and their work may be a useful example to the community. But the manuscript should have described and cited this chain of works to put the current study in the correct context.

    3. Reviewer #2 (Public review):

      Summary:

      The authors derive an integrate-and-fire model to describe the dynamics of a more complex Wang-Buzsaki model and compare the two models. A detailed discussion of bifurcation schemes in both models is convincing and allows us to evaluate the simpler model.

      Strengths:

      The idea is interesting, and the mathematical approach appears to be convincing. In addition, differences between the simple and original models are also discussed.

      Weaknesses:

      A comparison to experimental data is necessary to support the theoretical work.

    4. Author response:

      We thank the reviewers for their constructive feedback on the article’s strengths and weaknesses. In response, we plan to strengthen our work in a revised version by (i) providing an additional example of our method’s implementation and (ii) framing our contribution more clearly as a continuation of the line of research that characterises neuronal models in terms of their bifurcation structure.

      Experimental validation, however, is beyond the scope of this study. Constructing experimental bifurcation diagrams remains a major challenge, particularly for unstable branches. Although some techniques exist to approximate branches of unstable steady states, unstable limit cycles are far more difficult to capture. Additionally, in practice, many factors vary during recordings, and generating reliable diagrams would require a large number of tightly controlled experimental repetitions whose stability often cannot be ensured. Two-dimensional bifurcation diagrams, as needed for the analysis in our manuscript, are even more challenging to obtain because the extensive and stable recordings would have to be available from the same cell at different values of the second parameter (such as different extracellular potassium concentrations). At this stage, our method can be applied to the reduction of detailed conductance-based models, which themselves are constrained by experimental data (for example, gating functions fitted to voltage-clamp recordings). This way, simple yet dynamically faithful phenomenological models for efficient use in network analysis and simulation can be derived from more complex, biophysical models. In contrast to the traditional voltage fitting approach, these models can also capture changes in additional parameters (such as extracellular potassium concentration).

    1. eLife Assessment

      This study presents a valuable finding on whether executive resources mediate the impact of language predictability in reading in the context of aging. The presentation of evidence is incomplete; further conceptual clarifications, methodological details, and addressing potential confounds would strengthen the study. The work will be of interest to cognitive neuroscientists working on reading, language comprehension, and executive control.

    2. Reviewer #1 (Public review):

      This manuscript reports a dual-task experiment intended to test whether language prediction relies on executive resources, using surprisal-based measures of predictability and an n-back task to manipulate cognitive load. While the study addresses a question under debate, the current design and modeling framework fall short of supporting the central claims. Key components of cognitive load, such as task switching, word prediction vs integration, are not adequately modeled. Moreover, the weak consistency in replication undermines the robustness of the reported findings. Each point is unpacked below.

      Cognitive load is a broad term. In the present study, it can be at least decomposed into the following components:

      (1) Working memory (WM) load: news, color, and rank.

      (2) Task switching load: domain of attention (color vs semantics), sensorimotor rules (c/m vs space).

      (3) Word comprehension load (hypothesized against): prediction, integration.

      The components of task switching load should be directly included in the statistical models. Switching of sensorimotor rules may be captured by the "n-back reaction" (binary) predictor. However, the switching of attended domains and the interaction between domain switching and rule complexity (1-back or 2-back) were not included. The attention control experiment (1) avoided useful statistical variation from the Read Only task, and (2) did not address interactions. More fundamentally, task-switching components should be directly modeled in both performance and full RT models to minimize selection bias. This principle also applies to other confounding factors, such as education level. While missing these important predictors, the current models have an abundance of predictors that are not so well motivated (see later comments). In sum, with the current models, one cannot determine whether the reduced performance or prolonged RT was due to affecting word prediction load (if it exists) or merely affecting the task switching load.

      The entropy and surprisal need to be more clearly interpreted and modeled in the context of the word comprehension process. The entropy concerns the "prediction" part of the word comprehension (before seeing the next word), whereas surprisal concerns the "integration" part as a posterior. This interpretation is similar to the authors writing in the Introduction that "Graded language predictions necessitate the active generation of hypotheses on upcoming words as well as the integration of prediction errors to inform future predictions [1,5]." However, the Results of this study largely ignored entropy (treating it as a fixed effect) and focused only on surprisal, without clear justification.
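      The distinction between the two measures can be made concrete with their standard information-theoretic definitions: entropy is computed over the whole next-word distribution before the word is seen, whereas surprisal depends only on the probability of the word that actually occurred. The sketch below is a minimal illustration; the toy distribution and function names are hypothetical and are not taken from the study's language model or materials.

```python
import math


def surprisal(next_word_probs, observed_word):
    """Surprisal in bits: -log2 p(observed word | context)."""
    return -math.log2(next_word_probs[observed_word])


def entropy(next_word_probs):
    """Shannon entropy in bits of the next-word distribution."""
    return -sum(p * math.log2(p) for p in next_word_probs.values() if p > 0)


# Hypothetical next-word distribution for one context (toy numbers only).
probs = {"coffee": 0.5, "tea": 0.25, "water": 0.125, "juice": 0.125}

if __name__ == "__main__":
    print("entropy before the word:", entropy(probs))
    print("surprisal of 'coffee':  ", surprisal(probs, "coffee"))
    print("surprisal of 'juice':   ", surprisal(probs, "juice"))
```

      In this toy case the entropy is the same whichever word follows, while the surprisal differs between a likely and an unlikely continuation, which is why the two predictors can dissociate in a regression model.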

      In Table S3, with original and replicated model fitting results, the only consistent interaction is surprisal x age x cognitive load [2-back vs. Reading Only]. None of the two-way interactions can be replicated. This is puzzling and undermines the robustness of the main claims of this paper.

    3. Reviewer #2 (Public review):

      Summary:

      This paper considers the effects of cognitive load (using an n-back task related to font color), predictability, and age on reading times in two experiments. There were main effects of all predictors, but more interesting effects of load and age on predictability. The effect of load is very interesting, but the manipulation of age is problematic, because we don't know what is predictable for different participants (in relation to their age). There are some theoretical concerns about prediction and predictability, and a need to address literature (reading time, visual world, ERP studies).

      Strengths/weaknesses

      It is important to be clear that predictability is not the same as prediction. A predictable word is processed faster than an unpredictable word (something that has been known since the 1970/80s), e.g., Rayner, Schwanenfluegel, etc. But this could be due to ease of integration. I think this issue can probably be dealt with by careful writing (see point on line 18 below). To be clear, I do not believe that the effects reported here are due to integration alone (i.e., that nothing happens before the target word), but the evidence for this claim must come from actual demonstrations of prediction.

      The effect of load on the effects of predictability is very interesting (and also, I note that the fairly novel way of assessing load is itself valuable). Assuming that the experiments do measure prediction, it suggests that they are not cost-free, as is sometimes assumed. I think the researchers need to look closely at the visual world literature, most particularly the work of Huettig. (There is an isolated reference to Ito et al., but this is one of a large and highly relevant set of papers.)

      There is a major concern about the effects of age. See the Results (161-5): this depends on what is meant by word predictability. It's correct if it means the predictability in the corpus. But it may or may not be correct if it refers to how predictable a word is to an individual participant. The texts are unlikely to be equally predictable to different participants, and in particular to younger vs. older participants, because of their different experiences. To put it informally, the newspaper articles may be more geared to the expectations of younger people. But there is also another problem: the LLM may have learned on the basis of language that has largely been produced by young people, and so its predictions are based on what young people are likely to say. Both of these possibilities strike me as extremely likely. So it may be that older adults are affected more by words that they find surprising, but it is also possible that the texts are not what they expect, or the LLM predictions from the text are not the ones that they would make. In sum, I am not convinced that the authors can say anything about the effects of age unless they can determine what is predictable for different ages of participants. I suspect that this failure to control is an endemic problem in the literature on aging and language processing and needs to be systematically addressed.

      Overall, I think the paper makes enough of a contribution with respect to load to be useful to the literature. But for discussion of age, we would need something like evidence of how younger and older adults would complete these texts (on a word-by-word basis) and that they were equally predictable for different ages. I assume there are ways to get LLMs to emulate different participant groups, but I doubt that we could be confident about their accuracy without a lot of testing. But without something like this, I think making claims about age would be quite misleading.

    4. Author response:

      Reviewer #1 (Public review):

      Cognitive Load and Task-Switching Components:

      We agree that cognitive load is multi-faceted and encompasses dimensions not fully captured in our present models, including domain and rule switching. For the revision, we will explicitly model these components in the statistical analyses by incorporating predictors reflecting attended domain switching and rule complexity, as suggested. We will also explain our inclusion of n-back reaction predictors and justify their relationship with theoretical constructs of executive function. Full details of coding schemes will be provided.

      Modeling Entropy and Surprisal:

      We appreciate the reviewer’s suggestion to further explain the distinction between entropy (predictive uncertainty) and surprisal (integration difficulty), and acknowledge that our treatment of entropy warrants extension. In the revision, we will expand the results and discussion on entropy, providing clearer theoretical motivation for its inclusion and conducting supplementary analyses to examine its role alongside surprisal.

      Replicability of Findings:

      We note the concern regarding two-way vs. three-way interactions in model replication. In the revised manuscript, we will report robustness analyses on subsets of our data (e.g., matched age and education groups), clarify degrees of freedom and group sizes, and transparently report any discrepancies.

      Predictors and Statistical Modeling:

      We will add clarifications on predictor selection, data structure, and rationale for model hierarchy. The functions of d-prime, comprehension accuracy, and performance modeling will be described in more detail, including discussion of block-level vs. participant-level effects.

      Reviewer #2 (Public review):

      Distinction Between Prediction and Predictability:

      We recognize the importance of clearly communicating the difference between prediction and predictability, as well as integration-based vs. prediction-based effects. We will clarify these distinctions throughout the introduction, methods, and discussion sections, citing the relevant theoretical literature (e.g., Pickering & Gambi 2018; Federmeier 2007; Staub 2015; Frisson 2017).

      Aging, Corpus Predictability, and Individual Differences:

      We appreciate the critical point regarding age, corpus-based predictability, and potential cohort effects in language model estimates. In the revision, we will provide conceptual clarifications on how surprisal and entropy might differ for different age groups and discuss limitations in extrapolating these metrics to participant-specific predictions. The limitations inherent in relying on LLM-derived estimates and text materials will be more directly addressed.

      Coverage of Literature and Paradigms:

      We will broaden the literature review as requested, particularly on the N400 effects and behavioral traditions in prediction research. These additions should help contextualize the present work within both neuroscience and psycholinguistics.

      Experimental Context and Predictability Metrics:

      We will address concerns regarding the context window for prediction estimation, describing more precisely how context was defined and whether broader textual cues may improve predictability metrics.

      References

      Pickering, M.J. & Gambi, C. (2018). Predicting while comprehending language: A theory and review. Psychol. Bull., 144(10), 1002–1044.

      Federmeier, K.D. (2007). Thinking ahead: The role and roots of prediction in language comprehension. Psychophysiology, 44(4), 491–505.

      Frisson, S. (2017). Can prediction explain the lexical processing advantage for short words? J. Mem. Lang., 95, 121–138.

      Staub, A. (2015). The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Lang. Linguist. Compass, 9(8), 311–327.

      Huettig, F. & Mani, N. (2016). Is prediction necessary to understand language? Probably not. Trends Cogn. Sci., 20(10), 484–492.

      We appreciate the reviewers’ constructive comments and believe their suggestions will meaningfully strengthen the paper. Our planned revisions will address each of the above points with additional analyses, clarifications, and expanded discussion.

    1. eLife Assessment

      This study used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and showed that its deletion has no effect on retinal neurogenesis or cell fate specification, thereby challenging the prevailing view of Ptbp1 as a master regulator of neuronal fate. The findings are convincing, supported by transcriptome analysis, histology, and proliferation assays. This study is important, though the genetic tools employed may not fully capture Ptbp1's potential role during the earliest stages of retinal development.

    2. Reviewer #1 (Public review):

      Summary:

      The researchers sought to determine whether Ptbp1, an RNA-binding protein formerly thought to be a master regulator of neuronal differentiation, is required for retinal neurogenesis and cell fate specification. They used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and analyzed the results using bulk RNA-seq, single-cell RNA-seq, immunohistochemistry, and EdU labeling. Their findings show that Ptbp1 deletion has no effect on retinal development, since no defects were found in retinal lamination, progenitor proliferation, or cell type composition. Although bulk RNA-seq indicated changes in RNA splicing and increased expression of late-stage progenitor and photoreceptor genes in the mutants, and single-cell RNA-seq detected relatively minor transcriptional shifts in Müller glia, the overall phenotypic impact was low. As a result, the authors conclude that Ptbp1 is not required for retinal neurogenesis and development, thus contradicting prior statements about its important role as a master regulator of neurogenesis. They argue for a reassessment of this stated role. While the findings are strong in the setting of the retina, the larger implications for other areas of the CNS require more investigation. Furthermore, questions about potential compensation by Ptbp2 warrant further research.

      Strengths:

      This study calls into doubt the commonly held belief that Ptbp1 is a critical regulator of neurogenesis in the CNS, particularly in retinal development. The adoption of a conditional knockout mouse model provides a reliable way of eliminating Ptbp1 in retinal progenitors while avoiding the off-target effects often reported in RNAi experiments. The combination of bulk RNA-seq, scRNA-seq, and immunohistochemistry enables a thorough examination of molecular and cellular alterations at both embryonic and postnatal stages, which strengthens the study's findings. Furthermore, using publicly available RNA-Seq datasets for comparison improves the investigation of splicing and expression across tissues and cell types. The work is well-organized, with informative figure legends and supplemental data that clearly show no substantial phenotypic changes in retinal lamination, proliferation, or cell fate, despite identified transcriptional and splicing modifications.

      Weaknesses:

      The retina-specific method raises questions regarding whether Ptbp1 is required in other CNS locations where its neurogenic roles were first proposed. The claim that Ptbp1 is "fully dispensable" for retinal development may be toned down, given the transcriptional and splicing modifications identified. The possibility of subtle or transitory impacts, such as ectopic neuron development followed by cell death, is postulated, but not completely investigated. Furthermore, as the authors point out, the compensating potential of increased Ptbp2 warrants additional exploration. Although the study performs well in transcriptome and histological analyses, it lacks functional assessments (such as electrophysiological or behavioral testing) to determine if small changes in splicing or gene expression affect retinal function. While 864 splicing events have been found, the functional significance of these alterations, notably the 7% that are neuronal-enriched and the 35% that are rod-specific, has not been thoroughly investigated. The manuscript might be improved by describing how these splicing changes affect retinal development or function.

    3. Reviewer #2 (Public review):

      Summary:

      Ptbp1 has been proposed as a key regulator of neuronal fate through its role in repressing neurogenesis. In this study, the authors conditionally inactivated Ptbp1 in mouse retinal progenitor cells using the Chx10-Cre line. While RNA-seq analysis at E16 revealed some changes in gene expression, there were no significant alterations in retinal cell type composition, and only modest transcriptional changes in the mature retina, as assessed by immunofluorescence and scRNAseq. Based on these findings, the authors conclude that Ptbp1 is not essential for cell fate determination during retinal development.

      Strengths:

      Despite some effects of Ptbp1 inactivation (initiated around E11.5 with the onset of Chx10-Cre activity) on gene expression and splicing, the data convincingly demonstrate that retinal cell type composition remains largely unaffected. This study is highly significant since it challenges the prevailing view of Ptbp1 as a central repressor of neurogenesis and highlights the need to further investigate, or re-evaluate, its role in other model systems and regions of the CNS.

      Weaknesses:

      A limitation of the study is the use of the Chx10-Cre driver, which initiates recombination around E11. This timing does not permit assessment of Ptbp1 function during the earliest phases of retinal development, if expressed at that time.

    1. eLife Assessment

      This valuable study presents a mechanistic model of predictive coding by medial entorhinal cortex grid cells, implemented with biologically detailed conductance-based neurons. The evidence supporting the emergence of this coding scheme from specific membrane currents and the anatomical connectivity among inhibitory neurons is solid. However, the justification for the choice of connectivity patterns and other network parameters remains somewhat incomplete. This work will be of interest to neuroscientists working on spatial navigation, circuit dynamics, and neuronal coding.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors aim to elucidate the mechanisms by which grid cells in the medial entorhinal cortex generate predictive representations of spatial location. To address this, they built a computational model integrating intrinsic neuronal dynamics with structured network connectivity. Specifically, they combine a conductance-based single-cell model incorporating biologically realistic HCN channels with a continuous attractor network that reflects known properties of grid cell circuitry. Their simulations show that HCN conductance can shift grid fields forward by approximately 5% of their diameter, consistent with experimental observations in layer II grid cells. Additionally, by introducing asymmetry in the connectivity of interneurons, the model produces larger forward shifts, which parallel properties observed in layer III grid cells. Together, these two mechanisms provide a unified framework for explaining layer-specific predictive coding in the entorhinal cortex.

      Strengths:

      A major strength of the study lies in its conceptual contribution. The authors propose two distinct mechanisms to generate forward-shifted grid fields for predictive coding. One mechanism is intrinsic and depends on the time constants associated with HCN channels. The other is network-based and results from asymmetries in interneuron connectivity. These two mechanisms correspond to different observed properties of grid cells in layer II and layer III, respectively. The modeling is based on previously validated frameworks of continuous attractor network models (e.g., Burak & Fiete; Kang & DeWeese), but it incorporates several novel features, including the incorporation of biophysically realistic HCN channels, a network architecture that excludes stellate-stellate connections and relies on interneurons, and asymmetric interneuron connectivity.

      Weaknesses:

      One of the proposed mechanisms for predictive coding, namely asymmetric interneuron connectivity, is a novel idea. However, this type of connectivity has not yet been demonstrated experimentally in the medial entorhinal cortex. Therefore, the biological plausibility of this mechanism remains uncertain and will need to be evaluated in future empirical studies.

    3. Reviewer #2 (Public review):

      Summary:

      This study proposes that predictive spatial representations in medial entorhinal cortex (MEC) grid cells arise through two distinct biophysical mechanisms: (1) HCN conductance-dependent temporal dynamics, which generate modest forward shifts (~5% of grid field diameter) in Layer II cells, and (2) network asymmetry, enabling larger predictive shifts (~25% of grid field diameter) in Layer III cells. The model further predicts a dorsoventral gradient in predictive coding magnitude, correlating with observed HCN conductance variations. These results provide a mechanistic framework for understanding how intrinsic cellular properties and circuit architecture collectively enable prospective spatial coding in the MEC. This is an important study.

      Strengths:

      These findings reveal how cellular properties and circuit design enable prospective spatial coding. This novel, impactful study will be of interest to the field.

      Weaknesses:

      Some of the models are too mathematical and do not fit with the biological observation.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Shaikh and Assisi addresses a timely and important question related to the neural circuit mechanisms underlying spatial representations during navigation. Concretely, they present a model of the medial entorhinal cortex (MEC) with biophysically detailed conductance-based stellate cells that can perform path integration and reveal two potential mechanisms underlying two forms of predictive coding by grid cells in the MEC. One mechanism uses HCN channels to explain predictive coding in MEC layer II grid cells equivalent to ~5% of the diameter of a grid field, and the other uses asymmetric connections between interneurons and stellate cells, resulting in a ~25% predictive bias of layer III grid cells. The methods and model are technically sound, and the model is expected to be useful for computational neuroscientists studying the neural mechanisms of spatial navigation.

      Strengths:

      One strength of the model is its use of conductance-based neuron models of stellate cells and interneurons, adding important biophysical constraints and details to existing continuous attractor network models of grid cells. The model fills a gap in the literature by providing mechanisms for predictive coding constrained by biophysical properties of stellate cells and simplified network topology.

      Weaknesses:

      A weakness of the model is that the neural network is relatively small (five sheets with 71 × 71 neurons each), and the 2-D toroidal topology is further simplified to a 1-D ring attractor consisting of three rings with 192 neurons each. The model incorporates biophysical detail at the single-neuron level, but not at the network level. For example, it includes only stellate cells and a generic interneuron type, and does not implement data-driven connectivity patterns.

      The restricted network size and the limited experimental knowledge about connectivity among stellate cells, principal cells, and different interneuron types in the MEC could be addressed in more detail. Moreover, the manuscript lacks a thorough discussion of assumptions common to most continuous attractor network (CAN) models of grid cells, such as the use of "hand-crafted" connections between direction-sensitive conjunctive grid cells and network cells to drive attractor shifts. Including such a discussion would strengthen the manuscript. This is especially relevant given the authors' explicit claim that they have revealed two mechanisms underlying the emergence of a predictive code in the MEC. In this reviewer's view, the work demonstrates a potential mechanism, but one that requires experimental verification. The significance of the model would thus be increased by providing more experimentally testable predictions of the model.

    1. eLife Assessment

      This fundamental study shows how past experiences shape perception across short, medium, and long time scales, using a single behavioural paradigm and reanalysed EEG data. It provides convincing evidence for two processes across all scales: an attention-dependent mechanism that speeds responses to expected events, and an attention-independent mechanism where expected events are encoded less precisely, consistent with feedforward dampening. The work offers a unifying account of temporal context effects, though stronger brain-behaviour links, integration with serial dependence attraction and repulsion models, and extension to other timescale definitions would further strengthen the contribution.

    2. Reviewer #1 (Public review):

      Summary:

      This paper addresses an important and topical issue: how temporal context - at various time scales - affects various psychophysical measures, including reaction times, accuracy and localization. It offers interesting insights, with separate mechanisms for different phenomena, which are well discussed.

      Strengths:

      The paradigm used is original and effective. The analyses are rigorous.

      Comments on revised version:

      I think the authors have dealt adequately with my issues, none of which were fundamental.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the influence of prior stimuli over multiple time scales in a position discrimination task, using pupillometry data and a reanalysis of EEG data from an existing dataset. The authors report consistent history-dependent effects across task-related, task-unrelated, and stimulus-related dimensions, observed across different time scales. These effects are interpreted as reflecting a unified mechanism operating at multiple temporal levels, framed within predictive coding theory.

      Strengths:

      The authors have done a good job in their revision, clarifying important points and stating the limitations of the study clearly.

      I also think they made a valid effort to address and correct issues arising from the temporal dependency confound, although I still wonder whether the best approach would have been to design an experiment in a way that avoided this confound in the first place.

      Overall, this is a substantially improved version, and I particularly appreciate the clarification and correction regarding the direction of the bias in the EEG data (repulsive rather than attractive).

      Weaknesses:

      These are now relatively minor points.

      I believe this latter aspect, the repulsive bias, may deserve further discussion, especially in relation to their behavioral findings and, in particular, to earlier work proposing multi-stage frameworks of serial dependence, where low-level repulsion interacts with attractive biases at higher-level stages (Fritsche et al., 2020; Pascucci et al., 2019; Sheehan & Serences, 2022). The authors may also consider citing some key reviews on serial dependence that discuss both repulsion and attraction in forced-choice and reproduction tasks (Manassi et al., 2023; Pascucci et al., 2023).

      Related to this, after finding the opposite pattern, is the sentence in line 472-473 ("Further, we found an attractive...") and the related argument still valid?

      Regarding my earlier point about former line 197 and Figure 3b,c: what I noticed-similar to the patterns reported in the studies I referenced-is that the data cannot be simply described as showing faster and more accurate responses for small deltas. Responses also appear faster and more accurate for very large deltas, with performance being worse in between. Indeed, as the authors state: "The peak in precision for large Deltas locations is consistent with alternate events being encoded more precisely, while the peak for small offsets may be explained by the attractive bias towards the previous target." I wonder whether it is necessary, or unequivocally supported by the data, to hypothesize two separate mechanisms here. An alternative could be interference effects between consecutive stimuli that are neither identical nor completely different-making the previous one more likely to interfere with the current stimulus representation.

      Finally, this is definitely a minor point, but I still find the reply to my comment about the prediction of stable retinal input rather speculative. Such a prediction would seem more plausible in world-centered coordinates.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The manuscript is quite dense, with some concepts that may prove difficult for the non-specialist. I recommend spending a few more words (and maybe some pictures) describing the difference between task-relevant and task-irrelevant planes. Nice technique, but not instantly obvious. Then we are hit with "stimulus-related", which definitely needs some words (also because it is orthogonal to neither of the above). 

      We agree that the original description of the planes was too terse and have expanded on this in the revised manuscript.

      Line 85 - To test the influence of attention, trials were sorted according to two spatial reference planes, based on the location of the stimulus: task-related and task-unrelated (Fig. 1b). The task-related plane corresponded to participants’ binary judgement (Fig. 1b, light cyan vertical dashed line) and the task-unrelated plane was orthogonal to this (Fig. 1b, dark cyan horizontal dashed line). For example, if a participant was tasked with performing a left-or-right of fixation judgement, then their task-related plane was the vertical boundary between the left and right side of fixation, while their task-unrelated plane was the horizontal boundary. The former (left-right) axis is relevant to their task while the latter (top-bottom) axis is orthogonal and task irrelevant. This orthogonality can be leveraged to analyze the same data twice (once according to the task-related plane and again according to the task-unrelated plane) in order to compare performance when the relative location of an event is either task relevant or irrelevant.

      Line 183 - whereas task planes were constant, the stimulus-related plane was defined by the location of the stimulus on the previous trial, and thus varied from trial to trial. That is, on each trial, the target is considered a repeat if it changes location by <|90°| relative to its location on the previous trial, and an alternate if it moves by >|90°|.
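
      The repeat/alternate rule quoted above can be illustrated with a minimal sketch. This is not the authors' code: the function names and the example locations are hypothetical, and the handling of a change of exactly 90° (labelled "boundary" here) is an assumption, since the text only defines the cases below and above 90°.

```python
def circular_offset_deg(current, previous):
    """Signed angular change from previous to current, wrapped to [-180, 180)."""
    return (current - previous + 180.0) % 360.0 - 180.0


def label_trials(locations_deg):
    """Label every trial after the first relative to the preceding target.

    'repeat'    : target moved by less than 90 degrees
    'alternate' : target moved by more than 90 degrees
    'boundary'  : exactly 90 degrees (assumption: undefined in the source text)
    """
    labels = []
    for previous, current in zip(locations_deg, locations_deg[1:]):
        change = abs(circular_offset_deg(current, previous))
        if change < 90.0:
            labels.append("repeat")
        elif change > 90.0:
            labels.append("alternate")
        else:
            labels.append("boundary")
    return labels


# Hypothetical target locations (degrees of polar angle) on five trials.
locations = [10.0, 350.0, 200.0, 170.0, 260.0]
print(label_trials(locations))
```

      Wrapping the difference before taking its absolute value matters: a move from 10° to 350° is a 20° change across the 0°/360° seam, not a 340° change, so it counts as a repeat.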

      (2) While I understand that the authors want the three classical separations, I actually found it misleading. Firstly, for a perceptual scientist to call intervals in the order of seconds (rather than milliseconds), "micro" is technically coming from the raw prawn. Secondly, the divisions are not actually time, but events: micro means one-back paradigm, one event previously, rather than defined by duration. Thirdly, meso isn't really a category, just a few micros stacked up (and there's not much data on this). And macro is basically patterns, or statistical regularities, rather than being a fixed time. I think it would be better either to talk about short-term and long-term, which do not have the connotations I mentioned. Or simply talk about "serial dependence" and "statistical regularities". Or both. 

      We agree that the temporal scales defined in the current study are not the only way one could categorize perceptual time. We also agree that by using events to define scales, we ignore the influence of duration. In terms of the categories, we selected these for two reasons: 1) they conveniently group previous phenomena, and 2) they loosely correspond to iconic-, short- and long-term memory. We agree that one could also potentially split it up into two categories (e.g., short- and long-term), but in general, we think any form of discretization will have limitations. For example, Reviewer 1 suggests that the meso category is simply a few micros stacked together. However, there is a rich literature on phenomena associated with sequences of an intermediate length that do not appear to be entirely explained by stacking micro effects (e.g., sequence learning and sequential dependency). We also find that when controlling for micro level effects, there are clear meso level effects. Also, by the logic that meso level effects are just stacked micro effects, one could also argue the same for macro effects. We don’t think this argument is incorrect, rather we think it exemplifies the challenge of discretising temporal scales. Ultimately, the current study was aimed to test whether seemingly disparate phenomena identified in previous work could be captured by unifying principles. To this end we found that these categories were the most useful. However, we have included a “Limitations and future directions” section in the Discussion of the revised manuscript that acknowledges both the alternative scheme proposed by Reviewer 1, and the value of extending this work to consider the influence of duration (as well as events).

      Line 488 - Limitations and future directions. One potential limitation of the current study is the categorization of temporal scales according to events, independent of the influence of event duration. While this simplification of time supports comparison between different phenomena associated with each scale (e.g., serial dependence, sequential dependencies, statistical learning), future work could investigate the role of duration to provide a more comprehensive understanding of the mechanisms identified in the current study.

      Related to this, while the temporal scales applied here conveniently categorized known sensory phenomena, and partially correspond to iconic-, short-, and long-term memory, they are but one of multiple ways to delineate time. For example, temporal scales could alternatively be defined simply as short- and long-term (e.g., by combining micro and meso scale phenomena). However, this could obscure meaningful differences between phenomena associated with sensory persistence and short-term memory, or qualitative differences in the way that short sequences of events are processed.

      (3) More serious is the issue of precision. Again, this is partially a language problem. When people use the engineering terms "precision" and "accuracy" together, they usually use the same units, such as degrees. Accuracy refers to the distance from the real position (so average accuracy gives bias), and precision is the clustering around the average bias, usually measured as standard deviation. Yet here accuracy is percent correct: also a convention in psychology, but not when contrasting accuracy with precision, in the engineering sense. I suggest you change "accuracy" to "percent correct". On the other hand, I have no idea how precision was defined. All I could find was: "mixture modelling was used to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively". I do not know what that means.

      In the case of a binary decision, it seems reasonable to use the term “accuracy” to refer to the correspondence between the target state and the response on a task. However, we agree that while our (main) task is binary, the target is not, and nor is the secondary task. We thank the reviewer for bringing this to our attention, as we agree that this will be a likely cause of confusion. To avoid confusion we have specifically referred to “task accuracy” throughout the revised manuscript.

      With regards to precision, our measure of precision is consistent with what Reviewer 1 describes as such, i.e., the clustering of responses. In particular, the von Mises distribution is essentially a Gaussian distribution in circular space, and the kappa parameter defines the width of the distribution, regardless of the mean, with larger values of kappa indicating narrower (more precise) distributions. We could have used standard deviation to assess precision; however, this would incorrectly combine responses on which participants failed to encode the target (e.g., because of a blink) and were simply guessing. To account for these trials, we applied mixture modelling of guess and genuine responses to isolate the precision of genuine responses, as is standard in the visual working memory literature. However, we agree that this was not sufficiently described in the original manuscript and have elaborated on this method in the revised version.

      Line 598 - From the reproduction task, we sought to estimate participants’ recall precision. It is likely that on some trials participants failed to encode the target and were forced to make a response guess. To isolate the recall precision from guess responses, we used mixture modelling to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively (Bays et al., 2009). The k parameter of the von Mises distribution reflects its width, which indicates the clustering of responses around a common location.
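
      To make the mixture-modelling step concrete, the following is a minimal sketch of fitting a von Mises plus uniform mixture to angular errors. It is an illustration under stated assumptions, not the authors' implementation: the von Mises component is centred on zero error, the fit uses a coarse grid search rather than whatever optimiser the authors used, and all names, grid ranges, and simulated values (`kappa=8.0`, `guess_rate=0.2`) are hypothetical.

```python
import math
import random


def bessel_i0(x):
    """Modified Bessel function of the first kind (order 0) via its power series."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-12 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def mixture_log_likelihood(errors, kappa, guess_rate):
    """Log-likelihood of angular errors (radians) under a zero-mean von Mises
    (genuine responses) mixed with a circular uniform (guesses)."""
    norm = 2.0 * math.pi * bessel_i0(kappa)
    uniform = 1.0 / (2.0 * math.pi)
    total = 0.0
    for e in errors:
        von_mises = math.exp(kappa * math.cos(e)) / norm
        total += math.log((1.0 - guess_rate) * von_mises + guess_rate * uniform)
    return total


def fit_mixture(errors, kappas, guess_rates):
    """Return the (kappa, guess_rate) grid point with the highest likelihood."""
    return max(
        ((k, g) for k in kappas for g in guess_rates),
        key=lambda p: mixture_log_likelihood(errors, p[0], p[1]),
    )


def simulate_errors(n, kappa, guess_rate, rng):
    """Simulate errors: a fraction guess_rate are uniform guesses, the rest von Mises."""
    errors = []
    for _ in range(n):
        if rng.random() < guess_rate:
            errors.append(rng.uniform(-math.pi, math.pi))
        else:
            sample = rng.vonmisesvariate(0.0, kappa)  # returned in [0, 2*pi)
            errors.append((sample + math.pi) % (2.0 * math.pi) - math.pi)
    return errors


rng = random.Random(0)
errors = simulate_errors(2000, kappa=8.0, guess_rate=0.2, rng=rng)
kappa_grid = [float(k) for k in range(1, 21)]      # hypothetical search range
guess_grid = [g / 20.0 for g in range(0, 11)]      # 0.00, 0.05, ..., 0.50
kappa_hat, guess_hat = fit_mixture(errors, kappa_grid, guess_grid)
print(kappa_hat, guess_hat)
```

      The point of the uniform component is the one made in the response: a plain standard deviation would pool guesses with genuine responses, whereas here the guesses are absorbed by the uniform term and the concentration estimate reflects only the clustered responses.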

      (4) Previous studies show serial dependence can increase bias but decrease scatter (inverse precision) around the biased estimate. The current study claims to be at odds with that. But are the two measures of precision relatable? Was the real (random) position of the target subtracted from each response, leaving residuals from which the inverse precision was calculated? (If so, the authors should say so..) But if serial dependence biases responses in essentially random directions (depending on the previous position), it will increase the average scatter, decreasing the apparent precision. 

      Previous studies have shown that when serial dependence is attractive there is a corresponding increase in precision around small offsets from the previous item (citations). Indeed, attractive biases will lead to reduced scattering (increased precision) around a central attractor. Consistent with previous studies, and this rationale, we also found an attractive bias coupled with increased precision. To clarify, for the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the offset between the current and previous target and then performing the same mixture modelling described above to estimate the mean (bias) and kappa (precision) parameters of the von Mises distribution fit to the angular errors. This was not explained in the original manuscript, so we thank Reviewer 1 for bringing this to our attention and have clarified the analysis in the revised version.

      Line 604 - For the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the angular offset between the current and previous target and then performing mixture modelling to estimate the mean (bias) and k (precision) parameters of the von Mises distribution.
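
      The binning step of the serial dependency analysis can be sketched as follows. This is a simplified illustration, not the authors' pipeline: a circular mean per bin stands in for the mean parameter of the mixture fit, the offset is defined as previous minus current target (so that an error with the same sign as the offset indicates attraction), and the bin width, function names, and example data are all hypothetical.

```python
import math
from collections import defaultdict


def wrap_deg(angle):
    """Wrap an angle in degrees to [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def circular_mean_deg(angles):
    """Circular mean of a list of angles in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def bias_by_offset(targets, responses, bin_width=45.0):
    """Map each offset-bin centre to the circular mean response error in that bin.

    offset = previous target - current target (sign convention is an assumption)
    error  = response - current target
    """
    bins = defaultdict(list)
    for i in range(1, len(targets)):
        offset = wrap_deg(targets[i - 1] - targets[i])
        error = wrap_deg(responses[i] - targets[i])
        index = math.floor((offset + 180.0) / bin_width)
        centre = -180.0 + (index + 0.5) * bin_width
        bins[centre].append(error)
    return {centre: circular_mean_deg(errs) for centre, errs in sorted(bins.items())}


# Hypothetical target and reproduced locations (degrees) on five trials.
targets = [0.0, 30.0, 10.0, 100.0, 90.0]
responses = [0.0, 28.0, 12.0, 100.0, 91.0]
for centre, bias in bias_by_offset(targets, responses).items():
    print(centre, round(bias, 3))
```

      In a full analysis the errors collected in each bin would be passed to the mixture fit, yielding a mean (bias) and a concentration (precision) per offset bin rather than the circular mean used here.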

      (5) I suspect they are not actually measuring precision, but location accuracy. So the authors could use "percent correct" and "localization accuracy". Or be very clear what they are actually doing. 

      As explained in our response to Reviewer 1’s previous comment, we are indeed measuring precision.

      Reviewer #2 (Public review):

      (1) The abstract should more explicitly mention that conclusions about feedforward mechanisms were derived from a reanalysis of an existing EEG dataset. As it is, it seems to present behavioral data only.

      It is not clear what relevance the fact that the data has been analyzed previously has to the results of the current study. However, we do think that it is important to be clear that the EEG recordings were collected separately from the behavioural and eyetracking data, so we have clarified this in the revised abstract.

      Line 7 - By integrating behavioural and pupillometry recordings with electroencephalographical recordings from a previous study, we identify two distinct mechanisms that operate across all scales.

      (2) The EEG task seems quite different from the others, with location and color changes, if I understand correctly, on streaks of consecutive stimuli shown every 100 ms, with the task involving counting the number of target events. There might be different mechanisms and functions involved, compared to the behavioral experiments reported. 

      As stated above, we agree that it is important that readers are aware that the EEG recordings were collected separately to the behavioural and eyetracking data. We were forthright about this in the original manuscript and have now clarified this in the revised abstract. We agree that collecting both sets of data in the same experiment would be a useful validation of the current results and have acknowledged this in a new Limitations and future directions section of the Discussion of the revised manuscript.

      Line 501 - Another limitation of the current study is that the EEG recordings were collected in a separate experiment from the behavioural and pupillometry data. The stimuli and task were similar between experiments, but not identical. For example, the EEG experiment employed coloured arc stimuli presented at a constant rate of ~3.3 Hz and participants were tasked with counting the number of stimuli presented at a target location. By contrast, in the behavioural experiment, participants viewed white blobs presented at an average rate of ~2.8 Hz and performed a binary spatial task coupled with an infrequent reproduction task. An advantage of this was that the sensory responses to stimuli in the EEG recordings were not conflated with motor responses; however, future work combining these measures in the same experiment would serve as a validation for the current results.

      (3) How is the arbitrary choice of restricting EEG decoding to a small subset of parieto-occipital electrodes justified? Blinks and other artifacts could have been corrected with proper algorithms (e.g., ICA) (Zhang & Luck, 2025) or even left in, as decoders are not necessarily affected by noise. Moreover, trials with blinks occurring at the stimulus time should be better removed, and the arbitrary selection of a subset of electrodes, while reducing the information in input to the decoder, does not account for trials in which a stimulus was missed (e.g., due to blinks).

      Electrode selection was based on several factors: 1) reduction of eye movement/blink artifacts (as noted in the original manuscript), 2) consistency with the previous EEG study (Rideaux, 2024) and other similar decoding studies (Buhmann et al., 2024; Harrison et al., 2023; Rideaux et al., 2023), 3) improved signal-to-noise by including only sensors that carry the most position information (as shown in Supplementary Figure 1a and the previous EEG study). We agree that this was insufficiently explained in the original manuscript and have clarified our sensor selection in the revised version.

      Line 631 - We only included the parietal, parietal-occipital, and occipital sensors in the analyses to i) reduce the influence of signals produced by eye movements, blinks, and non-sensory cortices, ii) for consistency with similar previous decoding studies (Buhmann et al., 2024; Rideaux, 2024; Rideaux et al., 2025), and iii) to improve decoding accuracy by restricting sensors to those that carried spatial position information (Supplementary Fig. 1a).

      (4) The artifact that appears in many of the decoding results is puzzling, and I'm not fully convinced by the speculative explanation involving slow fluctuations. I wonder if a different high-pass filter (e.g., 1 Hz) might have helped. In general, the nature of this artifact requires better clarification and disambiguation.

      We agree that the nature of this artifact requires more clarification and disambiguation. Due to relatively slow changes in the neural signal, which are not stimulus-related, there is a degree of temporal autocorrelation in the recordings. This can be filtered out, for example, by using a stricter high-pass filter; however, we tried a range of filters and found that a cut-off of at least 0.7 Hz is required to remove the artifact, and even a filter of 0.2 Hz introduces other (stimulus-related) artifacts, such as above-chance decoding prior to stimulus onset. These stimulus-related artifacts are due to the temporal smearing of data, introduced by the filtering, and have a more pronounced and complex influence on the results and are more difficult to remove through other means, such as the baseline correction applied in the original manuscript.

      The temporal autocorrelation is detected by the decoder during training and biases it to classify/decode targets that are presented nearby in time as similar. That is, it learns the neural pattern for a particular stimulus location based on the activity produced by the stimulus and the temporal autocorrelation (determined by slow, stimulus-unrelated fluctuations). The latter only accounts for a relatively small proportion of the variance in the neural recordings under normal circumstances and would typically go undetected when simply plotting decoding accuracy as a function of position. However, it becomes weakly visible when decoding accuracy is plotted as a function of distance from the previous target, as now the bias (towards temporally adjacent targets) aligns with the abscissa. Further, it becomes highly visible when the stimulus labels are shuffled, as now the decoder can only learn from the variance associated with the temporal autocorrelation (and not from the activity produced by the stimulus).

      In the linear discriminant analysis, this led to temporally proximal items being more likely to be classified as on the same side. This is why there is above-chance performance for repeat trials (Supplementary Figure 2b), and below-chance performance for alternate trials, even when the labels are shuffled – the temporal autocorrelation produces a general bias towards classifying temporally proximate stimuli as on the same side, which selectively improves the classification accuracy of repeat trials. Fortunately, the bias is relatively constant as a function of time within the epoch and is straightforward to estimate by shuffling the labels, which means that it can be removed through a baseline correction. However, to further demonstrate that the autocorrelation confound cannot account for the differences observed between repeat and alternate trials in the micro classification analysis, we now additionally show the results from a more strictly filtered version of the data (0.7 Hz). These results show a similar pattern as the original, with the additional stimulus-related artifacts introduced by the strict filter, e.g., above chance decoding prior to stimulus onset.

      In the inverted encoding analysis, the same temporal autocorrelation manifests as temporally proximal trials being decoded as more similar locations. This is why there is increased decoding accuracy for targets with small angular offsets from the previous target, even when the labels are shuffled (Supplementary Figure 3c), because it is on these trials that the bias happens to align with the correct position. This leads to an attractive bias towards the previous item, which is most prominent when the labels are shuffled.

      To demonstrate the phenomenon, we simulated neural recordings from a population of tuning curves and performed the inverted encoding analysis on a clean version of the data and a version in which we introduced temporal autocorrelation. We then repeated this after shuffling the labels. The simulation produced very similar results to those we observed in the empirical data, with a single exception: while precision in the simulated shuffled data was unaffected by autocorrelation, precision in the unshuffled data was clearly affected by this manipulation. This may explain why we did not find a correlation between the shuffled and unshuffled precision in the original manuscript. 

      These results echo those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and delta location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180 to 180 degrees, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this removed the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis (Supplementary Figure 3f), but it also temporally smeared the stimulus-related signal, resulting in above-chance decoding accuracy prior to stimulus onset (Supplementary Figure 3d). However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of delta location, and less concerned with the precise temporal dynamics of these changes, which appeared relatively stable in the filtered data. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3.
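
      The temporal smear mentioned above can be sketched with a toy example. It assumes a zero-phase (non-causal) high-pass filter, which the response does not specify, and it uses a crude stand-in for one: subtracting a centred moving average. The sampling rate implied by the sample counts, the boxcar "evoked response", and the window length are all hypothetical values chosen only for illustration.

```python
def moving_average_highpass(signal, half_width):
    """Crude zero-phase high-pass: subtract a centred moving average from each sample."""
    n = len(signal)
    filtered = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        local_mean = sum(signal[lo:hi]) / (hi - lo)
        filtered.append(signal[i] - local_mean)
    return filtered


onset = 200                      # sample index of stimulus onset (hypothetical)
signal = [0.0] * 400             # flat baseline: nothing happens before onset
for i in range(onset, onset + 50):
    signal[i] = 1.0              # boxcar standing in for a stimulus-evoked response

# The window straddles the onset, so post-onset activity leaks into pre-onset samples.
filtered = moving_average_highpass(signal, half_width=70)

pre_onset_raw = max(abs(v) for v in signal[:onset])
pre_onset_filtered = max(abs(v) for v in filtered[:onset])
print(pre_onset_raw, round(pre_onset_filtered, 3))
```

      The raw signal is exactly zero before onset, whereas the filtered signal deviates from zero in the samples just before onset. A decoder trained on such data could pick up stimulus-related information "before" the stimulus, which is consistent with the pre-onset above-chance decoding reported for the 0.7 Hz filter.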

      We have updated the revised manuscript in light of these changes, including a fuller description of the artifact and the results from the abovementioned control analyses.

      Figure 3 updated.

      Figure 3 caption - e) Decoding accuracy for stimulus location, from reanalysis of previously published EEG data (17). Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). f) Decoding accuracy for location, as a function of time and Δ location. Bright colours indicate higher decoding accuracy; absolute accuracy values can be inferred from (e). g-i) Average location decoding (g) accuracy, (h) precision, and (i) bias from 50 – 500 ms following stimulus onset. Horizontal bar in (e) indicates cluster corrected periods of significance; note, all time points were significantly above chance due to temporal smear introduced by strict high-pass filtering (see Supplementary Figure 3 for full details). Note, the temporal abscissa is aligned across (e & f). Shaded regions indicate ±SEM.

      Line 218 - To further investigate the influence of serial dependence, we applied inverted encoding modelling to the EEG recordings to decode the angular location of stimuli. We found that decoding accuracy of stimulus location sharply increased from ~60 ms following stimulus onset (Fig. 3e). Note, to reduce the influence of general temporal dependencies, we applied a 0.7 Hz high-pass filter to the data, which temporally smeared the stimulus-related information, resulting in above-chance decoding accuracy prior to stimulus presentation (for full details, see Supplementary Figure 3). To understand how serial dependence influences the representation of these features, we inspected decoding accuracy for location as a function of both time and Δ location (Fig. 3f). We found that decoding accuracy varied not only as a function of time, but also as a function of Δ location. To characterise this relationship, we calculated the average decoding accuracy from 50 ms until the end of the epoch (500 ms), as a function of Δ location (Fig. 3g). This revealed higher accuracy for targets with larger Δ location. We found a similar pattern of results for decoding precision (Fig. 3h). These results are consistent with the micro temporal context (behavioural) results, showing that targets that alternated were recalled more precisely. Lastly, we calculated the decoding bias as a function of Δ location and found a clear repulsive bias away from the previous item (Fig. 3i). While this result is inconsistent with the attractive behavioural bias, it is consistent with recent studies of serial dependence suggesting an initial pattern of repulsion followed by an attractive bias during the response period (20–22).

      Line 726 - As shown in Supplementary Figure 3, we found the same general temporal dependencies in the decoding accuracy computed using inverted encoding that were found using linear discriminant classification. However, as a baseline correction would not have been appropriate or effective for the parameters decoded with this approach, we instead used a high-pass filter of 0.7 Hz to remove the confound, while being cautious about interpreting the timing of effects produced by this analysis due to the temporal smear introduced by the filter.

      Supplementary Figure 2 updated.

      Supplementary Figure 2 caption - Removal of general micro temporal dependencies in EEG responses. We found that there were differences in classification accuracy for repeat and alternate stimuli in the EEG data, even when stimulus labels were shuffled. This is likely due to temporal autocorrelation within the EEG data due to low frequency signal changes that are unrelated to the decoded stimulus dimension. This signal trains the decoder to classify temporally proximal stimuli as the same class, leading to a bias towards repeat classification. For example, in general, the EEG signal during trial one is likely to be more similar to that during trial two than during trial ten, because of low frequency trends in the recordings. If the decoder has been trained to classify the signal associated with trial one as a leftward stimulus, then it will be more likely to classify trial two as a leftward stimulus too. These autocorrelations are unrelated to stimulus features; thus, to isolate the influence of stimulus-specific temporal context, we subtracted the classification accuracy produced by shuffling the stimulus labels from the unshuffled accuracy (as presented in Figure 2e, f). We confirmed that using a stricter high-pass filter (0.7 Hz) removes this artifact, as indicated by the equal decoding accuracy between the two shuffled conditions. However, the stricter high-pass filter temporally smears the stimulus-related signal, which introduces other (stimulus-related) artifacts, e.g., above-chance decoding accuracy prior to stimulus presentation, that are larger and more complex, i.e., changing over time. Thus, we opted to use the original high-pass filter (0.1 Hz) and apply a baseline correction. a) The uncorrected classification accuracy along task-related and -unrelated planes. Note that these results are the same as the corrected version shown in Figure 2e, because the confound is only apparent when accuracy is grouped according to temporal context.

      b) Same as (a), but split into repeat and alternate stimuli, along (left) task-related and (right) unrelated planes. Classification accuracy when labels are shuffled is also shown. Inset in (a) shows the EEG sensors included in the analysis (blue dots). (c, d) Same as (a, b), but on data filtered using a 0.7 Hz high-pass filter. Black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). Shaded regions indicate ±SEM.

      Supplementary Figure 3 updated.

      Supplementary Figure 3 caption - Removal of general temporal dependencies in EEG responses for inverted encoding analyses. As described in Methods - Neural Decoding, we used inverted encoding modelling of EEG recordings to estimate the decoding accuracy, precision, and bias of stimulus location. Just as in the linear discriminant classification analysis, we also found the influence of general temporal dependencies in the results produced by the inverted encoding analysis. In particular, there was increased decoding accuracy for targets with low Δ location. This was weakly evident in the period prior to stimulus presentation, but clearly visible when the labels were shuffled. These results mirror those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and Δ location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180° to 180°, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this significantly reduced the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis, but it also temporally smeared the stimulus-related signal, resulting in above-chance decoding accuracy prior to stimulus onset. However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of Δ location, and less concerned with the precise temporal dynamics of these changes. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3. (a) Decoding accuracy as a function of time for the EEG data filtered using a 0.1 Hz high-pass filter. Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). (b, c) The same as (a), but as a function of time and Δ location for (b) the original data and (c) data with shuffled labels. (d-f) Same as (a-c), but for data filtered using a 0.7 Hz high-pass filter. Shaded regions in (a, d) indicate ±SEM. Horizontal bars in (a, d) indicate cluster corrected periods of significance; note, all time points in (d) were significantly above chance. Note, the temporal abscissa is vertically aligned across plots (a-c & d-f).

      In the process of performing these additional analyses and simulations, we became aware that the sign of the decoding bias in the inverted encoding analyses had been interpreted in the wrong direction. That is, where we previously reported an initial attractive bias followed by a repulsive bias relative to the previous target, we have in fact found the opposite, an initial repulsive bias followed by an attractive bias relative to the previous target. Based on the new control analyses and simulations, we think that the latter attractive bias was due to general temporal dependencies. That is, in the filtered data, we only observe a repulsive bias. While the bias associated with serial dependence was not a primary feature of the study, this (somewhat embarrassing) discovery has led to reinterpretation of some results relating to serial dependence. However, it is encouraging to see that our results now align with those of recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan et al. 2024).

      Line 385 - Our corresponding EEG analyses revealed better decoding accuracy and precision for stimuli preceded by those that were different and a bias away from the previous stimulus. These results are consistent with the finding that alternating stimuli are recalled more precisely. Further, while the repulsive pattern of biases is inconsistent with the observed behavioural attractive biases, it is consistent with recent work on serial dependence indicating an initial period of repulsion, followed by an attractive bias during the response period (20–22). These findings indicate that serial dependence and first-order sequential dependencies can be explained by the same underlying principle.

      (5) Given the relatively early decoding results and surprisingly early differences in decoding peaks, it would be useful to visualize ERPs across conditions to better understand the latencies and ERP components involved in the task.

      A rapid presentation design was used in the EEG experiment, and while this is well suited to decoding analyses, unfortunately we cannot resolve ERPs because the univariate signal is dominated by an oscillation at the stimulus presentation frequency (~3 Hz). We agree that this could be useful to examine in future work.

      (6) It is unclear why the precision derived from IEM results is considered reliable while the accuracy is dismissed due to the artifact, given that both seem to be computed from the same set of decoding error angles (equations 8-9).

      This point has been addressed in our response to point (4).

      (7) What is the rationale for selecting five past events as the meso-scale? Prior history effects have been shown to extend much further back in time (Fritsche et al., 2020). 

      We used five previous items in the meso analyses to be consistent with previous research on sequential dependencies (Bertelson, 1961; Gao et al., 2009; Jentzsch & Sommer, 2002; Kirby, 1976; Remington, 1969). However, we agree that these effects likely extend further and have acknowledged this in the revised version of the manuscript.

      Line 240 - Higher-order sequential dependences are an example of how stimuli (at least) as far back as five events in the past can shape the speed and task accuracy of responses to the current stimulus (9, 10); however, note that these effects have been observed for more than five events (20).

      (8) The decoding bias results, particularly the sequence of attraction and repulsion, appear to run counter to the temporal dynamics reported in recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan & Serences, 2022). 

      This point has been addressed in our response to point (4).

      (9) The repulsive component in the decoding results (e.g., Figure 3h) seems implausibly large, with orientation differences exceeding what is typically observed in behavior. 

      As noted in our response to point (4), this bias was likely due to the general temporal dependency confound and has been removed in the revised version of the manuscript.

      (10) The pattern of accuracy, response times, and precision reported in Figure 3 (also line 188) resembles results reported in earlier work (Stewart, 2007) and in recent studies suggesting that integration may lead to interference at intermediate stimulus differences rather than improvement for similar stimuli (Ozkirli et al., 2025).

      Thank you for bringing this to our attention, we have acknowledged this in the revised manuscript.

      Line 197 - Consistent with our previous binary analysis, and with previous work (19), we also found that responses were faster and more accurate when Δ location was small (Fig. 3b, c).

      (11) Some figures show larger group-level variability in specific conditions but not others (e.g., Figures 2b-c and 5b-c). I suggest reporting effect sizes for all statistical tests to provide a clearer sense of the strength of the observed effects. 

      Yes, as noted in the original manuscript, we find significant differences in variance between the task-related and -unrelated conditions. We think this is due to opposing forces in the task-related condition: 

      “The increased variability of response time differences across the task-related plane likely reflects individual differences in attention and prioritization of responding either quickly or accurately. On each trial, the correct response (e.g., left or right) was equally probable. So, to perform the task accurately, participants were motivated to respond without bias, i.e., without being influenced by the previous stimulus. We would expect this to reduce the difference in response time for repeat and alternate stimuli across the task-related plane, but not the task-unrelated plane. However, attention may amplify the bias towards making faster responses for repeat stimuli, by increasing awareness of the identity of stimuli as either repeats or alternations (17). These two opposing forces vary with task engagement and strategy and thus would be expected to produce increased variability across the task-related plane.” We agree that providing effect sizes may provide a clearer sense of the observed effects and have done so in the revised version of the manuscript.

      Line 739 - For Wilcoxon signed rank tests, the rank-biserial correlation (r) was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (54). For Friedman’s ANOVA tests, Kendall’s W was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (55).
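
      For readers unfamiliar with these two effect sizes, the sketch below shows one common way of computing each. It is an illustration under stated assumptions, not the authors' code: the rank-biserial correlation is taken in its matched-pairs form, r = (W+ - W-) / (W+ + W-), with zero differences dropped, and Kendall's W is computed without a tie correction. The manuscript may use a different variant, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_biserial(x, y):
    """Matched-pairs rank-biserial r for paired samples x and y.

    Assumes at least one non-zero difference; otherwise r is undefined.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return (w_pos - w_neg) / (w_pos + w_neg)


def kendalls_w(table):
    """Kendall's W for a subjects x conditions table (no tie correction)."""
    m, n = len(table), len(table[0])
    rank_sums = [0.0] * n
    for row in table:
        # Rank the conditions within each subject, then accumulate per condition.
        for c, r in enumerate(average_ranks(row)):
            rank_sums[c] += r
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Every pair favours x, so r is at its maximum.
print(rank_biserial([3, 4, 5], [1, 1, 1]))
# Three subjects rank three conditions identically, so agreement is perfect.
print(kendalls_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))
```

      Both measures are bounded (r between -1 and 1, W between 0 and 1), which is what makes thresholds such as 0.1, 0.3, and 0.5 comparable across tests.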

      (12) The statement that "serial dependence is associated with sensory stimuli being perceived as more similar" appears inconsistent with much of the literature suggesting that these effects occur at post-perceptual stages (Barbosa et al., 2020; Bliss et al., 2017; Ceylan et al., 2021; Fischer et al., 2024; Fritsche et al., 2017; Sheehan & Serences, 2022). 

      In light of the revised analyses, this statement has been removed from the manuscript.

      (13) If I understand correctly, the reproduction bias (i.e., serial dependence) is estimated on a small subset of the data (10%). Were the data analyzed by pooling across subjects?

      The dual reproduction task only occurred on 10% of trials. There were approximately 2000 trials, so ~200 reproduction responses. For the micro and macro analyses, this was sufficient to estimate precision within each of the experimental conditions (repeat/alternate, expected/unexpected). However, it is likely that we were not able to reproduce the effect of precision at the meso level across both experiments because we lacked sufficient responses to reliably estimate precision when split across the eight sequence conditions. Despite this, the data was always analysed within subjects.

      (14) I'm also not convinced that biases observed in forced-choice and reproduction tasks should be interpreted as arising from the same process or mechanism. Some of the effects described here could instead be consistent with classic priming. 

      We agree that the results associated with the forced-choice task (response time, task accuracy) were likely due to motor priming, but that a separate (predictive) mechanism may explain the (precision) results associated with the reproduction task. These are two mechanisms we think are operating across the three temporal scales investigated in the current study.

      Reviewing Editor Comments:

      (1) Clarify task design and measurement: The dense presentation makes it difficult to understand key design elements and their implications. Please provide clearer descriptions of all task elements, and how they relate to each other (EEG vs. behaviour, stimulus plane vs. TR and TU plane, reproduction vs. discrimination and role of priming), and clearly explain how key measures were computed for each of these (e.g., precision, accuracy, reproduction bias).

      In the revised manuscript, we have expanded on descriptions of the source and nature of the data (behavioural and EEG), the different planes analyzed in the behavioural task, and how key metrics (e.g., precision) were computed.

      (2) Offer more insight into underlying data, including original ERP waveforms to aid interpretation of decoding results and the timing of effects. In particular, unpack the decoding temporal confound further.

      In the revised manuscript, we have offered considerably more insight into the decoding results, in particular, the nature of the temporal confound. We were unable to assess ERPs due to the rapid presentation design employed in the EEG experiment.

      (3) Justify arbitrary choices such as electrode selection for EEG decoding (e.g., limiting to parieto-occipital sensors), number of trials in meso scale, and the time terminology itself.

      In the revised manuscript, we have clarified the reasons for electrode selection.

      (4) Discuss deviations from literature: Several findings appear to contradict or diverge from previous literature (e.g., effects of serial dependence). These discrepancies could be discussed in more depth. 

      Upon re-analysis of the serial dependence bias and removal of the temporal confound, the results of the revised manuscript now align with those from previous literature, which has been acknowledged.

      Reviewer #1 (Recommendations for the authors):

      (1) I would like to use my reviewer's prerogative to mention a couple of relevant publications. 

      Galluzzi et al (Journal of Vision, 2022) "Visual priming and serial dependence are mediated by separate mechanisms" suggests exactly that, which is relevant to this study.

      Xie et al. (Communications Psychology, 2025) "Recent, but not long-term, priors induce behavioral oscillations in peri-saccadic vision" also seems relevant to the issue of different mechanisms. 

      Thank you for bringing these studies to our attention. We agree that they are both relevant and have referenced both appropriately in the revised version of the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the discussion on attention and awareness (from line 127 onward) somewhat vague and requiring clarification.

      We agree that this statement was vague and referred to “awareness” without operationalisation. We have revised this statement to improve clarity.

      Line 135 - However, task-relatedness may amplify the bias towards making faster responses for repeat stimuli, by increasing attention to the identity of stimuli as either repeats or alternations (17).

      (2) Line 140: It's hard to argue that there are expectations that the image of an object on the retina is likely to stay the same, since retinal input is always changing. 

      We agree that retinal input is often changing, e.g., due to saccades, self-motion, and world motion. However, for a prediction to be useful, e.g., to reduce metabolic expenditure or speed up responses, it must be somewhat precise, so a prediction that retinal input will change is not necessarily useful, unless it can specify what it will change to. Given retinal input of x at time t, the range of possible values of x at time t+1 (predicting change) is infinite. By contrast, if we predict that x=x at time t+1 (no change), then we can make a precise prediction. There is, of course, other information that could be used to reduce the parameter space of predicted change from x at time t, e.g., the value of x at time t-1, and we think this drives predictions too. However, across the infinite distribution of changes from x, zero change will occur more frequently than any other value, so we think it’s reasonable to assert that the brain may be sensitive to this pattern.

      (3) Line 564: The gambler's fallacy usually involves sequences longer than just one event.

      Yes, we agree that this phenomenon is associated with longer sequences. This section of the manuscript was in regards to previous findings that were not directly relevant to the current study and has been removed in the revised version.

      (4) In the shared PDF, the light and dark cyan colors used do not appear clearly distinguishable. 

      I expect this is due to poor document processing or low-quality image embeddings. I will check that they are distinguishable in the final version.

      References: 

      Barbosa, J., Stein, H., Martinez, R. L., Galan-Gadea, A., Li, S., Dalmau, J., Adam, K. C. S., Valls-Solé, J., Constantinidis, C., & Compte, A. (2020). Interplay between persistent activity and activity-silent dynamics in the prefrontal cortex underlies serial biases in working memory. Nature Neuroscience, 23(8), Article 8. https://doi.org/10.1038/s41593-020-0644-4

      Bliss, D. P., Sun, J. J., & D'Esposito, M. (2017). Serial dependence is absent at the time of perception but increases in visual working memory. Scientific reports, 7(1), 14739. 

      Ceylan, G., Herzog, M. H., & Pascucci, D. (2021). Serial dependence does not originate from low-level visual processing. Cognition, 212, 104709. https://doi.org/10.1016/j.cognition.2021.104709

      Fischer, C., Kaiser, J., & Bledowski, C. (2024). A direct neural signature of serial dependence in working memory. eLife, 13. https://doi.org/10.7554/eLife.99478.1

      Fritsche, M., Mostert, P., & de Lange, F. P. (2017). Opposite effects of recent history on perception and decision. Current Biology, 27(4), 590-595. 

      Fritsche, M., Spaak, E., & de Lange, F. P. (2020). A Bayesian and efficient observer model explains concurrent attractive and repulsive history biases in visual perception. eLife, 9, e55389. https://doi.org/10.7554/eLife.55389

      Gekas, N., McDermott, K. C., & Mamassian, P. (2019). Disambiguating serial effects of multiple timescales. Journal of vision, 19(6), 24-24. 

      Luo, M., Zhang, H., Fang, F., & Luo, H. (2025). Reactivation of previous decisions repulsively biases sensory encoding but attractively biases decision-making. PLOS Biology, 23(4), e3003150. https://doi.org/10.1371/journal.pbio.3003150

      Ozkirli, A., Pascucci, D., & Herzog, M. H. (2025). Failure to replicate a superiority effect in crowding. Nature Communications, 16(1), 1637. https://doi.org/10.1038/s41467-025-56762-5

      Sheehan, T. C., & Serences, J. T. (2022). Attractive serial dependence overcomes repulsive neuronal adaptation. PLoS biology, 20(9), e3001711. 

      Stewart, N. (2007). Absolute identification is relative: A reply to Brown, Marley, and Lacouture (2007). Psychological Review, 114, 533-538. https://doi.org/10.1037/0033-295X.114.2.533

      Treisman, M., & Williams, T. C. (1984). A theory of criterion setting with an application to sequential dependencies. Psychological review, 91(1), 68. 

      Zhang, G., & Luck, S. J. (2025). Assessing the impact of artifact correction and artifact rejection on the performance of SVM- and LDA-based decoding of EEG signals. NeuroImage, 316, 121304. https://doi.org/10.1016/j.neuroimage.2025.121304

    1. eLife Assessment

      Complementing previous work (Namiki et al, 2018), this study provides an important resource for the Drosophila community as it reports 500 lines targeting descending neurons (DN), in addition to compiling 306 existing DN lines from the literature. The compelling work characterizes 146 DNs and makes a critical link with the DNs identified in Electron microscopy (EM). The lines in this paper will be of interest to Drosophila neuroscientists who will be able to use the reported genetic drivers for further functional characterization of DNs and circuit mapping in conjunction with existing EM datasets.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Zung et al. describes a curated library of genetic lines labeling a class of important neurons called Descending Neurons in the fruit fly, Drosophila melanogaster. These neurons play a critical role in relaying information from the brain to motor circuits within the ventral nerve cord - the insect analogue of the vertebrate spinal cord. The authors screened through a vast resource of Gal4 lines to generate 500 new genetic lines that allow for the precise labeling of 190 (40%) of all Descending Neurons. The tools introduced here will allow researchers to perform precise circuit dissection of the exact roles these neurons play in linking the brain to the ventral nerve cord.

      Strengths:

      This manuscript represents an important follow-up to the authors' 2018 paper in the extension of the genetic toolkit from 178 genetic lines that target 65 Descending Neuron (DN) classes to 806 lines that target 190 DN classes. The presentation of this toolkit is comprehensive with confocal images, informative classifications of lines based on specificity/consistency, and identification of the neuron types - when possible - in the EM dataset.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      Descending neurons (DNs) are critical nodes in the neural computation underlying sensorimotor transformation. Building on their earlier work, the authors have substantially expanded the genetic resources for labeling these cell types in D. melanogaster, offering a valuable public resource.

      Strengths:

      The authors identified 146 additional DN types and generated 500 new DN driver lines, expanding the genetic reagents from labeling 98 cell types to 244, representing approximately 50% of all DN types estimated by EM connectomes. While the EM connectomes offer unprecedented resolution of neuronal cell types and their connectivity, genetic access to these cell types remains essential for studying their functions and testing hypotheses. Given the broad interest in DNs, the reagents generated in this study will be of great value for addressing a wide range of questions in sensorimotor transformation.

      The organization of the dataset is overall intuitive and comprehensive. The authors also provided clear information and guidance on accessing the relevant resources, such as stack images and fly lines. In addition, the authors have thoughtfully handled the information updated from the earlier collection they generated (Namiki et al. 2018) and incorporated previously published DN lines, providing a consolidated and up-to-date resource for the DN community.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides the Drosophila community with a large collection of new split-Gal4 descending neuron genetic lines. They extend previous efforts to characterize and identify genetic lines for this important class of neurons by providing images of descending neurons and a metric for genetic lines based on specificity and consistency. Their discussion highlights several applications of this collection, for example, to understand the function of new descending neurons through optogenetic and/or physiological characterization. They also helpfully discuss caveats, encouraging users of this collection to validate expression patterns and to be careful when interpreting optogenetic experimental results, considering potential off-target labeling in the lines. Overall, members of the Drosophila community interested in understanding the function of descending neurons and their role in behavior will find this a helpful resource.

      Strengths:

      (1) The authors extend the previous genetic access of descending neurons in Drosophila to over 800 split-Gal4 lines and 190 cell types (nearly half of the known population of descending neurons). The authors also update and, at times, correct the identification of descending neurons from a previous large-scale analysis.

      (2) Clear images of descending neurons labeled by new genetic lines are presented in the main figures for reference.

      (3) This study classifies lines labeling descending neurons using a quality score to indicate specificity and consistency. They provide this for the entire set of genetic lines, a valuable assessment for researchers interested in targeting these neurons for optogenetic or physiological characterization.

      Weaknesses:

      Although this paper represents a substantial effort and useful contribution to the Drosophila community, a few weaknesses, primarily regarding the specificity and reliability of genetic lines, remain:

      (1) The authors state that optogenetic activation of DN types using the new split-GAL4 lines is expected to reliably activate the target neurons with virtually no off-target effects in the rest of the central nervous system. More data supporting this conclusion, including both qualitative and quantitative anatomical evidence, would strengthen this claim.

      (2) The authors do recommend that researchers using these lines examine expression patterns themselves to evaluate line cleanliness and consistency, but some analysis by the authors would be useful, for example, providing guidelines for best practices to perform this evaluation.

      (3) Changes in expression patterns after several generations are noted by the authors, weakening confidence somewhat in the long-term usefulness of this collection of genetic lines.

    1. eLife Assessment

      This important study presents the development of a novel inhibitor for SARS-CoV-2 Mac1 that has potential utility both as an antiviral therapeutic and as a tool for probing the molecular mechanisms by which infection-induced ADP-ribosylation triggers robust host antiviral responses. Though minor gaps in understanding the compound's precise molecular mechanism of action and its ability to target Mac1 from other coronaviruses remain, the evidence for its effects on SARS-CoV-2 in relevant biological models is compelling.

    2. Reviewer #1 (Public review):

      SARS-CoV-2 encodes a macrodomain (Mac1) within the nsp3 protein that removes ADP-ribose groups from proteins. However, its role during infection is not well understood. Evidence suggests that Mac1 antagonizes the host interferon response by counteracting the wave of ADP ribosylation that occurs during infection. Indeed, several PARPs are interferon-stimulated genes. While multiple targets have been proposed, the mechanistic links between ADP ribosylation and a robust antiviral response remain unclear.

      Genetic inactivation of Mac1 abrogates viral replication in vivo, suggesting that small-molecule inhibitors of Mac1 could be developed into antivirals to treat COVID-19 and other emerging coronaviruses. The authors report a potent and selective small molecule inhibitor targeting Mac1 (AVI-4206) that demonstrates efficacy in human airway organoids and animal models of SARS-CoV-2 infection. While these results are compelling and provide proof of concept for the therapeutic targeting of Mac1, I am particularly intrigued by the potential of this compound as a probe to elucidate the mechanistic connections between infection-induced ADP ribosylation and the host antiviral response.

      The precise function of Mac1 remains unclear. Given its presence in multiple viruses, it likely acts on a fundamental host immune pathway(s). AVI-4206, while promising as a lead compound for the development of antivirals targeting coronaviruses, could also be a valuable tool for uncovering the function of the Mac1 domain. This may lead to fundamental insights into the host immune response to viral infection.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe the development of a novel inhibitor (AVI-4206) for the first macrodomain of the nsp3 protein of SARS-CoV-2 (Mac1). This involves medicinal chemistry synthesis, structural work, and biochemical characterisation. Subsequently, the authors present their findings on the efficacy of the inhibitor in both cell culture and animal models of SARS-CoV-2 infection. They find that, despite high affinity for Mac1 and the known replication defects of catalytically inactive Mac1, only moderate beneficial effects can be observed in their chosen models.

      Strengths:

      The authors employ a variety of different assays to study the affinity, selectivity and potency of the novel inhibitor, and thus the in vitro data are very compelling. Similarly, the authors use several cell culture and in vivo models to strengthen their findings. In addition, the authors address several aspects of the health impact of coronaviral infections, from animal survival and viral load to histological assessment of lung damage.

      Weaknesses:

      The selection of Targ1 and MacroD2 as off-target human macrodomains is sub-optimal, as several studies have shown that the first macrodomains of PARP9 and PARP14 are much more closely related to coronaviral macrodomains and both macrodomains are implicated in antiviral defence and immunity. However, the authors address this issue by providing modeling data that show clashes with AVI-4206 similar to those in their models with MacroD2 and TARG1.

      Comments on revisions:

      While the authors have not addressed all my suggestions experimentally, I would like to nevertheless congratulate them on a significantly strengthened manuscript that will provide a valuable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      The authors were trying to validate SARS-CoV-2 Mac1 as a drug discovery target and by extension other viral macrodomains.

      Strengths:

      The medicinal chemistry and structure-based optimization is exemplary. Macrodomains and ADP-ribosyl hydrolases have the reputation for being undruggable, yet the authors managed to optimize hits from a fragment screen using structure-based approaches and fragment linking to make a 20 nM inhibitor as a tool compound to validate the target. In addition, the in vivo work is also a strength. The ability to reduce the viral count at a rate comparable to nirmatrelvir is impressive. Tracking the cytokine expression levels also supports much of the genetic data and mechanism of action for macrodomains.

      Weaknesses:

      The main compound AVI-4206, while being very potent and selective, is not appreciably orally bioavailable. The fact that they have to use high doses of the compound IP to see in vivo effects may lead to questions regarding off-target effects. The authors acknowledge this and point it out as a potential avenue for further optimization.

      The cellular models are not as predictive of antiviral activity as one would expect. However, the authors had enough chutzpah to test the compound in vivo knowing that cellular models might not be an accurate representation of a living system with a fully functional immune system all of which is most likely needed in an antiviral response to test the importance of Mac1 as a target.

      Comments on revisions:

      All previous suggestions were addressed. I am satisfied with the authors' modifications.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      Although this study is rigorous and the paper is well-written, I have a few concerns that the authors should address before publication.

      (1) Cellular levels of protein ADP-ribosylation should be analyzed using anti-ADPR antibodies following infection, both with and without Mac1 and AVI-4206 treatment. While the authors have provided impressive in vivo data, these experiments could ideally be conducted in mice. However, I would be amenable to these analyses being performed in human airway organoids, as they demonstrate clear phenotypes following AVI-4206 treatment post-infection. For a more in-depth exploration, the authors could consider affinity purifying ADP-ribosylated proteins and identifying them via mass spectrometry. I would find it particularly compelling if this approach revealed components of the NF-κB signaling pathway, given the intriguing results presented in Fig. 5. I am also curious if there are differences in ADP-ribosylated proteins when comparing Mac1 KO SARS-CoV-2 to AVI-4206 treatment.

      We note that despite the recent flurry of activity around Mac1, there is a surprising lack of public data on overall ADPr levels or targets. While we will address the literature precedent for PARP14 signals specifically below (Reviewer 2 point (h)) by immunofluorescence, we note that overall levels have not been characterized biochemically previously. Recent PARP14 papers and the ASAP AViDD preprint show changes by immunofluorescence only, and the evidence in that preprint is quite modest - see Figure 7B - https://pmc.ncbi.nlm.nih.gov/articles/PMC11370477/.

      We suspect the difficulty in tracking changes biochemically is due to multiple factors that influence the overall detectability and reproducibility. First, with regard to detectability - it is quite possible that only a small change in the ADPr status of a small number of targets is responsible for the phenotypes in vivo. Virus levels are very low in the organoid system and the variability in ADPr levels from tissue samples from in vivo experiments is high. Given the difficulty in translating back to cellular models, this problem is therefore magnified further. Second, with regard to reproducibility - we observe a great deal of reagent dependence on ADPr signals by Western blot +/- Mac1 expression in both cellular and tissue lysates (including when stimulated with H2O2, interferon, or during viral infection). Similarly, we do not observe reproducible proteins that pull down with Mac1 when assayed by mass spectrometry. It is quite likely that these issues are a result of tissue/sample preparation that results in a loss of the ADPr modification during preparation (especially for acidic residue modifications). This also explains the reliance on IF assays in the PARP14 literature. A very good discussion of these issues is also contained in this paper: https://doi.org/10.1042/BSR20240986.

      Nonetheless, we have attempted one final experiment. Here, we have measured ADPr modification of cellular lysates under uninfected conditions as well as upon infection with either WT or N40D mutant virus. For all conditions, this was done with or without treatment of cells with 100 μM of AVI-4206. Measurement of ADPr modifications by western blot using a pan-ADPr antibody revealed a single prominent band with a molecular weight of ~130 kDa that showed a uniform increase in signal upon treatment of cells with AVI-4206 regardless of infection status. While this general trend was also observed with the mono-ADPr antibody, it was not statistically significant in its regulation upon AVI-4206 treatment. We suspect that the major band observed in these western blots is PARP1, as upon enrichment of ADPr proteins from these lysates by Af1521 immunoprecipitation, we find PARP1 to be among the most abundant proteins detected within this molecular weight range. We note that there is a baseline increase in polyADPr detection upon infection of virus with WT Mac1 (relative to uninfected and virus with N40D) and further increase when treated with AVI-4206. This compound-dependent increase is paralleled in the uninfected and N40D conditions. The counterintuitive increase upon WT Mac1 virus infection, which should erase ADPr marks, and the compound-dependent increase in the uninfected condition suggest that there are many indirect effects on ADPr signalling dynamics in this experiment. These results are difficult to reconcile with the specificity profiling of AVI-4206 (Supplementary Figure 5: Thermal proteome profiling in A549 cellular lysates). As mentioned above, the lack of consistent signal across reagents for ADPr detection and the timing of monitoring ADPr levels are additional complicating factors.

      We added to the results:

      “However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8).”

      Methods for experiment:

      Calu3 cells were obtained from ATCC and cultured in Advanced DMEM (Gibco) supplemented with 2.5% FBS, 1x GlutaMax, and 1x Penicillin-Streptomycin at 37°C and 5% CO₂. 5 × 10⁶ cells were plated in 15-cm dishes and media was changed every 2-3 days until the cells were 80% confluent. The cells were treated with IFN-γ 50 ng/mL (R&D Systems) with or without AVI-4206 100 μM. After 6 hours, the cells were infected with WA1 or WA1 NSP3 Mac1 N40D at a multiplicity of infection (MOI) of 1 for 36 hours. The cells were washed with PBS x 3 and scraped in Pierce IP Lysis Buffer (ThermoFisher) containing 1x HALT protease and phosphatase inhibitor mix (ThermoFisher) on ice. The lysate was stored at -80°C until further processing.

      The cell lysate was incubated for 5 minutes at room temperature with recombinant benzonase. Following incubation, the lysate was centrifuged at 13,000 rpm at 4°C for 20 minutes, and the supernatant was collected. The samples were then boiled for 5 minutes at 95°C in 1x NuPAGE LDS sample buffer (Invitrogen) with a final concentration of 1X NuPAGE sample reducing agent (Invitrogen). For the detection of ADPr levels in whole-cell lysates, the samples were subjected to SDS-PAGE and immunoblotting. All primary and secondary antibodies (pan-ADP-ribose antibody (MABE1016, Millipore), mono-ADP-ribose antibody (AbD33204, Bio-Rad), and HRP-conjugated secondary antibody (Cell Signaling)) were used at a 1:1000 dilution in 5% non-fat dry milk in TBST. Signals were detected by chemiluminescence (Thermo) and visualized using the ChemiDoc XRS+ System (Bio-Rad). Densitometric analysis was performed using Image Lab (Bio-Rad). Quantification was normalized to Actin. The data are expressed as mean ± SD. Statistical differences were determined using an unpaired t-test in GraphPad Prism 10.3.1.
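
      The quantification steps described above (normalization to Actin, mean ± SD, unpaired t-test) can be illustrated with a minimal sketch. All band intensities below are hypothetical values invented for illustration, not data from the study, and the function names are our own. The sketch assumes the equal-variance (Student's) form of the unpaired t-test; the exact settings used in GraphPad Prism are not stated in the methods.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_loading_control(target, loading):
    """Divide each target-band intensity by the loading-control intensity of the same lane."""
    if len(target) != len(loading):
        raise ValueError("each lane needs one target and one loading-control value")
    return [t / l for t, l in zip(target, loading)]


def unpaired_t(a, b):
    """Return Student's unpaired t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical densitometry readings (arbitrary units), three replicate lanes per condition.
vehicle_adpr = [100.0, 126.0, 110.0]
vehicle_actin = [50.0, 60.0, 55.0]
treated_adpr = [180.0, 150.0, 210.0]
treated_actin = [60.0, 50.0, 60.0]

vehicle = normalize_to_loading_control(vehicle_adpr, vehicle_actin)
treated = normalize_to_loading_control(treated_adpr, treated_actin)

# Report each condition as mean ± SD, then compare the two conditions.
print(f"vehicle: {mean(vehicle):.2f} ± {stdev(vehicle):.2f}")
print(f"treated: {mean(treated):.2f} ± {stdev(treated):.2f}")
t_stat, dof = unpaired_t(vehicle, treated)
print(f"t = {t_stat:.2f}, df = {dof}")
```

      A p-value would then be read from the t distribution with the reported degrees of freedom, for example via `scipy.stats.ttest_ind` with `equal_var=True` if SciPy is available.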

      (2) SARS-CoV-2 escape mutants for AVI-4206 should be generated, sequenced, and evaluated for both ADP-ribosyl hydrolase activity and their susceptibility to inhibition by AVI-4206.

      We thank the reviewer for this suggestion. These are indeed key experiments which are currently hampered by the lack of a cell line that is fully responsive to drug treatment. Although infected organoids and macrophages show an effect in response to AVI-4206, viral levels are ~3 logs lower than in cell lines and difficult to sequence. In the absence of a system that would allow meaningful screening for outgrowth of resistant viruses, we have conducted mass spectrometry studies that showed that Mac1 is the only significant hit for AVI-4206 (Supplementary Figure 5). The suggested outgrowth experiments will be conducted once a responsive cell line model has been established.

      (3) Given that Mac1 is found in several coronaviruses, it would be insightful for the authors to test a selection of Mac1 homologs from divergent coronaviruses to assess whether AVI-4206 can inhibit their activity in vitro.

      As mentioned above, inconsistencies in ADPr staining limit our ability to directly measure cellular activity. As an alternative approach to measure AVI-4206 selectivity in cells, we have adapted our CETSA assay for SARS-1 and MERS macrodomain proteins and find evidence that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1. In line with MERS being more structurally divergent than SARS-1 from SARS-CoV-2, the ΔTagg for SARS-1 and MERS are 4℃ and 1℃, respectively, compared to 9℃ for Mac1. These data have been added as Supplementary Fig S3C. Development of broader spectrum pan-inhibitors is on our radar for future work which will more thoroughly assess homologs from divergent coronaviruses.

      We added the following sentence to the main results:

      “Encouragingly, we were also able to adapt our CETSA assay for SARS-1 and MERs macrodomain proteins and find that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1 (Supplementary Figure 3C).”

      We also added this supplementary figure 3:

      Minor

      (1) Line 88, "respectively.heir potency"

      Fixed, thank you!

      (2) Line 149 add a period after proteome

      Fixed, thank you!

      Reviewer #2 (Recommendations for the authors):

      (a) The authors assess inhibition of MacroD2 and Targ1 as off-targets for AVI-4206. However, Mac1 belongs to the MacroD-type class of macrodomains of which MacroD1, MacroD2 and MOD1s of PARP9 and PARP14 are the human members. In contrast, Targ1 belongs to the ALC1-like class, which is only very distantly related to Mac1. Furthermore, recent studies have shown that the first macrodomains of PARP9 and PARP14 (MOD1 of PARP9/14) are much more closely related to Mac1 and PARP9/14 were implicated in antiviral immunity. As such, the authors should include assays showing the activity of their compounds against MacroD1 and MOD1s of PARP9/14.

      We emphasize that we detect no significant shift for any protein other than Mac1 in A549 cells by CETSA-MS (Supplementary Figure 6). For Mac1 CETSA, we see an average of 6 PARP14 spectral counts across conditions and did not detect PARP9. In addition, for separate work in MPro, we ran similar CETSA experiments where we observed an average of 2 PARP9 and 15 PARP14 spectral counts across conditions. Although PARP9 and PARP14 massively increase expression upon IFN treatment in A549 cells, both proteins have been detected by Western Blot in A549 cells previously at baseline.

      Nonetheless, we have included modeling of more diverse macrodomains as a supplemental figure and added to the text:

      Modeling of other diverse macrodomains, including those within human PARP9 and PARP14 further suggests that AVI-4206 is selective for Mac1 (Supplementary Figure 4)

      (b) In the context of SARS-CoV-2, superinfections are a known major complication of infection. These superinfections are associated with lung damage, and therefore it would be good if the authors could assess lung damage, e.g. by histology, to see if their treatment has a positive impact on lung damage and thus may help to suppress complications.

      We performed histology and the results are inconclusive, but suggest that AVI-4206 treatment could lower apoptosis. There is no difference in pathology between the N40D cohort and vehicle with these markers. This could suggest that AVI-4206 provides an additional mechanism that results in protection. We added to the results:

      Caspase 3 staining shows that AVI-4206 treatment reduces apoptosis in the lungs compared to vehicle controls. Additionally, Masson's Trichrome staining reveals a significant reduction in collagen deposition, a surrogate for lung pathology, in the lungs of AVI-4206 treated animals (Supplementary Figure 9).

      Histology:

      Mouse lung tissues were fixed in 4% PFA (Sigma Aldrich, Cat #47608) for 24 hours, washed three times with PBS and stored in 70% ethanol. All the stainings were performed at Histo-Tec Laboratory (Hayward, CA). Samples were processed, embedded in paraffin, and sectioned at 4 μm. The slides were dewaxed using xylene and alcohol-based dewaxing solutions. Epitope retrieval was performed by heat-induced epitope retrieval (HIER) of the formalin-fixed, paraffin-embedded tissue using citrate-based pH 6 solution (Leica Microsystems, AR9961) for 20 mins at 95°C. The tissues were stained for H&E, caspase-3 (Biocare #CP229c 1:100), and trichrome, dried, coverslipped (TissueTek-Prisma Coverslipper), and visualized using Axioscan 7 slide scanner (ZEISS) at 40X. Image quantification was performed with ImageJ software and GraphPad Prism.

      (c) Fig. 1D labelling is wrong

      Thank you - fortunately the data were plotted correctly and it was just the inset table of values that was incorrect. This is now fixed!

      (d) Line 88: "T" missing at start of sentence

      Fixed, thank you!

      (e) Line 118: NudT5/AMP-Glo assay was developed in https://doi.org/10.1021/acs.orglett.8b01742

      We have added this foundational reference, thank you!

      (f) Line 147ff: It would be good if the authors could highlight that the TPP methodology has known limitations (e.g. detection of low abundance proteins and low thermal shift of some binders) and thus is not an absolute proof that AVI-4206 "engage with high specificity for Mac1"

      We added this important context to the concluding sentence of this paragraph:

      “While this assay may not be sensitive to detection of proteins with low abundance proteins or low thermal shift upon ligand binding, collectively, these results indicate that AVI-4206 can cross cellular membranes and engage with high specificity for Mac1.”

      (g) The authors use their well established in vitro Mac1 model as well as the SARS-CoV-2 WA strain. Given the ongoing diversification of SARS-CoV-2 and the current prevalence of the Omicron VOC it would be good if the authors could investigate whether alteration in Mac1 occurred or are detected which could influence the efficacy of their inhibitor. Similarly, it would be interesting to know how effective their drug is on other clinically relevant beta-CoV Mac1, e.g. from MERS or SARS1.

      We thank the reviewer for the suggestion. Mac1 is one of the more conserved areas of the SARS-CoV-2 genome as there has only been one nonsynonymous mutation V34L (Orf1a:V1056L) that recently emerged in the BA.2.86 lineage and is now in all of the JN.1 derivatives. Currently, the mutation is only ~80% penetrant in circulating SARS-CoV-2 sequences suggesting that it might revert to wild-type and is not associated with a fitness benefit. Based on our structural analysis (shown in Supplementary Figure 4D above), we do not believe this mutation affects AVI-4206 binding, but we are including this variant in our future in vitro and in vivo studies as well as other beta-CoV. For SARS and MERS, see response to Reviewer 1 using CETSA to show that these targets are engaged by AVI-4206.

      (h) As methods to detect PARP14-derived ADP-ribosylation are available and it was shown that Mac1 can reverse this modification in cells, it would be good if the authors could investigate the impact of AVI-4206 on ADP-ribosylation in vivo.

      To test this idea we adapted the IF assay used by others in the field and show an effect of AVI-4206. We have added to the text:

      Although the IFN response was not sufficient to control viral replication, it is possible that the changes in ADP-ribosylation, in particular marks catalyzed by PARP14, downstream of IFN treatment could serve as a marker for Mac1 efficacy  (Ribeiro et al. 2025). To investigate whether downstream signals from PARP14 were specifically erased by Mac1, we used an immunofluorescence assay that showed that Mac1 could remove IFN-γ-induced ADP-ribosylation that is mediated by PARP14 (Kar et al. 2024).  We stably expressed wild-type Mac1 and the N40D mutant Mac1 in A549 cells. The data showed that Mac1 expression decreased IFN-γ-induced ADP-ribosylation, whereas the Mac1-N40D mutant did not (Figure 3E, F), indicating that Mac1 mediates the hydrolysis of IFN-γ-induced ADP-ribosylation. The PARP14 inhibitor RBN012759 completely blocked IFN-γ-induced ADP-ribosylation (Figure 3E, F), further confirming that IFN-γ-induced ADP-ribosylation is mediated by PARP14. AVI-4206 reversed the Mac1-induced hydrolysis of ADP-ribosylation and enhanced the ADP-ribosylation signal in Mac1-overexpressing cells (Figure 3E, F), further demonstrating its ability to inhibit the hydrolase activity of Mac1. We further validated this result using different ADP-ribosylation antibodies for immunofluorescence (Supplementary Figure 7). However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8). Collectively, these results provide further evidence that simple cellular models are insufficient to explore the effects of Mac1 inhibition and that monitoring specific PARP14-mediated ADP-ribosylation patterns can provide an accessible biomarker for the efficacy of Mac1 inhibition.

      A549 Mac1 expression cell construction

      Mac1 wild-type (Mac1) and N1062D mutant (Mac1 N1062D) gene fragments were loaded into pLVX-EF1α-IRES-Puro (empty vector, EV) using Gibson cloning kit (NEB E5510). Lentivirus was prepared as previously described (PMID: 30449619; DOI: 10.1016/j.cell.2018.10.024). Briefly, 15 million HEK293T cells were grown overnight on 15 cm poly-L-Lysine coated dishes and then transfected with 6 μg pMD2.G (Addgene plasmid # 12259 ; http://n2t.net/addgene:12259 ; RRID:Addgene_12259), 18 μg dR8.91 (since replaced by second generation compatible pCMV-dR8.2, Addgene plasmid #8455) and 24 μg pLVX-EF1α-IRES-Puro (EV, Mac1, Mac1-N1062D) plasmids using the lipofectamine 3000 transfection reagent per the manufacturer’s protocol (Thermo Fisher Scientific, Cat #L3000001). pMD2.G and dR8.91 were a gift from Didier Trono. The following day, media was refreshed with the addition of viral boost reagent at 500x as per the manufacturer’s protocol (Alstem, Cat #VB100). Viral supernatant was collected 48 hours post transfection and spun down at 300 g for 10 minutes to remove cell debris. To concentrate the lentiviral particles, Alstem precipitation solution (Alstem, Cat #VC100) was added, mixed, and refrigerated at 4°C overnight. The virus was then concentrated by centrifugation at 1500 g for 30 minutes at 4°C. Finally, each lentiviral pellet was resuspended at 100x of original volume in cold DMEM+10%FBS+1% penicillin-streptomycin and stored until use at -80°C. To generate Mac1 overexpressing cells, 2 million A549 cells were seeded in 10 cm dishes and transduced with lentivirus in the presence of 8 μg/mL polybrene (Sigma, TR-1003-G). The media was changed after 24 hours and, after 48 hours, media containing 2 μg/mL puromycin was added. Cells were selected for 72 hours and then expanded without selection. The expression of Mac1 was confirmed by Western Blot.

      Immunofluorescence assay:

      To assess the effect of Mac1 on IFN-induced ADP-ribosylation, A549-pLVX-EV, A549-pLVX-Mac1 and A549-pLVX-Mac1-N1062D cells were seeded in 96-well plates (10,000 cells/well). Cells were pre-treated with medium or 100 units/mL IFN-γ (Sigma, SRP3058) for 24 hours to induce ADP-ribosylation. These 3 cell lines were then treated the next day with the indicated concentrations of AVI-4206 or RBN012759 (Medchemexpress, HY-136979). After 24 hours of exposure to drugs, treated cells were fixed in pre-cooled methanol at -20°C for 20 min, blocked in 3% bovine serum albumin for 15 min, incubated with Poly/Mono-ADP Ribose (E6F6A) Rabbit mAb (CST, 83732S) or Poly/Mono-ADP Ribose (D9P7Z) Rabbit mAb (CST, 89190S) antibodies for 1 h, and then incubated with Goat anti-Rabbit IgG Secondary Antibody, Alexa Fluor 488 (ThermoFisher, A-11008) for 30 min and stained with DAPI for 10 minutes. Fluorescent cells were imaged with an IN Cell Analyzer 6500 System (Cytiva) and analyzed using IN Carta software (Cytiva).

      Reviewer #3 (Recommendations for the authors):

      Just a couple of observations/details that might help strengthen the article:

      (1) The Caco-2 data for AVI-4206 would suggest that there is some sort of efflux going on, yet there is no mention of it in the paper. This might be useful in the optimization paradigm moving forward.

      We thank the reviewer for this observation and suggestion. Indeed, we believe that efflux is behind the low oral bioavailability of AVI-4206. We are working specifically to remove this liability in next-generation analogs, using the Caco-2 assay to guide this ongoing effort. Keep an eye out for a preprint on this soon! We have added to the discussion:

      “In addition to dissecting such molecular mechanisms of macrodomain function and inhibition, future efforts will focus on improving pharmacokinetic properties, including a cellular efflux liability that results in low oral bioavailability of AVI-4206. ”

      (2) There are some spectroscopic anomalies/mistakes in the NMR data. The carbon NMR for 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one should only have 14 unique carbons, but the authors report 15. The HNMR for AVI1500 should only have 19 H's, but the authors list 20. The HNMR data for AVI3762/3763 should have 16 H's, but the authors only report 13. The CNMR for AVI4206 should only have 19 unique carbons, but the authors report 20.

      Thank you for noting these inconsistencies regarding the reported NMR spectra. We have rectified them by more closely examining the spectra and in some cases acquiring new data. We identified one peak (47.9) in the 13C NMR of 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one that is apparently an artifact of the automated peak picking in the data analysis software.  In the 1H NMR of AVI-1500, the triplet peak at 7.20 integrates to 1H, but was erroneously reported as 2H in the original manuscript.  This error has been corrected.  Spectra were re-acquired for AVI-3762, AVI-3763, and AVI-4206 with longer acquisition times, and/or on a 600 MHz spectrometer to afford the complete line lists now reported in the revised manuscript. Please note AVI-4206 has 18 distinct 13C resonances due to the equivalence of the gem-dimethyl methyl groups.

    1. eLife Assessment

      This study reanalyzed previously published scRNA-seq and TCR-seq data to examine the proportion and characteristics of dual-TCR-expressing Treg cells in mice, presenting some useful insights into TCR diversity and immune regulation. However, the evidence is incomplete, particularly with respect to data interpretation, statistical rigor, and the functionality of dual-TCR Treg cells. The study is potentially of interest to immunologists studying T-cell biology.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript, by Xu and Peng, et al. investigates whether co-expression of 2 T cell receptor (TCR) clonotypes can be detected in FoxP3+ regulatory CD4+ T cells (Tregs) and if it is associated with identifiable phenotypic effects. This paper presents data reanalyzing publicly available single-cell TCR sequencing and transcriptional analysis, convincingly demonstrating that dual TCR co-expression can be detected in Tregs, both in peripheral circulation as well as among Tregs in tissues. They then compare metrics of TCR diversity between single-TCR and dual TCR Tregs, as well as between Tregs in different anatomic compartments, finding the TCR repertoires to be generally similar though with dual TCR Tregs exhibiting a less diverse repertoire and some moderate differences in clonal expansion in different anatomic compartments. Finally, they examine the transcriptional profile of dual TCR Tregs in these datasets, finding some potential differences in expression of key Treg genes such as Foxp3, CTLA4, Foxo3, Foxo1, CD27, IL2RA, and Ikzf2 associated with dual TCR-expressing Tregs, which the authors postulate implies a potential functional benefit for dual TCR expression in Tregs.

      Strengths:

      This report examines an interesting and potentially biologically significant question, given recent demonstrations that dual TCR co-expression is a much more common phenomenon than previously appreciated (approximately 15-20% of T cells) and that dual TCR co-expression has been associated with significant effects on the thymic development and antigenic reactivity of T cells. This investigation leverages large existing datasets of single-cell TCRseq/RNAseq to address dual TCR expression in Tregs. The identification and characterization of dual TCR Tregs is rigorously demonstrated and presented, providing convincing new evidence of their existence.

      Weaknesses:

      The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans, limiting the novelty of the reported findings. The presented results should be considered in the context of these prior important findings. The focus on self-citation of their previous work, using the same approach to measure dual TCR expression in other datasets, limits the discussion of other more relevant and impactful published research in this area. Also, Reference #7 continues to list incorrect authors. The authors do not present a balanced or representative description of the available knowledge about either dual TCR expression by T cells or TCR repertoires of Tregs.

      The approach used follows a template used previously by this group for re-analysis of existing datasets generated by other research groups. The descriptions and interpretations of the data as presented are still shallow, lacking innovative or thoughtful approaches that could provide new insight.

      This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells. The response to this criticism in a previous review is considered non-responsive and does not improve the data or findings.

      Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. The interpretations of the gene expression analyses are somewhat simplistic, focusing on single-gene expression of some genes known to have function in Tregs. However, the investigators continue to miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291). No attempt to define clusters is made. No comparison is made of the proportions of dual TCR cells in transcriptionally-defined clusters. The broad assessment of key genes by single- and dual TCR cells is conceptually interesting, but likely to be confounded by the heterogeneity of the Treg populations. This would need to be addressed and considered to make any analyses meaningful.

      The study design, re-analysis of existing datasets generated by other scientific groups, precludes confirmation of any findings by orthogonal analyses.

    3. Reviewer #3 (Public review):

      Summary:

      This study addressed the TCR pairing types and CDR3 characteristics of Treg cells. By analyzing scRNA and TCR-seq data, it claims that 10-20% of dual TCR Treg cells exist in mouse lymphoid and non-lymphoid tissues and suggests that dual TCR Treg cells in different tissues may play complex biological functions.

      Strengths:

      The study addresses an interesting question of how dual-TCR-expressing Treg cells play roles in tissues.

      Weaknesses:

      This study is inadequate, particularly regarding data interpretation, statistical rigor, and the discussion of the functional significance of Dual TCR Tregs.

      Comments on revisions:

      Although the authors have provided brief explanations in response to the reviewers' comments, they do not present any additional analyses that would address the fundamental concerns in a convincing manner.

      Moreover, the in silico analyses presented in the manuscript alone are insufficient to support the conclusions, and the functional experiments requested by the reviewers have not been conducted.

      In the current rebuttal, while some textual additions have been made to the manuscript, the only substantial revision to the figures appears to be the inclusion of statistical significance annotations (e.g., Fig. 1G, Fig. 3G). These changes do not adequately strengthen the overall data or address the core issues raised.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The use of single-cell RNA and TCR sequencing is appropriate for addressing potential relationships between gene expression and dual TCR.

      Thank you for your detailed review and suggestions. The main advantages of scRNA+TCR-seq are as follows: (1) It enables comparative analysis of features such as the ratio of single TCR paired T cells to dual TCR paired T cells at the level of a large number of individual T cells, through mRNA expression of the α and β chains. In the past, this analysis was limited to a small number of T cells, requiring isolation of single T cells, PCR amplification of the α and β chains, and Sanger sequencing; (2) While analyzing TCR paired T cell characteristics, it also allows examination of mRNA expression levels of transcription factors in corresponding T cells through scRNA-seq.

      (2) The data confirm the presence of dual TCR Tregs in various tissues, with proportions ranging from 10.1% to 21.4%, aligning with earlier observations in αβ T cells.

      Thank you very much for your detailed review and suggestions. Early studies on dual TCR αβ T cells have been very limited in number, with reported proportions of dual TCR T cells ranging widely from 0.1% to over 30%. In contrast, scRNA+TCR-seq can monitor over 5,000 single and paired TCRs, including dual paired TCRs, in each sample, enabling more precise examination of the overall proportion of dual TCR αβ T cells. It is important to note that our analysis focuses on T cells paired with functional α and β chains, while T cells with non-functional chain pairings and those with a single functional chain without pairing were excluded from the total cell proportion analysis. Previous studies generally lacked the ability to determine expression levels of specific chains in T cells without dual TCR pairings.

      (3) Tissue-specific patterns of TCR gene usage are reported, which could be of interest to researchers studying T cell adaptation, although these were more rigorously analyzed in the original works.

      Thank you very much for your detailed review and suggestions. T cell subpopulations exhibit tissue specificity; thus, we conducted a thorough investigation into Treg cells from different tissue sites. This study builds upon the original by innovatively analyzing the differences in VDJ rearrangement and CDR3 characteristics of dual TCR Treg cells across various tissues. This provides new insights and directions for the potential existence of “new Treg cell subpopulations” in different tissue locations. The results of this analysis suggest the necessity of conducting functional experiments on dual TCR Treg cells at both the TCR protein level and the level of effector functional molecules.

      (4) Lack of Novelty: The primary findings do not substantially advance our understanding of dual TCR expression, as similar results have been reported previously in other contexts.

      Thank you for your detailed review and suggestions. Early research on dual TCR T cells primarily relied on transgenic mouse models and in vitro experiments, using limited TCR alpha chain or TCR beta chain antibody pairings. Flow cytometry was used to analyze a small number of T cells to estimate dual TCR T cell proportion. No studies have yet analyzed dual TCR Treg cell proportion, V(D)J recombination, and CDR3 characteristics at high throughput in physiological conditions. The scRNA+TCR-seq approach offers an opportunity to conduct extensive studies from an mRNA perspective. With high-throughput advantages of single-cell sequencing technology, researchers can analyze transcriptomic and TCR sequence characteristics of all dual TCR Treg cells within a study sample, providing new ideas and technical means for investigating dual TCR T cell proportions, characteristics, and origins under different physiological and pathological states.

      (5) Incomplete Evidence: The claims about tissue-specific differences lack sufficient controls (e.g., comparison with conventional T cells) and functional validation (e.g., cell surface expression of dual TCRs).

      Thank you for your detailed review and suggestions. This study indeed only analyzed dual  TCR Treg cells from different tissue locations based on the original manuscript, without a comparative analysis of other dual TCR T cell subsets corresponding to these tissue locations. The main reason for this is that, in current scRNA+TCR-seq studies of different tissue locations, unless specific T cell subsets are sorted and enriched, the number of T cells obtained from each subset is very low, making a detailed comparative analysis impossible. In the results of the original manuscript, we observed a relatively high proportion of dual TCR Treg cell populations in various tissues, with differences in TCR composition and transcription factor expression. Following the suggestions, we have included additional descriptions in R1, citing the study by Tuovinen et al., which indicates that the proportion of dual TCR Tregs in lymphoid tissues is higher than other T cell types. This will help understand the distribution characteristics of dual TCR Treg cells in different tissues and provide a basis for mRNA expression levels to conduct functional experiments on dual TCR Treg cells in different tissue locations.

      (6) Methodological Weaknesses: The diversity analysis does not account for sample size differences, and the clonal analysis conflates counts and clonotypes, leading to potential misinterpretation.

      We thank you for your review and suggestions. In response to your question about whether the diversity analysis considered the sample size issue, we conducted a detailed review and analysis. This study utilized the inverse Simpson index to evaluate TCR diversity of Treg cells. A preliminary analysis compared the richness and evenness of single TCR Treg cell and dual TCR Treg cell repertoires. The two datasets analyzed were from four mouse samples with consistent processing and sequencing conditions. However, when analyzing single TCR Tregs and dual TCR Tregs from various tissues, differences in detected T cell numbers by sequencing cannot be excluded from the diversity analysis. Following recommendations, we provided additional explanations in R1: CDR3 diversity analysis indicates TCR composition of dual TCR Treg cells exhibits diversity, similar to single TCR Treg cells; however, diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparison. Regarding the "clonal analysis" you mentioned, we define clonality based on unique TCR sequences; cells with identical TCR sequences are part of the same clone, with ≥2 counts defined as expansion. For example, in Blood, there are 958 clonal types and 1,228 cells, of which 449 are expansion cells. In R1, we systematically verified and revised clonal expansion cells across all tissue samples according to a unified standard.
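
      The two quantities discussed in this response, the inverse Simpson index and the count of cells in expanded clones, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names and the toy clonotype labels are assumptions, and the subsampling helper is one common way to compare diversity at a shared depth, not a step the response says was performed.

```python
from collections import Counter
import random


def inverse_simpson(clonotypes):
    """Inverse Simpson index: 1 / sum(p_i^2) over clonotype frequencies."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return 1.0 / sum((n / total) ** 2 for n in counts.values())


def expansion_summary(clonotypes, min_cells=2):
    """Distinct clonotypes, total cells, and cells in clones with >= min_cells."""
    counts = Counter(clonotypes)
    expanded_cells = sum(n for n in counts.values() if n >= min_cells)
    return {
        "clonotypes": len(counts),
        "cells": sum(counts.values()),
        "expanded_cells": expanded_cells,
    }


def rarefied_inverse_simpson(clonotypes, depth, n_draws=200, seed=0):
    """Mean index over random subsamples (without replacement) of size depth."""
    rng = random.Random(seed)
    cells = list(clonotypes)
    draws = [inverse_simpson(rng.sample(cells, depth)) for _ in range(n_draws)]
    return sum(draws) / len(draws)


# Hypothetical toy repertoire: one label per cell.
toy = ["A", "A", "A", "B", "B", "C"]
print(inverse_simpson(toy))
print(expansion_summary(toy))
```

      Subsampling both groups to the size of the smaller one before computing the index is one way to address the sample-size concern raised by the reviewer, since the raw index depends on how many cells were sequenced.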

      (7) Insufficient Transparency: The sequence analysis pipeline is inadequately described, and the study lacks reproducibility features such as shared code and data.

      Thank you for your review and suggestions. Based on the original manuscript, we have made corresponding detailed additions in R1, providing further elaboration on the analysis process of shared data, screening methods, research codes, and tools. This aims to offer readers a comprehensive understanding of the analytical procedures and results.

      (8) Weak Gene Expression Analysis: No statistical validation is provided for differential gene expression, and the UMAP plots fail to reveal meaningful clustering patterns.

      Thank you very much for your review and suggestions. Based on your recommendations, we conducted an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with statistical significance determined by Padj < 0.05. Regarding the clustering patterns in the UMAP plots, since the analyzed samples consisted of isolated Treg cell subpopulations that highly express immune suppression-related genes, we did not perform a more detailed analysis of subtypes and expression gene differences. This study primarily aims to explore the proportions of single TCR and dual TCR Treg cells from different tissue sources, as well as the characteristics of CDR3 composition, with a focus on showcasing the clustering patterns of samples from different tissue origins and various TCR pairing types.
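
      The Padj < 0.05 threshold above refers to p-values adjusted for multiple testing. DESeq2 reports Benjamini-Hochberg adjusted values by default; the sketch below reimplements only that adjustment step in Python for illustration. It is not DESeq2 and not the authors' analysis, and the function name and the toy p-values are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
padj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(padj) if p < 0.05]
print(padj, significant)
```

      In this toy input three raw p-values fall below 0.05, but only the first gene remains below 0.05 after adjustment, which is why the threshold is applied to adjusted rather than raw values.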

      (9) A quick online search reveals that the same authors have repeated their approach of reanalysing other scientists' publicly available scRNA-VDJ-seq data in six other publications. In other words, the approach used here seems to be focused on quick re-analyses of publicly available data without further validation and/or exploration.

      Thank you for your review and suggestions. Most current studies utilizing scRNA+TCR-seq overlook analysis of TCR pairing types and related research on single TCR and dual TCR T cell characteristics. Through in-depth analysis of shared scRNA+TCR-seq data from multiple laboratories, we discovered a significant presence of dual TCR T cells in high-throughput T cell research results that cannot be ignored. In this study, we highlight the higher proportion of dual TCR Tregs in different tissue locations, which exhibits a certain degree of tissue specificity, suggesting these cells may participate in complex functional regulation of Tregs. This finding provides new ideas and a foundation for further research into dual TCR Treg functions. However, as reviewers pointed out, findings from scRNA+TCR-seq at the mRNA level require additional functional experiments on dual TCR T cells at the protein level. We have supplemented our discussion in R1 based on these suggestions.

      Reviewer #2 (Public review):

      (1) The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans (Reference #18 and Tuovinen. 2006. Blood. 108:4063; Schuldt. 2017. J Immunol. 199:33, both omitted from references). The presented results should be considered in the context of these prior important findings.

      Thank you very much for your review and suggestions. Based on the original manuscript, we have supplemented our reading, understanding, and citation of closely related literature (Tuovinen, 2006, Blood, 108:4063 (line 44, line 175 in R1); Schuldt, 2017, J Immunol, 199:33 (line 44, line 178 in R1)). We once again appreciate the valuable comments from the reviewers, and we will refer to these in our subsequent dual TCR T cell research.

      (2) This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells.

      Thank you very much for your review and suggestions. This analysis is primarily based on the scRNA+TCR-seq study of sorted Treg cells, where we found the proportions and distinguishing features of dual TCR Treg cells in different tissue sites. Given the diversity and complexity of Treg function, conducting a comparative analysis of the origins of dual TCR Treg cells and non-T cells with dual TCRs will be a meaningful direction. Currently, peripheral induced Treg cells can originate from the conversion of non-Treg cells; however, little is known about the sources and functions of dual TCR Treg cell subsets in both central and peripheral sites. In R1, we have supplemented the discussion regarding the possible origins and potential applications of the "novel dual TCR Treg" subsets.

      (3) Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. Statistical analyses need to be performed to provide statistical confidence that the observed differences are true.

      Thank you very much for your review and suggestions. Based on your recommendations, we performed an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with a statistical significance threshold of Padj<0.05 for comparisons.

      (4) The interpretations of the gene expression analyses are somewhat simplistic, focusing on the single-gene expression of some genes known to have a function in Tregs. However, the investigators miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291).

      Thank you for your review and suggestions. This study is based on publicly available scRNA+TCR-seq data from different organ sites generated by the original authors, focusing on sorted and enriched Treg cells within each tissue sample. However, there was no corresponding research on other cell types in each tissue sample, preventing analysis of other cells and factors involved in development and differentiation of single TCR Treg and dual TCR Treg. The literature suggested by the reviewer indicates that development, differentiation, and function of Treg cells have been extensively studied, resulting in significant advances. It also highlights the complexity and diversity of Treg origins and functions. This research aims to investigate "novel dual TCR Treg cell subpopulations" that may exhibit tissue-specific differences found in the original authors' studies of Treg cells across different organ sites. This suggests further experimental research into their development, differentiation, origin, and functional gene expression as an important direction, which we have supplemented in the discussion section of R1.

      Reviewer #3 (Public review):

      (1) Definition of Dual TCR and Validity of Doublet Removal: This study analyzes Treg cells with Dual TCR, but it is not clearly stated how the possibility of doublet cells was eliminated. The authors mention using DoubletFinder for detecting doublets in scRNA-seq data, but is this method alone sufficient? We strongly recommend reporting the details of doublet removal and data quality assessment in the Supplementary Data.

      Thank you very much for your review and suggestions. In the analysis of the shared scRNA+TCR-seq data across multiple laboratories, as you mentioned, this study employed the DoubletFinder R package to exclude suspected doublets. Additionally, we used the nCount values of individual cells (i.e., the total sequencing reads or UMI counts for each cell) as auxiliary parameters to further optimize the assessment of cell quality. Because doublets may contain gene expression information from two or more cells, their nCount values are often abnormally high. In this study, all cells included in the analysis had nCount values not exceeding 20,000. Among the five tissue sample datasets, we further utilized hashtag oligonucleotide (HTO) labeling to eliminate doublets and negative cells. HTO labeling provides each cell with a unique barcode to differentiate cells from different tissue sources; by analyzing HTO labels, doublets and negative cells can be accurately identified. After the removal of chimeric cells, all samples exhibited T cells that possessed two or more TCR clones. This phenomenon validates the reliability of the methodological approach employed in this study and indicates that the analytical results accurately reflect the proportion of dual TCR T cells. Based on the recommendations of the reviewers, we have supplemented and clarified the methods and discussion sections in the manuscript. It is particularly noteworthy that in our analysis, the discussed dual TCR Treg cells and single TCR Treg cells specifically refer to those T cells that possess both functional α and β chains, which are capable of forming TCR. We have excluded from this analysis any Treg cells that possess only a single functional α or β chain and do not form TCR pairs, as well as those Treg cells in which the α or β chains involved in TCR pairing are non-functional.
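
      The filtering and classification rules described in this response can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' code: the record layout (`n_count`, `hto_class`, `chains`, `locus`, `functional`) is hypothetical, the DoubletFinder step is not reproduced, and any cell with at least one functional α and one functional β chain plus an additional functional chain is labeled "dual".

```python
MAX_NCOUNT = 20_000  # upper bound on per-cell counts stated in the response


def passes_qc(cell):
    """Keep cells within the nCount bound that demultiplex as HTO singlets."""
    return cell["n_count"] <= MAX_NCOUNT and cell["hto_class"] == "singlet"


def pairing_type(cell):
    """Classify a cell by its number of functional alpha and beta chains."""
    n_alpha = sum(1 for c in cell["chains"] if c["locus"] == "TRA" and c["functional"])
    n_beta = sum(1 for c in cell["chains"] if c["locus"] == "TRB" and c["functional"])
    if n_alpha == 0 or n_beta == 0:
        return "unpaired"  # excluded from the proportion analysis
    if n_alpha == 1 and n_beta == 1:
        return "single"
    return "dual"


def dual_fraction(cells):
    """Fraction of dual-TCR cells among QC-passing, fully paired cells."""
    types = [pairing_type(c) for c in cells if passes_qc(c)]
    paired = [t for t in types if t != "unpaired"]
    return paired.count("dual") / len(paired) if paired else 0.0
```

      Keeping the quality filter separate from the pairing classification makes it explicit which cells enter the denominator of the reported proportions.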

      (2) In Figure 3D, the proportion of Dual TCR T cells (A1+A2+B1+B2) in the skin is reported to be very high compared to other tissues. However, in Figure 4C, the proportion appears lower than in other tissues, which may be due to contamination by non-Tregs. The authors should clarify why it was necessary to include non-Tregs as a target for analysis in this study. Additionally, the sensitivity of scRNA-seq and TCR-seq may vary between tissues and may also be affected by RNA quality and sequencing depth in skin samples, so the impact of measurement bias should be assessed.

      We deeply appreciate your review and constructive comments. Based on the original manuscript, we have further supplemented and elaborated on the uniqueness and relative proportions of dual TCR T cell pairs in skin tissue samples in R1. Due to the scarcity of T cells in skin samples, some non-Treg cells were included during single-cell RNA sequencing and TCR sequencing to obtain a sufficient number of cells for effective analysis. The presence of non-regulatory T cells may indeed impact the statistical representation of dual TCR T cells as well as the related comparative analyses, as noted by the reviewer. T cells with A1+A2+B1+B2 type dual TCR pairings are primarily found within the non-regulatory T cell population in the skin. In response to this point, we have provided a detailed explanation of this analytical result in the revised manuscript R1. Furthermore, concerning the two datasets included in the study, we conducted a comparative analysis in R1, exploring how factors such as sequencing depth at different tissue sites might introduce biases in our findings, which we have thoroughly elaborated upon in the discussion section. We thank you once again for your valuable suggestions.

      (3) Issue of Cell Contamination: In Figure 2A, the data suggest a high overlap between blood, kidney, and liver samples, likely due to contamination. Can the authors effectively remove this effect? If the dataset allows, distinguishing between blood-derived and tissue-resident Tregs would significantly enhance the reliability of the findings. Otherwise, it would be difficult to separate biological signals from contamination noise, making interpretation challenging.

      We thank you for your review and suggestions. We have carefully verified data sources for tissues such as blood, kidneys, and liver. In the study by Oliver T et al., various techniques were employed to differentiate between leukocytes from blood and those from tissues, ensuring accurate identification of leukocytes from tissue samples. First, anti-CD45 antibody was injected intravenously to label cells in the vasculature, verifying that analyzed cells were indeed resident in the tissue. Second, prior to dissection and cell collection, authors performed perfusion on anesthetized mice to reduce contamination of tissue samples by leukocytes from the vasculature. Additionally, during single-cell sequencing, authors utilized HTO technology to avoid overlap between cells from different tissues.

      Analysis of the scRNA+TCR-seq data shared by the original authors revealed highly overlapping TCR sequences in blood, kidney, and liver, despite distinct cell labels associated with each tissue. While these techniques minimize overlap of cells from different sources, they cannot completely rule out the potential impact of this technical issue. As suggested, we have provided additional clarification in R1 of the manuscript regarding this phenomenon of high overlap in the kidney, liver, and blood, indicating that the possibility of Treg migration from blood to kidney and liver cannot be entirely excluded.

      (4) Inconsistency Between CDR3 Overlap and TCR Diversity: The manuscript states that Single TCR Tregs have a higher CDR3 overlap, but this contradicts the reported data that Dual TCR Tregs exhibit lower TCR diversity (higher 1/DS score). Typically, when TCR diversity is low (i.e., specific clones are concentrated), CDR3 overlap is expected to increase. The authors should carefully address this discrepancy and discuss possible explanations.

      Thank you for your review and suggestions. Regarding the potential relationship between CDR3 overlap and TCR diversity, in samples with consistent sequencing depth, lower diversity indeed corresponds to a higher proportion of CDR3 overlap. In our analysis of scRNA+TCR-seq data, we found that single TCR Tregs exhibit both higher diversity and CDR3 overlap, seemingly presenting contradictory analytical results (i.e., dual TCR Tregs show lower TCR diversity and CDR3 overlap). In R1, we supplemented the analysis of possible reasons: the presence of multiple TCR chains in dual TCR Treg cells may lead to a higher uniqueness of CDR3 due to multiple rearrangements and selections, resulting in lower CDR3 overlap; the lower diversity of dual TCR Tregs may be related to the number of T cells sequenced in each sample. The CDR3 diversity analysis in this study merely suggests that the TCR composition of dual TCR Treg cells is diverse, similar to that of single TCR Tregs. However, the diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparative analysis. A more in-depth and specific analysis of the diversity and overlap of the VDJ recombination mechanisms and CDR3 composition in dual TCR Tregs during development will be an important technical means to elucidate the function of dual TCR Treg cells.
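
      The response does not state which overlap metric was used. As one common choice, the Jaccard index over unique CDR3 sequences can be sketched as follows; the function name and the toy sequences are assumptions for illustration only.

```python
def jaccard_overlap(cdr3_a, cdr3_b):
    """Shared unique CDR3 sequences divided by all unique sequences."""
    a, b = set(cdr3_a), set(cdr3_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical CDR3 amino-acid sequences from two samples.
sample_1 = ["CASSL", "CASSQ", "CASSL"]
sample_2 = ["CASSQ", "CASRT"]
print(jaccard_overlap(sample_1, sample_2))
```

      Because this metric uses only the presence or absence of each sequence, it ignores clone sizes, whereas the inverse Simpson index depends on them. The two measures therefore need not move together.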

      (5) Functional Evaluation of Dual TCR Tregs: This study indicates gene expression differences among tissue-resident Dual TCR T cells, but there is no experimental validation of their functional significance. Including functional assays, such as suppression assays or cytokine secretion analysis, would greatly enhance the study's impact.

      We sincerely appreciate your review and suggestions. In this analysis of scRNA+TCR-seq data, we innovatively discovered a higher proportion of dual TCR Treg cells in different tissue sites, which exhibited differences in tissue characteristics. Furthermore, we conducted a comparative analysis of the homogeneity and heterogeneity between single TCR Treg and dual TCR Treg cells. This result provides a foundation for further research on the origin and characteristics of dual TCR Treg cells in different tissue sites, offering new insights for understanding the complexity and functional diversity of Treg cells. Based on your suggestions, we have supplemented R1 with the feasibility of further exploring the functions of tissue-resident dual TCR T cells and the necessity for potential application research.

      (6) Appropriateness of Statistical Analysis: When discussing increases or decreases in gene expression and cell proportions (e.g., Figure 2D), the statistical methods used (e.g., t-test, Wilcoxon, FDR correction) should be explicitly described. They should provide detailed information on the statistical tests applied to each analysis.

      Thank you for your review and suggestions. Based on the original manuscript, we have supplemented the specific statistical methods for the differences in cell proportions and gene expression in R1.

    1. eLife Assessment

      This study proposes an important new approach to analyzing cell-count data, which are often undersampled and cannot be accurately assessed using traditional statistical methods. The case studies presented in the article provide compelling evidence of the superiority of the proposed methodology over existing approaches, which could promote the use of Bayesian statistics among neuroscientists. The authors have taken steps to make the methodology accessible, although some implementation difficulties are likely to remain.

    2. Reviewer #1 (Public review):

      Summary:

      This work proposes a new approach to analyse cell-count data from multiple brain regions. Collecting such data can be expensive and time-intensive, so, more often than not, the dimensionality of the data is larger than the number of samples. The authors argue that Bayesian methods are much better suited to correctly analyse such data compared to classical (frequentist) statistical methods. They define a hierarchical structure, partial pooling, in which each observation contributes to the population estimate to more accurately explain the variance in the data. They present two case studies in which their method proves more sensitive in identifying regions where there are significant differences between conditions, which otherwise would be hidden.

      Strengths:

      The model is presented clearly, and the advantages of the hierarchical structure are strongly justified. Two alternative ways are presented to account for the presence of zero counts. The first involves the use of a horseshoe prior, which is the more flexible option, while the second involves a modified Poisson likelihood, which is better suited to datasets with a large number of zero counts, perhaps due to experimental artifacts. The results show a clear advantage of the Bayesian method for both case studies.

      The code is freely available, and it does not require a high-performance cluster to execute for smaller datasets. As Bayesian statistical methods become more accessible in various scientific fields, the whole scientific community will benefit from the transition away from p-values. Hierarchical Bayesian models are an especially useful tool that can be applied to many different experimental designs. However, while conceptually intuitive, their implementation can be difficult. The authors provide a good framework with room for improvement.

      Weaknesses:

      As with any Bayesian model, the choice of prior can significantly influence the results. The authors explain how the methodology can be adapted to different data properties, though selecting an appropriate prior or likelihood may not always be straightforward. They propose a 'standard workflow' as an alternative to traditional approaches, which could and should be used alongside established methods while Bayesian techniques continue to evolve and improve.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Alternative possibilities are discussed regarding the prior and likelihood of the model. Given that the second case study inspired the introduction of the zero-inflation likelihood, it is not clear how applicable the general methodology is to various datasets. If every unique dataset requires a tailored prior or likelihood to produce the best results, the methodology will not easily replace more traditional statistical analyses that can be applied in a straightforward manner. Furthermore, the differences between the results produced by the two Bayesian models in case study 2 are not discussed. In specific regions, the models provide conflicting results (e.g., regions MH, VPMpc, RCH, SCH, etc.), which are not addressed by the authors. A third case study would have provided further evidence for the generalizability of the methodology.”

      We hope in this paper to propose a ‘standard workflow’ for these data; this standard workflow uses the horseshoe prior and we propose that this is the approach used to describe cell count data instead of the better established, but to our thinking, inefficient, t-testing approach.

      The horseshoe prior is robust and allows a partially pooled model to be used while weighing up the contribution of different data points. This is analogous to excluding outliers and, in any analysis, it is normal to investigate further if points are being excluded as outliers. Often this reveals a particular challenge with the data; in the data here, there are a lot of zeros, indicating that some samples should be excluded because the preparation failed to tag cells rather than because there were no cells to tag. The idea behind the ZIP example is to show that the Bayesian method can allow for this sort of further investigation and, indeed, as the reviewer notes, this sort of extended analysis is often bespoke, tailored to the data.
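
      As a minimal sketch of the zero-inflation idea described above, the following compares a plain Poisson log-likelihood with a zero-inflated Poisson (ZIP) one. It uses the textbook ZIP form and is not taken from the authors' code; the function names, the rate of 50, the zero probability of 0.4, and the example counts are all hypothetical and chosen only for illustration.

```python
import math


def poisson_log_pmf(y, rate):
    """Log-probability of a count y under Poisson(rate), with rate > 0."""
    return -rate + y * math.log(rate) - math.lgamma(y + 1)


def zip_log_pmf(y, rate, zero_prob):
    """Log-probability of y under a zero-inflated Poisson.

    With probability zero_prob (0 <= zero_prob < 1) the observation is a
    structural zero, e.g. a preparation that failed to tag cells; otherwise
    it is an ordinary Poisson(rate) count.
    """
    if y == 0:
        # A zero can be structural or a genuine Poisson zero.
        return math.log(zero_prob + (1.0 - zero_prob) * math.exp(-rate))
    return math.log(1.0 - zero_prob) + poisson_log_pmf(y, rate)


# Hypothetical counts for one region: two zeros among otherwise large counts.
counts = [0, 0, 48, 52, 50]
plain = sum(poisson_log_pmf(y, 50.0) for y in counts)
inflated = sum(zip_log_pmf(y, 50.0, 0.4) for y in counts)
print(f"Poisson log-likelihood: {plain:.1f}")
print(f"ZIP log-likelihood:     {inflated:.1f}")
```

      Under a plain Poisson with a large rate, each zero is extremely improbable and dominates the likelihood, whereas the ZIP form attributes the zeros to the inflation component and leaves the rate to be informed by the non-zero counts.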

      We have clearly failed to explain that the ‘standard workflow’ we propose to replace the more traditional methods is the first one we describe, with the horseshoe prior; this produces better results on both datasets than the traditional approach. However, we also feel it is useful to show how a more tailored follow-on can be useful; we need to make it clear that this is intended as an illustration of an ‘optional extra’ rather than a part of the more straightforward ‘standard workflow’.

      To make this clearer, we have altered the text in several locations:

      • end of Introduction: added clarifying sentence “Here, our aim is to introduce a ‘standard’ Bayesian model for cell count data. We illustrate the application of this model to two datasets, one related to neural activation and the other to developmental lineage. For the second dataset, we also demonstrate a second example extension Bayesian model.”

      • Section Hierarchical modeling: “Our goal in both cases is to quantify group differences in the data. We present a ‘standard’ hierarchical model. This model reflects the experimental features common to cell count experiments and reflects the hierarchical structure of cell count data; the standard model is designed to deal robustly and efficiently with noise. On some occasions, to reflect a specific hypothesis, the structure of a particular experiment or an observed source of noise, this model can be further refined or changed to target the analysis. We will give an example of this for our second dataset.”

      • Section Horseshoe prior: “The alternative is via a flexible prior such as the horseshoe Carvalho et al., 2010; Piironen and Vehtari, 2017. This more generic option may be suitable as a default ‘standard’ approach in the typical case where outliers are poorly understood.”

      • Discussion: word ‘standard’ added to sentence: “Our standard workflow uses a horseshoe prior, along with the partial pooling, this allows our model to deal effectively with outliers.”

      • Discussion: modified sentence “The horseshoe prior model workflow we have exhibited here is intended as a standard approach.”
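
      To illustrate why a horseshoe prior tolerates outliers, here is a minimal sketch that draws from the horseshoe as defined by Carvalho et al. (2010): each coefficient is normal with standard deviation equal to a global scale times a local scale, and each local scale is half-Cauchy. How the prior enters the authors' hierarchical model is not shown in this response, so the function names, the global scale of 0.1, and the sample size are assumptions made purely for illustration.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) by inverting its CDF."""
    # CDF is (2/pi) * atan(x / scale), so x = scale * tan(pi * u / 2).
    return scale * math.tan(math.pi * rng.random() / 2.0)


def sample_horseshoe(n, tau, rng):
    """Draw n coefficients: beta ~ Normal(0, tau * lam), lam ~ half-Cauchy(0, 1)."""
    draws = []
    for _ in range(n):
        lam = half_cauchy(rng)  # local scale, one per coefficient
        draws.append(rng.gauss(0.0, tau * lam))
    return draws


rng = random.Random(0)
draws = sample_horseshoe(1000, tau=0.1, rng=rng)
magnitudes = sorted(abs(b) for b in draws)
print("median |beta|:", magnitudes[len(magnitudes) // 2])
print("largest |beta|:", magnitudes[-1])
```

      Most draws sit close to zero while a few are very large, which is the behaviour that lets the model shrink typical effects strongly without forcing genuine outliers toward the group mean.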

      Indeed, because the horseshoe prior deals robustly with outliers, whereas the ZIP is intended to model the outliers, any substantial difference between the two should be examined carefully. The referee is right to point out that we have not explained this in any detail and has helpfully listed a few brain regions where there are differences. This is useful, particularly since the examples listed illustrate in a useful way the opportunities and hazards this sort of data presents. To address this, we have added a new version of Figure 6 to the revised manuscript.

      Previously Figure 6 showed two example brain regions: MPN and TMd. We have now added MH and SCH to the figure, and new text commenting on the insights the plots provide, both in the Results and Discussion.

      Reviewer #2 (Public review):

      “A clearer link between the experimental data and model-structure terminology would be a benefit to the non-expert reader.”

      This is a very good point and we are acutely aware through our own work how difficult it can be moving between fields with different research goals, different scientific cultures and different technical vocabularies. Just as it can be difficult translating from one language to another without losing nuance and meaning, it can be a real challenge finding technical terms that are useful for the non-expert reader while retaining the precision the application requires! In the long run, we hope that, just as some of the very specialized vocabulary that surrounds frequentist statistics has become familiar to working experimental scientists, the precise terminology involved in Bayesian modelling will become familiar and transparent. However, in advance of that day, we have included a glossary of terms at the end of the main text, and have made numerous small tweaks to make sure that the link between data and model terminology is clearer and better explained.

      Reviewer #1 (Recommendations for the authors):

      (1) “I would strongly recommend that the authors include more case studies in the manuscript, and address the qualitative differences between the different versions of the model.”

      We agree that our method will only become established when it is applied to more datasets; we hope to contribute to further analysis and we know other people are already using the approach on their own data. We do, however, feel that adding more datasets to this paper will make it longer and more complex; the plan, instead, is to use the method on novel datasets to test specific hypotheses, so that the results will include novel scientific findings as well as adding another illustration of the Bayesian approach applied to data that is already well studied.

      (2) “Figure 6 is not discussed in the main text.”

      We had discussed the results presented in Figure 6 in the second paragraph of the section “Case study two – Ontogeny of inhibitory interneurons of the mouse thalamus”; however, the reviewer is right that we did not directly refer to the Figure. This was an oversight. In any case, in the revised manuscript we present a new version of Figure 6 (in response to the comment above), which is now explicitly cited in the text.

      Revised Figure 6: Example data and inferences highlighting model discrepancies. On the left under ‘data’: boxplots with medians and interquartile ranges for the raw data for four example brain regions. The shape of each point pairs left and right hemisphere readings in each of the five animals. On the right under ‘inference’: HDIs and confidence intervals are plotted. Purple is the Bayesian horseshoe model, pink is the Bayesian ZIP model, and orange is the sample mean. The Bayesian estimates are not strongly influenced by the zero-valued observations (MPN, SCH, TMd) or large-valued outliers (MH) and have means close to the data median. This explains the advantage of the Bayesian results over the confidence interval.

      Reviewer #2 (Recommendations for the authors):

      (1) “This is a generally well-written methodology paper that also provides the underlying code as a resource. As a reviewer outside both cell-count modelling and hierarchical-Bayesian approaches (though with a general interest in the topics) I found the method a little difficult to follow and would have liked to have been left with a better understanding of how the method is applied to the data. For example, in Figure 1 we are introduced to brain region count, animal count, and “items”. Then in the next line: pooling, model, structure, population and etc in subsequent lines. It is not clear what the subscripts (the pools?) are referring to: are they different regions R or animals N? These terms need to be better linked to the data and/or trimmed. Having said that, the later results look like a solid contribution to the field with a significant reduction in uncertainty from the Bayesian approach over the frequentist one. A future version of the manuscript, therefore, would benefit from greater precision of language as well as an economy and greater focus of terms linking the method to the biology. This is particularly the case around the exposition parts in Figure 1, Figure 2, and the “Hierarchical modelling” section.”

      This is another important point. We have now made numerous small changes to tighten up the text in the paper, in response to both this point and the next point.

      (2) “Language throughout could be sharpened. Subjectivity like “surprising outliers” could be removed and quirky grammar like “often small, ten is a typical” improved. There are also typos “an rate” etc that should be tidied up.”

      As per the previous response, we have made numerous tweaks and small improvements and feel that the paper is stronger in this respect.

      (3) “Figure 1 caption. “It is a spectrum that depends” Is spectrum the right word here? Also, “thicker stroke” what does this refer to? Wasn’t immediately clear. In A, why is the whole animal within the R bracket that signifies brain regions, and then the brain regions are within the N bracket that signifies whole animals? Apart from the teal colouring, what are the other coloured regions in the image referring to? Improving this first figure would greatly help a reader unfamiliar with the context of the approach.”

      We have replaced the word “spectrum” with “continuum”. We have replaced “Observed quantities have been highlighted with a thicker stroke in the graphical model.” with “The observed data quantities, y<sub>i</sub> to y<sub>n</sub>, are highlighted with a thick line in the model diagrams”. We have added the following text to describe the red and green lines in panel A: “green and red lines indicate regions labeled as damaged”.

      (4) “On P2 there is no discussion of priors when running through the advantage of the Bayesian approach. Is this a choice or an oversight? Priors do have a role in the later analysis.”

      A short additional paragraph has been added to the introduction outlining the advantage of having a prior, but also noting that the obligation to pick a prior can be intimidating and that suggesting priors is one of the contributions of our paper: “A Bayesian model also includes a set of probability distributions, referred to as the prior, which represent those beliefs it is reasonable to hold about the statistical model parameters before actually doing the experiment. The prior can be thought of as an advantage, it allows us to include in our analysis our understanding of the data based on previous experiments. The prior also makes explicit in a Bayesian model assumptions that are often implicit in other approaches. However, having to design priors is often considered a challenge and here we hope to make this more straightforward by suggesting priors that are suitable for this class of data.”

      (5) “On P4 more explanation would help greatly. Formulas like 23*10*4 or 50*6+50*4 are presented without explanation. What are the various numbers being multiplied? Regions, animals? Again, a clearer link between biological data and model structure would be advantageous.”

      We have now modified this line to clearly state the numbers’ sources: “The index i runs over the full set of samples, which in this case comprises 23 brain regions × 10 animals × 4 groups ≈ 920 datapoints in the first study, and 50 brain regions × 6 HET animals + 50 brain regions × 4 KO animals ≈ 500 datapoints in the second.”
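
      The counts in the quoted sentence can be checked directly. The short sketch below assumes a complete design, in which every brain region is measured in every animal; the variable names are illustrative only, and any missing measurements in the real datasets would make the true totals slightly smaller than these products.

```python
# Study one: 23 brain regions, 10 animals per group, 4 groups.
study_one = 23 * 10 * 4

# Study two: 50 brain regions in each of 6 HET animals and 4 KO animals.
study_two = 50 * 6 + 50 * 4

print(study_one)
print(study_two)
```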

      (6) “P6 and Results. Is it possible to show examples of the data set sampled from? Perhaps an image or two for the two experiments. Both Figures 4 and 5 as they currently are could be made slightly smaller to provide space for a small explanatory sub-panel. This would help ground the results.”

      This is a good idea. We have now added heatmap visualisations of both entire datasets to revised versions of Figures 4 and 5 (assuming that this is what the reviewer was suggesting).

    1. eLife Assessment

      Using single-cell transcriptomic data from adult mouse inner ear hair cells, the authors identify the differences and similarities of the four hair cell types. They make an important finding: that vestibular hair cells can express many ciliary motility-related genes. Some hair cell kinocilia display motility, suggesting that the kinocilium of vestibular hair cells may function as an active force generator to increase sensitivity. The evidence is incomplete as to whether all kinocilia beat and what the function of kinocilia movement is.

    2. Reviewer #1 (Public review):

      Summary

      Xu et al. use transcriptomic comparisons of mouse cochlear and vestibular hair cells to show that the vestibular hair cells alone are enriched in gene expression for proteins necessary for cilia motility and to further argue that such motility is a normal function of the kinocilia.

      Background:

      Cilia are prominent in sensory receptors, including vertebrate photoreceptors, olfactory neurons, and mechanosensitive hair cells of the inner ear and lateral line. Cilia can be motile or nonmotile depending on their axonemal structure: motile cilia require dynein and the inner 2 singlet microtubules of the 9+2 array. Primary cilia, present early in development, are considered to have sensory functions and to be nonmotile (Mill et al., Nature Rev Gen 2023).

      In hair cells, the kinocilium anchors and polarizes the mechanosensitive hair bundle of specialized microvilli. The kinocilium matures from the primary cilium of a newborn hair cell; behind it, the bundle of mechanosensory microvilli rises in a descending staircase of rows. During maturation of the mammalian cochlea, all hair cells lose the kinocilium, though not the associated basal body. The consensus for many years has been that most vertebrate kinocilia, and especially mammalian kinocilia, are nonmotile, based largely on the lack of spontaneous motility in excised mammalian vestibular organs, but also on the impression that the rare examples of spontaneous beating motility even in non-mammalian hair cells are associated with deterioration of the preparation (Rüsch & Thurm 1990).

      Strengths

      In comparing RNA expression across the 4 major types of mouse hair cells - 2 cochlear and 2 vestibular - Xu et al. noted that some ciliary genes related to motility are expressed by vestibular but not cochlear hair cells. They curated the ciliary genes into types known to be associated with different aspects of beating motility, and also investigated the expression of genes typical of primary cilia, which are considered to have sensory and cell signaling functions and to be nonmotile. They add immunostaining to back up some of the RNA data, and also evaluate relative expression by neonatal mouse cochlear and vestibular hair cells from a published dataset. The focus on kinociliary genes is an appropriate use of the comparative expression data for cochlear and vestibular hair cells, and the paper overall is readable and interesting. The transcriptome data are rounded off by comparing the authors' results in adult hair cells with published neonatal mouse cochlear and vestibular transcriptomes.

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

    4. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although no ultrastructural images of mouse vestibular kinocilia were provided in our study, a transmission electron micrograph of mouse vestibular kinocilia has been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia from guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A). We would like to point out that the dual identity of primary and motile cilia is not just based on the TEM images. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we will moderate our interpretation accordingly in the revision.

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      We aimed to show that kinocilia in neonatal cochlear and vestibular hair cells are largely similar, except that neonatal cochlear hair cells lack key genes and proteins required for the motile apparatus. While these genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we will mark such cytoplasmic or multifunctional genes with stars in both Figure 5G and Figure 6D, together with a legend, in the revised manuscript.

      Although those genes (i.e., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) are highly expressed in neonatal cochlear hair cells, key genes for motile machinery are not detected. For example, Dnah6, Dnah5, and Wdr66 are not expressed in the P2 cochlear hair cells. Dnah6 and Dnah5 encode axonemal dyneins and are part of the inner and outer dynein arms, while Wdr66 is a component of radial spokes. Importantly, we did not detect the expression of CCDC39 and CCDC40 in kinocilia of P2 cochlear hair cells. Axonemal CCDC39 and CCDC40 are the molecular rulers that organize the axonemal structure in the 96-nm repeating interactome and are required for the assembly of IDAs and N-DRC for ciliary motility (Becker-Heck et al., 2011; Merveille et al., 2011; Oda et al., 2014). We will modify Figure 6D to highlight the key difference between P2 cochlear and vestibular hair cells in the revised manuscript. We will also revise the text so that the key differences are clearly described.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in zebrafish early otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. Based on Rüsch and Thurm (1990), spontaneous kinocilia motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may lead to the disruption of lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Additionally, fluctuations in intracellular ATP levels may contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Irrespective of the condition, cilia without the molecular machinery required for motility will not be able to move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether the presence of such autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, was observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We will revise the discussion to clarify this important point raised by the reviewer. Specifically, we will emphasize that our observations of ciliary beating in the ex vivo conditions may not reflect its properties in the mature in vivo context, but rather a byproduct of motile machinery clearly present in the kinocilia. We speculate that this machinery in mature hair cells could operate in a more subtle mode, modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using a loss-of-function approach will be needed to reveal the unexplored role(s) of kinocilia for vestibular hair cells in vertebrates.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We will revise the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We would like to thank reviewer 2 for his/her comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations:

      We will make changes in the revision based on the joint recommendations of the two reviewers.

      References

      Becker-Heck, A., Zohn, I.E., Okabe, N., Pollock, A., Lenhart, K.B., Sullivan-Brown, J., McSheene, J., Loges, N.T., Olbrich, H., Haeffner, K., Fliegauf, M., Horvath, J., Reinhardt, R., Nielsen, K.G., Marthin, J.K., Baktai, G., Anderson, K.V., Geisler, R., Niswander, L., Omran, H., Burdine, R.D., 2011. The coiled-coil domain containing protein CCDC40 is essential for motile cilia function and left-right axis formation. Nat Genet 43, 79–84. https://doi.org/10.1038/ng.727

      Benser, M.E., Marquis, R.E., Hudspeth, A.J., 1996. Rapid, Active Hair Bundle Movements in Hair Cells from the Bullfrog’s Sacculus. J. Neurosci. 16, 5629–5643. https://doi.org/10.1523/JNEUROSCI.16-18-05629.1996

      Fawcett, D.W., Ito, S., 1965. The fine structure of bat spermatozoa. American Journal of Anatomy 116, 567–609. https://doi.org/10.1002/aja.1001160306

      Flock, Å., Flock, B., Murray, E., 1977. Studies on the Sensory Hairs of Receptor Cells in the Inner Ear. Acta Oto-Laryngologica 83, 85–91. https://doi.org/10.3109/00016487709128817

      Kikuchi, T., Takasaka, T., Tonosaki, A., Watanabe, H., 1989. Fine structure of guinea pig vestibular kinocilium. Acta Otolaryngol 108, 26–30. https://doi.org/10.3109/00016488909107388

      Lechtreck, K.-F., Gould, T.J., Witman, G.B., 2013. Flagellar central pair assembly in Chlamydomonas reinhardtii. Cilia 2, 15. https://doi.org/10.1186/2046-2530-2-15

      Martin, P., Bozovic, D., Choe, Y., Hudspeth, A.J., 2003. Spontaneous Oscillation by Hair Bundles of the Bullfrog’s Sacculus. J. Neurosci. 23, 4533–4548. https://doi.org/10.1523/JNEUROSCI.23-11-04533.2003

      Merveille, A.-C., Davis, E.E., Becker-Heck, A., Legendre, M., Amirav, I., Bataille, G., Belmont, J., Beydon, N., Billen, F., Clément, A., Clercx, C., Coste, A., Crosbie, R., de Blic, J., Deleuze, S., Duquesnoy, P., Escalier, D., Escudier, E., Fliegauf, M., Horvath, J., Hill, K., Jorissen, M., Just, J., Kispert, A., Lathrop, M., Loges, N.T., Marthin, J.K., Momozawa, Y., Montantin, G., Nielsen, K.G., Olbrich, H., Papon, J.-F., Rayet, I., Roger, G., Schmidts, M., Tenreiro, H., Towbin, J.A., Zelenika, D., Zentgraf, H., Georges, M., Lequarré, A.-S., Katsanis, N., Omran, H., Amselem, S., 2011. CCDC39 is required for assembly of inner dynein arms and the dynein regulatory complex and for normal ciliary motility in humans and dogs. Nat Genet 43, 72–78. https://doi.org/10.1038/ng.726

      Moon, K.-H., Ma, J.-H., Min, H., Koo, H., Kim, H., Ko, H.W., Bok, J., 2020. Dysregulation of sonic hedgehog signaling causes hearing loss in ciliopathy mouse models. eLife 9, e56551. https://doi.org/10.7554/eLife.56551

      Oda, T., Yanagisawa, H., Kamiya, R., Kikkawa, M., 2014. A molecular ruler determines the repeat length in eukaryotic cilia and flagella. Science 346, 857–860. https://doi.org/10.1126/science.1260214

      O’Donnell, J., Zheng, J., 2022. Vestibular Hair Cells Require CAMSAP3, a Microtubule Minus-End Regulator, for Formation of Normal Kinocilia. Front Cell Neurosci 16, 876805. https://doi.org/10.3389/fncel.2022.876805

      Pfister, K.K., Shah, P.R., Hummerich, H., Russ, A., Cotton, J., Annuar, A.A., King, S.M., Fisher, E.M.C., 2006. Genetic Analysis of the Cytoplasmic Dynein Subunit Families. PLOS Genetics 2, e1. https://doi.org/10.1371/journal.pgen.0020001

      Polino, A.J., Sviben, S., Melena, I., Piston, D.W., Hughes, J.W., 2023. Scanning electron microscopy of human islet cilia. Proceedings of the National Academy of Sciences 120, e2302624120. https://doi.org/10.1073/pnas.2302624120

      Rüsch, A., Thurm, U., 1990. Spontaneous and electrically induced movements of ampullary kinocilia and stereovilli. Hearing Research 48, 247–263. https://doi.org/10.1016/0378-5955(90)90065-W

      Shi, H., Wang, H., Zhang, C., Lu, Y., Yao, J., Chen, Z., Xing, G., Wei, Q., Cao, X., 2022. Mutations in OSBPL2 cause hearing loss associated with primary cilia defects via sonic hedgehog signaling. https://doi.org/10.1172/jci.insight.149626

      Stooke-Vaughan, G.A., Huang, P., Hammond, K.L., Schier, A.F., Whitfield, T.T., 2012. The role of hair cells, cilia and ciliary motility in otolith formation in the zebrafish otic vesicle. Development 139, 1777–1787. https://doi.org/10.1242/dev.079947

      Wu, D., Freund, J.B., Fraser, S.E., Vermot, J., 2011. Mechanistic Basis of Otolith Formation during Teleost Inner Ear Development. Developmental Cell 20, 271–278. https://doi.org/10.1016/j.devcel.2010.12.00

    1. eLife Assessment

      This useful study provides new insights into the liver stage antigen LSA3, its export to erythrocytes, and its role in liver stage development. While the functional importance of LSA3 is well-demonstrated, the data underlying conclusions about antibody specificity, liver stage localization, and phenotype remain incomplete. A key gain is the use of mosquito and humanized mouse models to access life cycle stages rarely studied in most laboratories.

    2. Reviewer #1 (Public review):

      Summary:

      The extent to which P. falciparum liver stage parasites export proteins into the host cell is unclear. Most blood-stage exported proteins tested in liver stages were not exported. An exception is LISP2, which is exported in P. berghei but not P. falciparum liver stages. While the machinery for export is present in liver stages, efforts to demonstrate export have so far been mostly unsuccessful. Parasite proteins exported during the liver stage could be presented by MHC and thereby become the target of immune control, an incentive to study liver stage export and identify proteins exported during this stage. However, particularly for P. falciparum, it is very difficult to study liver stages.

      This work studies LSA3 in P. falciparum blood and liver stages. The authors show that this protein is exported into the host cell in blood stages, but in liver stages, no or only very little export was detected. A disruption of LSA3 reduced liver stage load in a humanized mouse model, indicating this protein contributes to efficient development of the parasites in the liver.

      The paper also studies the localization of LSA3 in blood stages and uses a known inhibitor to show that it is processed by plasmepsin 5, a protease important for protein trafficking. The work also shows that LSA3 is not needed for passage through the mosquito.

      Strengths:

      The main strength of this work is the use of the humanized mouse model to study liver stages of P. falciparum, which is technically challenging and requires specialized facilities. The biochemical analysis of LSA3 localization and processing by plasmepsin 5 is thorough and mostly overcame adverse issues such as a cross-reactive antibody and the negative influence of the GFP-tag on LSA3 trafficking. The mosquito stage analysis is also notable, as these kinds of studies are difficult with P. falciparum. However, there was no evidence for a function of LSA3 in mosquito stages.

      Weaknesses:

      The cross-reactivity of the antibody, together with the co-infection strategy, prevents reliable assessment of LSA3 localization in liver stages. Despite this, it seems LSA3 is not exported in liver stages, and the paper does not bring us closer to the original goal of finding an exported liver stage protein.

      While the localization analysis in blood stages is well done and thorough, the advance is somewhat limited. LSA3 may be in structures like J dots, but this hypothesis was not tested. Although parasites with a disrupted LSA3 were generated, the function of this protein was not explored. Given that a previous publication found some inhibitory effect of LSA3 antibodies on blood stage growth, a comparison of the growth of the LSA3 disruption clones with the parent would have been very welcome and easy to do. At this point, LSA3 is one more of many proteins exported in blood stages for which the function remains unclear.

      It might be possible to refine some of the conclusions. The impact on liver stage development is interesting, but which phase of the liver stage is affected, and what the phenotype is, remain largely unknown. The co-infection (WT together with LSA3 mutant) has the advantage of a direct comparison of the mutant with the control in the same liver, but complicates phenotypic analysis if the LSA3 antibody is also cross-reactive in liver stages. This issue adds a question mark to the shown localization and precludes phenotypic comparisons. The authors write that they do not know if the cross-reactive protein is expressed at that stage. But this should be immediately evident from the mixed WT/mutant infection. If all cells are positive for LSA3, there is a cross-reaction. If about half of the cells are negative, there isn't. In the latter case, the localization shown in the paper is indeed LSA3, and morphological differences between WT and LSA3 disruption could be assessed without additional experiments.

      Significance:

      The conclusion from the paper that "our study presents just the second PEXEL protein so far identified as important for normal P. falciparum liver-stage development and confirms the hypothesized potential of exported proteins as malaria vaccine candidates" is partially misleading. Neither LISP2 nor LSA3 seems to be exported in P. falciparum liver stages, and we can't confirm the potential of vaccines with proteins exported in this stage. LSA3 is still important and may still be the target of the immune response, but based on this work, probably not due to export in liver stages.

    3. Reviewer #2 (Public review):

      Summary:

      Immunogenic Plasmodium falciparum proteins that could be targeted to prevent parasite development in the liver are of significant interest for novel anti-malarial vaccine development. In this study, McConville et al evaluate the trafficking and functional importance of LSA3, a protein expressed in the blood and liver stages and previously shown to provide protection in immunized chimpanzees. LSA3 contains a PEXEL motif, but the authors have previously shown that this protein does not appear to be exported beyond the PVM in the liver stage (McConville et al, PNAS 2024). However, LSA3 trafficking and functional importance have not been comprehensively evaluated across stages. In the present study, the authors find that blood-stage LSA3 undergoes PEXEL processing, and a portion of the protein is exported into the erythrocyte, where it localizes to punctate structures distinct from Maurer's clefts. Using a knockout mutant, LSA3 is shown to be dispensable for blood and mosquito stages but important to liver-stage development. Collectively, these results validate LSA3 as a liver-stage target and place it among several other PEXEL proteins that display differential trafficking beyond the PVM in the erythrocyte but not the hepatocyte.

      Strengths:

      (1) The authors present a thorough analysis of LSA3 trafficking in the blood stage. PEXEL processing by Plasmepsin 5 is clearly demonstrated through a combination of mini LSA3-GFP reporters and Plasmepsin 5 inhibitors. Importantly, an LSA3 knockout mutant is used to show that the LSA3-C anti-sera also react with additional, unidentified parasite proteins in the blood stage. Nonetheless, comparison between the WT and KO parasites clearly indicates that a portion of LSA3 is exported into the erythrocyte, which is further supported by protease-protection assays with fractionated iRBCs. This contrasts with the liver stage, where LSA3 does not appear to traffic beyond the PVM, similar to what has been observed for other PEXEL proteins in the rodent malaria model.

      (2) This study provides the first direct analysis of LSA3 function by reverse genetics, showing this protein is important for liver stage development in chimeric human liver mice. Several PEXEL proteins in P. berghei have been shown to be exported into the host cell in the blood stage, but do not appear to cross the PVM in the liver stage. These observations reinforce that even without detectable export into the hepatocyte, PEXEL proteins play critical roles during liver stage development.

      Weaknesses:

      (1) A previous study reported that anti-LSA3 antibodies inhibit blood-stage growth, suggesting a role for LSA3 during erythrocyte infection. While the authors carefully evaluate the LSA3 mutant in mosquito and liver stages, the impact on blood stage fitness is not tested. While the knockout shows LSA3 is not essential in the blood stage, its importance during erythrocyte infection remains unclear.

      (2) The authors previously reported that anti-LSA3-C signal in the liver stage localizes within the parasite and at the parasite periphery but is not exported into the hepatocyte. In the present study, it is shown that anti-LSA3-C reacts with other parasite proteins beyond LSA3 in the blood stage, and this may also occur in the liver stage. However, since liver-stage IFAs were only performed on samples co-infected with both WT and ∆LSA3 parasites, non-specific anti-LSA3-C reactivity at this stage could not be determined, and the localization of LSA3 in the liver stage remains somewhat unclear.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript provides a comprehensive characterization of the Plasmodium falciparum protein LSA3, combining biochemical, genetic, and in vivo approaches. The authors convincingly demonstrate that LSA3 is expressed during liver stage infection and that disruption of the gene leads to a modest but reproducible reduction in liver stage parasite load in humanized mice.

      Strengths:

      Their biochemical and cell biological analysis of blood stages provides strong evidence that LSA3 is exported to the infected erythrocyte, and the detailed analysis of its PEXEL motif processing is well executed.

      Weaknesses:

      The study suggests LSA3 as one of only two known P. falciparum PEXEL proteins contributing to this stage, although there is no evidence for the export beyond the vacuolar membrane. Several key conclusions, particularly regarding antibody specificity, localization in liver stage parasites, and the interpretation of the phenotypic data, are not fully supported by the current experiments.

    1. eLife Assessment

      In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organisation in the actin cortex. The theoretical work is solid and provides a rigorous theoretical framework to study active self-organisation in actomyosin systems, including useful qualitative comparison with experimental observations.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, Mirza et al developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      Weaknesses:

      This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative. It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination. The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns. Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim. Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase. Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

    3. Reviewer #2 (Public review):

      Summary:

      The article by Waleed et al discusses the self-organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self-organized structures can emerge.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context of the literature on active nematics.

      Comments on revised version:

      The authors have satisfactorily responded to the comments.

    4. Reviewer #3 (Public review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyses self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripe-like patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the most crucial assumptions underlying continuum simulations.

      The paper is well written, figures are mostly clear, and the theoretical analysis presented in both main text and supplement is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization, and it is plausible to think that both isotropic and anisotropic active stresses are present within such effectively compressible structures. Even though not explicitly stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones.

      The diversity of patterning processes experimentally observed and theoretically described is nicely elaborated on in the introduction of the paper. The theory development and discussion of the continuum model itself is also well-embedded in a review of the relevant broad literature on active liquid crystals and active nematics, which includes plenty of previous results by the authors themselves. Interestingly, several of the patterns identified in the present work, such as 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019) have been observed previously in different, but related, active isotropic fluid models. In light of this crowded literature, the authors do a good job in delineating key results obtained in the present manuscript from existing work.

      The results of numerical simulations are well-presented. The discussion of numerical observations is comprehensive, but also at many times qualitative. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system, which is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (Nejad et al, Nat Comm 2024). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      The authors must be complimented for trying to gain further mechanistic insights into their conclusions using microscopic filament simulations that were diligently performed. It is rightfully stated that these simulations only provide plausibility tests about key assumptions underlying the hydrodynamic theory. Within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said about the regime 𝜅 > 0 microscopically, in which the continuum theory does also predict the formation of stripe patterns? How does the spatially inhomogeneous organization the continuum theory predicts fit in the presented microscopic picture and vice versa? The authors clearly explain the scope and limitations of the microscopic model, which suggests that questions like these will be interesting directions of future investigations.

      Overall, the paper represents a valuable contribution to the field of active matter that should provide a fruitful basis to develop new hypotheses about the dynamic self-organisation and mechanics of dense filamentous bundles in biological systems.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organization in the actin cortex. While the theoretical work is solid, experimental evidence in support of the model assumptions remains incomplete. The presentation could be improved to enhance accessibility for readers without a strong background in hydrodynamic and nematic theories.

      To address the weaknesses identified in this assessment, we have expanded the motivation and description of the theoretical model, specifically emphasizing the experimental evidence supporting its rationale and assumptions. These changes in the revised manuscript are implemented in the first two paragraphs of Section “Theoretical model” and in a more detailed description and justification of the different mathematical terms that appear in that section. We have made an effort to map different terms in our narrative to mechanistic processes in the actomyosin network. Even if the nature of the manuscript is inevitably theoretical, we think that the revised manuscript will be more accessible to a broader spectrum of readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, Mirza et al developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      We thank the referee for these comments.

      Weaknesses:

      (A) This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative.

      We understand the point of the referee. While it is unavoidable to present the continuum hydrodynamic theory behind our results, we have made an effort in the revised manuscript to (1) motivate the essential features required from a theoretical model of the actomyosin cytoskeleton capable of describing its nematic self-organization (first two paragraphs of Section “Theoretical model”), and to (2) explicitly explain the physical meaning of each of the mathematical terms in the theory, and when appropriate, relate them to molecular mechanisms in the cytoskeleton. We hope that the revised manuscript addresses the concern of the referee.

      Regarding the comparison with experiments, they are indeed qualitative because the main point of the paper is to establish a physical basis for the self-organization of dense nematic structures in actomyosin gels. Somewhat surprisingly, we argue that a compelling mechanism explaining the tendency of actomyosin gels to form patterns of dense nematic bundles has been lacking. As we review in the introduction, these patterns are qualitatively diverse across cell types and organisms in terms of geometry and dynamics, and for this reason, our goal is to show that the same material in different parameter regimes can exhibit such qualitative diversity. A quantitative comparison is difficult for several reasons. First, many of the parameters in our theory have not been measured and are expected to vary wildly between cell types. In fact, estimates in the literature often rely on comparison with hydrodynamic models such as ours. For this reason, we chose to delineate regimes leading to qualitatively different emerging architectures and dynamics. Second, the patterns of nematic bundles found across cell types depend on the interaction between (1) the intrinsic tendency of actomyosin gels to form such structures studied here and (2) other elements of the cellular context. For instance, polymerization and retrograde flow from the lamellipodium, the physical barrier of the nucleus, and the interaction with the focal adhesion machinery are essential to understand the emergence of stress fibers in adherent cells. Cell shape and curvature anisotropy control the orientation of actin bundles in parallel patterns in the wings and trachea of insects. Nuclear positions guide the actin bundles organizing the cellularization of Sphaeroforma arctica [11]. Here, we focus on establishing that actomyosin gels have an intrinsic ability to self organize into dense nematic bundles, and leave how this property enables the morphogenesis of specific structures for future work. 
      We have emphasized this point in the revised section of conclusions.

      (B) It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination.

      We thank the referee for this comment. Our theory is applicable to actomyosin gels originating from living cells. To our knowledge, the ability of reconstituted actomyosin gels from purified proteins to sustain the kind of contractile dynamical steady-states observed in living cells is very limited. In the revised manuscript, we cite a very recent preprint presenting very exciting but partial results in this direction [49]. Instead, reconstituted in vitro systems encapsulating actomyosin cell extracts robustly recapitulate contractile steady-states. This point has been clarified in the first paragraph of Section “Theoretical model”.

      (C) The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns.

      We agree with the referee, and in the revised manuscript we have avoided the term “sarcomeric” because it refers to very specific organizations in cells. What we previously called “sarcomeric patterns”, where bands of high density exhibit nematic order perpendicular to the axis of the bands, is not a structure observed in cells to our knowledge. It is introduced to delimit the relevant region in parameter space. In the revised manuscript, we refer to this pattern as “banded pattern with perpendicular nematic organization” or “banded pattern” for short.

      (D) Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim.

      We thank the referee for raising this point, which was not sufficiently clarified in the original manuscript. We first note that in incompressible active nematic models, active tension is deviatoric (traceless and anisotropic) because an isotropic component would simply get absorbed by the pressure field enforcing incompressibility. Being compressible, our model admits an active tension tensor with deviatoric and isotropic components. We always consider a contractile (positive) isotropic component of active tension, but the deviatoric component can be either contractile (𝜅 > 0) or extensile (𝜅 < 0), where we follow the common terminology according to which in contractile/extensile active nematics the active stress is proportional to q with a positive/negative proportionality constant [see e.g. https://doi.org/10.1038/s41467-018-05666-8]. Furthermore, as clarified in the revised manuscript, total active stresses accounting for the deviatoric and isotropic components are always contractile (positive) in all directions, as enforced by the condition |𝜅| < 1.

      For fibrillar patterns, we need 𝜅 < 0, and therefore active stresses are larger perpendicular to the nematic direction. This means that the anisotropic component of the active tension is extensile, although, accounting for the isotropic component, total active tension is contractile (see Fig. 1c). This is now clarified in the text following Eq. 7 and in Fig. 1.

      However, following fibrillar pattern formation and as a result of the interplay between active and viscous stresses, the total stress can be larger along the emergent dense nematic structures (“contractile structures”) or perpendicular to them (“extensile structures”). To clarify this point, in the revised Fig. 4 and the text referring to it, we have expanded our explanation and plotted the difference between the total stress component parallel to the nematic direction (𝜎∥) and the component perpendicular to the nematic direction (𝜎⊥), with contractile structures satisfying 𝜎∥ − 𝜎⊥ > 0 and extensile structures satisfying 𝜎∥ − 𝜎⊥ < 0. See lines 280 to 303. This is consistent with the common notion of contractile/extensile systems in incompressible nematic systems [see e.g. https://doi.org/10.1038/s41467-018-05666-8].
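
      The criterion based on the sign of 𝜎∥ − 𝜎⊥ can be sketched as follows. This is a minimal illustration, not the authors' finite element post-processing: the function names, the tolerance, and the "neutral" label for an isotropic stress are hypothetical, and the stress tensor is assumed symmetric.

```python
import math

def stress_anisotropy(sigma, theta):
    """Return sigma_par - sigma_perp for a symmetric 2x2 stress tensor.

    theta is the angle of the nematic director; sigma_par = n.sigma.n and
    sigma_perp = m.sigma.m, with m the unit vector perpendicular to n.
    """
    c, s = math.cos(theta), math.sin(theta)
    par = c * c * sigma[0][0] + 2 * c * s * sigma[0][1] + s * s * sigma[1][1]
    perp = s * s * sigma[0][0] - 2 * c * s * sigma[0][1] + c * c * sigma[1][1]
    return par - perp

def classify(sigma, theta, tol=1e-12):
    """Label a structure by the sign of sigma_par - sigma_perp."""
    d = stress_anisotropy(sigma, theta)
    if d > tol:
        return "contractile"   # larger total stress along the director
    if d < -tol:
        return "extensile"     # larger total stress across the director
    return "neutral"           # hypothetical label for an isotropic stress

total_stress = [[2.0, 0.0], [0.0, 1.0]]
print(classify(total_stress, 0.0))           # director along x
print(classify(total_stress, math.pi / 2))   # director along y
```

      With the same total stress, the label depends only on how the director is oriented relative to the principal stress directions, which is why locally extensile anisotropic activity does not fix the character of the emergent structure.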

      (E) Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase.

      In the present work, we focus on the self-organization of a periodic patch of actomyosin gel. However, in adherent cells boundary conditions play an essential role, as discussed in our response to comment (A) by this referee. In ongoing work, we are using the present model to study the dynamics of assembly and reconfiguration of dense nematic structures in domains with boundary conditions mimicking those in adherent cells, possibly interacting with the adhesion machinery, and we find dynamical interactions like those suggested by the referee. As an example, we show a video of a simulation where at the edge of the circular domain there is an actin influx modeling the lamellipodium, and in four small regions friction is higher, simulating focal adhesions. Under these boundary conditions, the model presented in the paper exhibits the kind of dynamical reorganizations alluded to by the referee.

      Author response video 1.

      We would like to note, however, that the prominent stress fibers in cells adhered to stiff substrates, so abundantly reported in the literature, are not the only instance of dense nematic actin bundles. In the present manuscript, we emphasize the relation of the predicted organizations with those found in different in vivo contexts not related to stress fibers, such as the aligned patterns of bundles in insects (trachea, scales in butterfly wings), in hydra, or in reproductive organs of C. elegans; the highly dynamical network of bundles observed in C. elegans early embryos; or the labyrinth patterns of micro-ridges in the apical surface of epidermal cells in fish.

      (F) Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

      We thank the referee for raising this question, which needs further clarification. The goal of the microscopic model is not to reproduce the self-organized patterns predicted by the active gel theory. The microscopic model lacks essential ingredients, notably a realistic description of hydrodynamics and turnover. Our goal with the agent-based simulations is to extract the relation between nematic order and active stresses for a small homogeneous sample of the network. This small domain is meant to represent the homogeneous active gel prior to pattern formation, and it allows us to substantiate key assumptions of the continuum model leading to pattern formation, notably the dependence of isotropic and deviatoric components of the active stress on density and nematic order (Eq. 7) and the active generalized stress promoting ordering.
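
      As an illustration of the kind of quantity extracted from a small homogeneous network sample, the following sketch computes the 2D nematic order parameter S and the director angle from a list of filament orientations. It is a generic textbook estimator, not the authors' Cytosim customization, and it assumes all filaments carry equal weight (no length weighting); the function name is hypothetical.

```python
import math

def nematic_order_2d(angles):
    """Return (S, director_angle) for filament orientation angles in radians.

    Uses the 2D nematic tensor Q = <2 n n^T - I>, whose independent
    components are <cos 2a> and <sin 2a>; S is its positive eigenvalue.
    Angles a and a + pi give the same result (head-tail symmetry).
    """
    if not angles:
        raise ValueError("at least one filament angle is required")
    n = len(angles)
    c2 = sum(math.cos(2 * a) for a in angles) / n
    s2 = sum(math.sin(2 * a) for a in angles) / n
    S = math.hypot(c2, s2)
    director = 0.5 * math.atan2(s2, c2)
    return S, director

# Perfectly aligned filaments, including one pointing the opposite way.
print(nematic_order_2d([0.4, 0.4 + math.pi, 0.4]))
# Two perpendicular filaments: no net nematic order.
print(nematic_order_2d([0.0, math.pi / 2]))
```

      The first sample gives S close to 1 with a director angle close to 0.4, and the second gives S close to 0. Measuring the stress of the sample for a range of such S values is one way to probe the dependence of active stress on nematic order.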

      We should mention that reproducing the range of out-of-equilibrium mesoscale architectures predicted by our active gel model with agent-based simulations seems at present not possible, or at least significantly beyond the state-of-the-art. To our knowledge, these models have not been able to reproduce the heterogeneous nonequilibrium contractile states involving sustained self-reinforcing flows underlying the pattern formation mechanism studied in our work. The scope of the discrete network simulations has been clarified in lines 340 to 349 in the revised manuscript.

      While agent-based cytoskeletal simulations are very attractive because they directly connect with molecular mechanisms, active gel continuum models are better suited to describe out-of-equilibrium emergent hydrodynamics at a mesoscale. We believe that these two complementary modeling frameworks are rather disconnected in the literature, and for this reason, we have attempted to substantiate some aspects of our continuum modeling with discrete simulations. We have emphasized the complementarity of the two approaches in the conclusions.

      Reviewer #1 (Recommendations For The Authors):

      Questions on the theory:

      Does rho describe the density of actin or myosin? The authors say that they are modeling actomyosin material as a whole, but the actin and myosin should be modeled separately. Along similar lines, does Q define the ordering of actin or myosin?

      Active gel models of the actomyosin cytoskeleton have been formulated with independent densities for actin and for myosin or using a single density field, implicitly assuming a fixed stoichiometry. Super-resolution imaging of the actomyosin cytoskeleton also suggests that in principle it makes sense to consider different nematic fields for actin and for myosin filaments. In the revised manuscript, we now explicitly mention that our density and nematic field are effective descriptions of the entire actomyosin gel (lines 82-84).

      A more detailed model would entail additional material parameters, not available experimentally, which may help reproduce specific experiments but would make the systematic study of the different behaviors much more difficult. Our approach has been to keep the model minimal while meeting the fundamental requirements outlined in the first paragraphs of Section “Theoretical model”.

      Should the active stress depend on material density? It seems strange (from Eq. 3) that active stress could be non-zero even where density is zero, since sigma_act does not depend on rho.

      Yes, active stress is assumed to be proportional to density. Eq. 3 in the original manuscript was misleading (it was multiplied by rho in Eq. 2). In the revised manuscript, we have explained the theoretical model in a bit more detail, clarifying this point.

      The authors should clearly explain their rationale for retaining certain types of nonlinear terms while ignoring others in theory. For instance, the nonlinearities in the equations of motion are sometimes quadratic in the fields, while there are also some cubic terms. Please remark up to what order in the fields the various interactions are modeled.

      We thank the referee for raising this point. The nonlinearities in the theory are easily explained on the basis of a small number of choices. We have added a new paragraph towards the end of Section “Theoretical model” (lines 145 to 152) providing a rationale for the origin and underlying assumptions leading to different nonlinearities.

      To connect with experiments and the biological context, please explain the biological origin of various terms in the model: (1) L-dependent terms in Eq. 2 and 4, (2) Flow-alignment of nematic order and experimental evidence in support of it, (3) density-dependent susceptibility terms in Eq. 4.

      (1) Unfortunately, the L-dependent terms are very bulky, but are very standard in nematic theories. The best way to understand their physical significance is through the expression of the nematic free energy, which is now given and explained in the revised manuscript (Eq. 3). The resulting complicated expressions for the molecular field and the nematic stress (Eqs. 4 and 5) are mathematical consequences of the choice of nematic free energy. In the revised manuscript, we also attempt to provide a basis for these terms in the context of the actin cytoskeleton. (2) To our knowledge, the best reference supporting this term from experiments is Reymann et al, eLife (2016). In the revised manuscript, we have provided a physical interpretation. (3) We have expanded the motivation and plausible microscopic justification of this term.

      There are different 'activity' terms in the model. Their biophysical origin is not made clear. For example, the authors should make clear if these activities arise from filament or motor activity. Relatedly, the authors should provide a comprehensive discussion of the signs of the different active parameters and their physical interpretations.

      In an active gel model, activity parameters are phenomenological and how they map to molecular mechanisms is not precisely known, although conventionally contractile active tension is ascribed to the mechanical transduction of chemical power by myosin motors. The fact is that, besides myosin activity, there are many nonequilibrium processes in the actomyosin cytoskeleton that may lead to active stresses including (de)polymerization of filaments or (un)binding of crosslinkers. In the revised manuscript, we have added sentences illustrating how different terms may result from microscopic mechanisms, but providing a precise mapping between our model and nonequilibrium dynamics of proteins is beyond the scope of our work, although our discrete network simulations address this issue to a certain degree.

      Following the suggestion of the referee, our description of the theory now discusses much more extensively the signs of activity parameters and their physical interpretations, e.g. the text following Eq. 7.

      Throughout the paper, various activity terms are varied independently of each other. Is that a reasonable assumption given that activities should depend on ATP and are thus not independent of one another?

      We agree that, ultimately, all active processes depend on the conversion of chemical energy into mechanical energy. However, recent work has highlighted how active tension also depends on the microscopic architecture of the network controlled by multiple regulators of the actomyosin cytoskeleton (e.g. Chugh et al, Nat Cell Biol, 2017). It is reasonable to expect that, for a given rate of ATP consumption, chemical power will be converted into mechanical power in different ways depending on the micro-architecture of the cytoskeleton, e.g. the stoichiometry of filaments, crosslinkers, myosins, or the length distribution of filaments (very long filaments crosslinked by myosins may be difficult to reorient but may contract efficiently).

      We have added a paragraph in Section “Theoretical model” with a discussion, lines 153 to 156.

      Sarcomeres are muscle fibers that exhibit an alternating polarity pattern. Such patterning is not evident in what the authors call 'sarcomeres' in Fig. 2. I believe the authors should revise their terminology and not loosely interpret existing classifications in the field.

      We thank the referee for raising this point. We have changed the terminology.

      Fig 2a: Is the cartoon for filament alignment incorrect for kappa>0?

      The cartoon is correct. In the revised manuscript we have explained more clearly the physical meaning of kappa in the text following Eq. 7. In the caption of Fig. 1 and of Fig. 2a, we have also clarified that when the absolute value of kappa is <1, then active tension is positive in all directions.

      Within the section "Requirements for fibrillar and banded patterns", it will be useful to show the figures for varying the different active parameters in the main figures.

      We have followed the referee’s suggestion and moved Supp. Fig. 1 of the original manuscript to the main figures.

      How do the authors decide if bundles are contractile or extensile? Why are contractile bundles under tension while extensile bundles are under compression? I would expect the opposite.

      We agree that this point deserves a more detailed explanation. In the revised manuscript and in the new Figure 4, we further develop this point. The fibrillar pattern forms when kappa<0. We further assume that -1<kappa<0, so that active tension is positive in all directions. In this regime, the deviatoric (anisotropic) part of active tension is extensile. However, following pattern formation and because of the interplay between active and viscous stresses, the total stress in the emerging bundles may become extensile or contractile, depending on whether the largest component of stress is perpendicular or along the bundle axis. This is now presented in the updated figure, with new panels presenting maps of the total tension. The text discussing this point has been rewritten and we hope that the new version is much clearer (lines 280 to 303).

      A contractile bundle tends to shorten, but it cannot do so because of boundary conditions or the interaction with other bundles. As a result, it is in tension. Conversely, an extensile bundle tries to elongate, but being constrained, it becomes compressed. As an analogy, consider the cortex of a suspended cell. The cortex is contractile, but it cannot contract because of volume regulation in the cell, which is typically pressurized. As a result, tension in the cortex is positive, as shown by Laplace’s law [10.1016/j.tcb.2020.03.005]. We have tried to clarify this point in the revised manuscript.
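
      The cortex analogy can be made concrete with Laplace's law for a spherical surface, ΔP = 2T/R. The sketch below only restates that relation; the pressure difference and radius are hypothetical round numbers chosen for illustration, not measurements from the cited review, and the function name is an assumption.

```python
def cortical_tension(delta_p, radius):
    """Surface tension T (N/m) of a sphere from Laplace's law, dP = 2 T / R.

    delta_p: excess internal pressure in Pa; radius: sphere radius in m.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    return delta_p * radius / 2.0

# Hypothetical values: 100 Pa excess pressure, 10 micrometre radius.
tension = cortical_tension(delta_p=100.0, radius=10e-6)
print(tension)  # in N/m
```

      A positive excess pressure implies a positive tension: the contractile cortex is held under tension because the enclosed volume prevents it from shortening, just as a constrained contractile bundle is.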

      Can the authors reproduce alternating density patterns using the cytosim simulations? This is an important step in establishing the correspondence between the continuum theory and the agent-based model.

      We have addressed this point in our response to public comment (F) of this referee.

      The authors do not provide code or data.

      The finite element code with an input file required to run a representative simulation in the paper is now made available, see Ref. [74].

      The customizations of Cytosim needed to account for nematic order in our discrete network simulations are available, see Ref. [98].

      Reviewer #2 (Public Review):

      Summary:

      The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article.

      We thank the referee for these comments. In the revised manuscript, we have highlighted the novelty, particularly in the last paragraph of the introduction, the first two paragraphs of Section “Theoretical model”, and in the conclusions. Despite a very large literature on theoretical models of stress fibers, actin rings, and active nematics, we argue that the active self-organization of dense nematic structures from an isotropic and low-density gel has not been compellingly explained so far. Many models assume from the outset the presence of actin bundles, or explain their formation using localized activity gradients. The literature of active nematics has extensively studied symmetry breaking and the self-organization. However, most of the works assume initial orientational order. Only a few works study the emergence of nematic order from a uniform isotropic state, but consider dry systems lacking hydrodynamic interactions or incompressible and density-independent systems [37,38]. Yet, pattern formation in actomyosin gels is characterized by large density variations, and by highly compressible flows, which coordinate in a mechanism relying on an advective instability and self-reinforcing flows.

      Our theoretical model is not particularly novel, and as we mention in the manuscript, it can be particularized to different models used in the literature. However, we argue that it has the right minimal features to capture nematic self-organization in actomyosin gels. To our knowledge, no previous study explains the emergence of dense and nematic structures from a low-density isotropic gel as a result of activity and involving the advective instability typical of symmetry-breaking and patterning in the actomyosin cytoskeleton. These are important qualitative features of our results that resonate with a large experimental record, and as such, we believe that our work provides a new and compelling mechanism relying on self-organization to explain the prominence and diversity of patterns involving dense nematic bundles in the actomyosin cytoskeleton across species.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context or literature on active nematics.

      We agree with the referee that this was a weakness of the original manuscript. In the revised manuscript, within reasonable space constraints given the size and dynamism of the field of active nematics, we have placed our work in the context of this field (end of introduction and first two paragraphs of Section “Theoretical model”). The published version of our companion manuscript [45] also contributes to providing a clear context to our theoretical model within the field.

      Reviewer #2 (Recommendations For The Authors):

      The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article. I explain my questions and comments below.

      We have responded to this comment above.

      (i) Active nematics including density variations have been dealt with quite extensively in the literature. For example, the works of Sriram Ramaswamy have dealt with this system including linear stability analysis, simulations etc. In what way is the present work different from the system that they have considered?

      (ii) Active flows leading to self organization have been a topic of discussion in many works. For example: (i) Annual Review of Fluid Mechanics, Vol. 43:637-659, 2010, https://doi.org/10.1146/annurev-fluid-121108-145434 (ii) S Santhosh, MR Nejad, A Doostmohammadi, JM Yeomans, SP Thampi, Journal of Statistical Physics 180, 699-709 (iii) M. G. Giordano, F. Bonelli, L. N. Carenza, G. Gonnella and G. Negro, Europhysics Letters, Volume 133, Number 5. In what way is this work different from any of these?

      (iii) I am confused about the models used in the paper. There is significant literature from Prof. Mike Cates group, Prof. Julia Yeomans group, Prof. Marchetti's group who all use similar governing equations. In the present paper, I find it hard to understand whether the model used is similar to the existing ones in literature or are there significant differences. It should be clarified.

      Response to (i), (ii) and (iii).

      We completely agree with this referee (and also the previous referee), that the contextualization of our work in the field of active nematics was very insufficient. In the revised manuscript, the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model” now address this point. In short, previous active nematic models predicting patterns with density variations have been either for dry active matter (disregarding hydrodynamic interactions), or for suspensions of active particles moving in an incompressible flow. None of these previous works predict nematic pattern formation as a result of activity relying on the advective instability and self-reinforcing compressible flows, leading to high density and high order bundles surrounded by an isotropic low density phase. Yet, these are fundamental features observed in actomyosin gels. Many works deal with symmetry-breaking of a system with pre-existing order, but very few address how order emerges actively from an isotropic state. We thank the referee for pointing to the paper by Santhosh et al, who nicely make this argument and are now cited. Our mechanism is fundamentally different from that in Santhosh et al, whose model is incompressible and ignores density variations.

      We hope that the revised manuscript addresses this important concern.

      (iv) Below Eqn 6, it starts by saying that the “...origin..is clear...” It's not. I don't understand the physical origin of the instability, and this should be clarified, maybe with some illustrations.

      We apologize for this unfortunate sentence, which we have rewritten in the revised manuscript (lines 181 to 185).

      Reviewer #3 (Public Review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyzes self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripe-like patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the conclusions drawn from continuum simulations.

      The paper is well written, figures are mostly clear and the theoretical analysis presented in both main text and supplement is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both isotropic and anisotropic active stresses are present within such effectively compressible structures. Even though not yet stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones. The diversity of patterning processes experimentally observed is nicely elaborated on in the introduction of the paper, though other closely related previous work could also have been included in these references (see below for examples).

      We thank the referee for these comments and for the suggestion to emphasize the interplay of isotropic and anisotropic active tension, which is possible only in a compressible gel. We have emphasized this point in different places in the revised manuscript. We also thank the referee for the suggestions to better connect with existing literature.

      To introduce the continuum model, the authors exclusively cite their own, unpublished pre-print, even though the final equations take the same form as previously derived and used by other groups working in the field of active hydrodynamics (a certainly incomplete list: Marenduzzo et al (PRL, 2007), Salbreux et al (PRL, 2009, cited elsewhere in the paper), Jülicher et al (Rep Prog Phys, 2018), Giomi (PRX, 2015),...). To make better contact with the broad active liquid crystal community and to delineate the present work more compellingly from existing results, it would be helpful to include a more comprehensive discussion of the background of the existing theoretical understanding on active nematics. In fact, I found it often agrees nicely with the observations made in the present work, an opportunity to consolidate the results that is sometimes currently missed out on. For example, it is known that self-organised active isotropic fluids form in 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019), just as shown and discussed in Fig. 2. It is also known that extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis (the other way around for \kappa>0, see e.g. Doostmohammadi et al, Nat Comm, 2018 "Active Nematics" for a review that makes this point), consistent with all relative nematic director/flow orientations shown in Figs. 2 and 3 of the present work.

      We thank the referee for these suggestions. Indeed, in the original submission we had outsourced much of the justification of the model and the relevant literature to a related pre-print, but this is not reasonable. The companion publication has now been accepted in the New Journal of Physics, with significant changes to better connect the work to the field of active nematics. A preprint reflecting those changes is available in Ref. [64], but we hope to reference the published paper that will come out soon.

      In the revised manuscript, we have significantly rewritten the Section “Theoretical model” to frame the continuum model in the context of the field of active nematics. While our model and results have commonalities with previous work, there are also important differences. We have highlighted the novelty of the present work along with the relation with previous studies and theoretical models in the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model”. Furthermore, as suggested by the referee, we have made an effort to connect our results with previous work by Kumar, Mietke, Doostmohammadi and others.

      Regarding the last point alluded to by the referee (“extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis”), the picture raised by the referee would be nuanced for our compressible system as compared to the incompressible systems discussed in that reference. As we have elaborated in our response to point (D) of Referee #1, our systems are overall contractile (with positive active tension in all directions), but the deviatoric component of the active tension can be either extensile or contractile. In our “extensile” models (left in Fig. 2c), material is drawn in laterally to the nematic axis but it is not expelled along this axis. Instead, it is “expelled” by turnover. In the revised manuscript, we have added a comment about this.

      The results of numerical simulations are well-presented. Large parts of the discussion of numerical observations - specifically around Fig. 3 - are qualitative and it is not clear why the analysis is restricted to \kappa<0. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (https://arxiv.org/abs/2309.04224). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      We thank the referee for these comments. We are reluctant to extend the detailed analysis of emergent architectures and dynamics to the case \kappa > 0 as it leads to architectures not observed, to our knowledge, in actin networks. In the revised manuscript, we have expanded and clarified the characterization of emergent contractile/extensile networks by reporting the relative magnitude of stress along and perpendicular to the nematic direction. Our revised manuscript clearly shows that even though all of our simulations describe locally contractile systems with extensile anisotropic active tension, the emergent meso-structures can be either extensile or contractile, with the extensile ones exhibiting the usual bend-type instability (a secondary instability in our system) described classically for extensile active nematic systems. We have rewritten the text discussing this (lines 280 to 303), where we have placed these results in the context of recent work reporting the nontrivial relation between the contractility/extensibility of the local units vs the nematic pattern.

      I compliment the authors for trying to gain further mechanistic insights into this conclusion with microscopic filament simulations that are diligently performed. It is rightfully stated that these simulations only provide plausibility tests and, within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said microscopically about the regime \kappa>0 (which is dropped ad hoc from Fig. 3 onward), in which the continuum theory does also predict the formation of stripe patterns, besides the short comment at the very end? How does the spatially inhomogeneous organization the continuum theory predicts fit in the presented microscopic picture, and vice versa?

      We thank the referee for this compliment. We think that the point raised by the referee is very interesting. It is reasonable to expect that the sign of \kappa may not be a constant but rather depend on S and \rho. Indeed, for a sparse network with low order, the progressive bundling by crosslinkers acting on nearby filaments is likely to produce a large active stress perpendicular to the nematic direction, whereas in a dense and highly ordered region, myosin motors are more likely to effectively contract along the nematic direction whereas there is little room for additional lateral contraction by additional bundling. As discussed in our response to referee #1, we believe that studying the formation of patterns using the discrete network simulations is far beyond the scope of our work. We discuss in lines 332 to 341, as well as in the last paragraph of the conclusions, the scope and limitations of our discrete network simulations.

      Overall, the paper represents a valuable contribution to the field of active matter and, if strengthened further, might provide a fruitful basis to develop new hypotheses about the dynamic self-organisation of dense filamentous bundles in biological systems.

      Reviewer #3 (Recommendations For The Authors):

      • The statement "the porous actin cytoskeleton is not a nematic liquid-crystal because it can adopt extended isotropic/low-order phases" is difficult to understand and should be clarified, as the next paragraph starts formulating a nematic active liquid crystal theory. Do the authors mean a crystal that "Tends to be in a disordered phase?", according to its equilibrium properties? It would still be a "nematic liquid crystal", only its ground state is not a nematic phase.

      We agree with the referee, and we hope that changes in the introduction and in Section “Theoretical model” address this comment.

      • I could not find what Frank energy is precisely used, that would be helpful information.

      In the revised manuscript, we have provided the expression for the nematic free energy in Eq. 3.

      • The significance of the green/purple arrows in the Fig. 2a sketch is unclear; green arrows also appear in b and c: do they represent the same quantity? From the simulation images, it is overall very difficult to see how the flows are oriented near the high-density regions (i.e. whether they are towards or away from the stripe).

      We thank the referee for bringing this up. The color coding of the sketches was confusing. The modified figures (Fig. 1(c) and Fig. 2(a)) now present a clearer and unified representation of anisotropic tension. The green arrows in Fig. 2(c) represent the out-of-equilibrium flows in the steady state. We agree that the zoom is insufficient to resolve the flow structure. For this reason, in the revised Fig. 2, we have added additional panels showing the flow with higher resolution.

      • It is currently unclear how the linear stability results - beyond identification of the parameter \delta - inform any of the remaining manuscript. Quantitative comparisons of the various length scales seen in simulated patterns (e.g. Fig. 2b, 3c etc.) with linear predictions and known characteristic length scales would be mechanistically instructive, would make the overall presentation more compelling, and would probe the limitations of the linear results.

      In the revised manuscript, we have provided further information so that the readers can appreciate the predictions and limitations of the linear stability results. We have added a sentence and a figure to show that, in addition to the critical activity, the linear theory provides a good prediction of the wavelength of the pattern. See lines 199 to 201.

      • It is not clear what is meant by "[bundle-formation] requires that active tension perpendicular to nematic orientation is larger than along this direction", and therefore also not why that would be "counter-intuitive". If interpreted naively, I would say that a large tension brings in more filaments into the bundle, so that may well be an obviously helpful feature for bundle formation and maintenance. In any case, it would be helpful if clarity is improved throughout when arguments about "directions of tensions" are made.

      We have significantly rewritten the first paragraphs of section “Microscopic origin…” to clarify this point (lines 330 to 339). This paragraph, along with other changes in the manuscript such as the explanation of Eq. 7 or the discussion about the stress anisotropy in the new version of Fig. 4 (see lines 280 to 303), provide a better explanation of this important point.

      • All density color bars: Shouldn't they rather be labelled \rho/\rho_0?

      Yes! We have corrected this typo.

      • Scalar product missing in caption definition of order parameter Fig. 2

      We have corrected this typo.

      • Fig. 3a: I suggest to put the expression for q0 in the caption

      We have replaced q_0 with S_0 and clarified its meaning in the caption of what is now Fig. 4.

      • Paragraph on bottom right of page 6 should several times probably refer to Fig. 3c(...), instead of Fig. 3b

      We have corrected this typo.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na+/K+-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na+/K+-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na+/K+-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na+/K+-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.

      (2) The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

      Comments on revisions:

      The revised manuscript is notably improved.

      We thank the reviewer for their concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses. Experimental work is beyond the scope of our modeling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialized excitable cells (such as electrocytes).

      Quantitative estimates of metabolic costs in this study are limited to the ATP that is required to fuel the Na+/K+ pump. By integrating the net pump current over time and dividing by one elementary charge, one can find the rate at which ATP is consumed by the Na+/K+ pump for either compensatory mechanism. The difference in net pump current is thus proportional to ATP consumption, which allows for a direct comparison of the cost efficiency of the Na+/K+ pump for each proposed compensatory mechanism. The Na+/K+ pump is, however, not the only ATP-consuming element in the electrocyte, and some of the compensatory mechanisms induce other costs related to cell ‘housekeeping’ or presynaptic processes. We have now added a section in the appendix titled ‘Considerations on metabolic costs of compensatory mechanisms’ (section 11.4), where we provide rough estimates of the influence of the compensatory mechanisms on the total metabolic costs of the cell and on membrane space occupation. Although we argue that, according to these rough estimates, the impact of the discussed compensatory mechanisms could be significant, the absence of more detailed experimental quantification means that a plausible quantitative cost estimate at the whole-cell level remains beyond the scope of this article.
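
The current-to-ATP conversion described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the sampled current trace, and the trapezoidal integration are assumptions made for illustration. It relies only on the stated stoichiometry (three Na+ out, two K+ in, hence one net elementary charge per ATP).

```python
# Illustrative sketch only: names and the example trace are hypothetical.
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge


def net_pump_charge(times_s, currents_a):
    """Integrate a sampled net pump current (amperes) over time (seconds)
    with the trapezoidal rule; returns the transferred charge in coulombs."""
    if len(times_s) != len(currents_a) or len(times_s) < 2:
        raise ValueError("need two or more matching time/current samples")
    charge = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        charge += 0.5 * (currents_a[i] + currents_a[i - 1]) * dt
    return charge


def atp_molecules(times_s, currents_a):
    """ATP molecules hydrolysed, assuming one net charge exported per ATP."""
    return net_pump_charge(times_s, currents_a) / ELEMENTARY_CHARGE_C


def mean_atp_rate(times_s, currents_a):
    """Average ATP consumption rate (molecules per second) over the trace."""
    return atp_molecules(times_s, currents_a) / (times_s[-1] - times_s[0])


# Hypothetical example: a constant 1 nA net pump current sustained for 1 s.
example_times = [0.0, 0.5, 1.0]
example_currents = [1e-9, 1e-9, 1e-9]
example_rate = mean_atp_rate(example_times, example_currents)
```

Because the conversion is linear, the ratio of two such rates equals the ratio of the time-averaged pump currents, which is what makes the comparison between compensatory mechanisms direct.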

      Reviewer #1 (Recommendations for the authors):

      I just have a few recommendations on the updated manuscript.

      (1) When exploring the different roles of Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase in the Results section, the authors employed many different models. For instance, the voltage equation on page 15, voltage equation (2) on page 22, voltage equation (12) on page 24, voltage equation (30) on page 32, and voltage equation (38) on page 35 are presented as the master equations for their respective biophysical models. Meanwhile, the phase models are presented on page 29 and page 33. I would recommend that the authors clearly specify which equations correspond to each subsection of the Results section and explicitly state which equations were used to generate the data in each figure. This would help readers more easily follow the connections between the models, the results, and the figures.

      We thank the reviewer for pointing out that the links of the different voltage equations to the results could be expressed more explicitly in the article. All simulations were done using the ‘master equation’  expressed in Eq. 2, and the other voltage equations that are specified in the article (in the new version of the article Eqs. 13, 31, and 39) are reformulations of Eq. 2 to analytically show different properties of the voltage equation (Eq. 2). This has now been mentioned in the article when formulating the voltage equations, and the equation for the total leak current (in the new version Eq. 3) has been added for completeness.

      (2) The authors may want to revisit their description and references concerning Eigenmannia virescens. For example, wave-type weakly electric fish (e.g., Eigenmannia) and pulse-type weakly electric fish (e.g., Gymnotus carapo) exhibit large differences, so references 52-55 may be inappropriate for subsection 4.3.1, as these studies focus on Gymnotus carapo. Additionally, even within wave-type species, chirp patterns vary. For example, Eigenmannia can exhibit short "pauses"-type chirps, whereas Apteronotus leptorhynchus (another wave-type fish) does not (https://pubmed.ncbi.nlm.nih.gov/14692494/).

      We thank the reviewer for pointing this out. The citations and phrasing in sections 4.3.1 and 4.3.2 have been updated to specifically refer to the weakly electric fish E. virescens.

      (3) Table on page 21: Please explain why the parameter value (13.5 mM) of [Na^+]_{in} is 10 times larger than its value (1.35 mM) in reference [26]? How does this value (13.5 mM) compare with the range of the variable [Na^+]_{in} in equation (6)?

      The intracellular sodium concentration in reference [26] was reported to be 1.35 mM, but the authors also reported an extracellular sodium concentration of 120 mM, and a sodium reversal potential of 55 mV. Upon calculating the sodium reversal potential, we found that an intracellular sodium concentration of 1.35 mM would give a sodium reversal potential of 113 mV. An intracellular sodium concentration of 13.5 mM, on the other hand, leads to the reported and physiological reversal potential of 55 mV. This has now been clarified in the article, and the connection between this value and Eq. 6 (Eq. 7 in the new version) has also been clarified.
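
The consistency check above can be reproduced with the Nernst equation, E = (RT/zF) ln([Na+]_out/[Na+]_in). The sketch below is illustrative only; the temperature of 20 °C is an assumption (the response does not state the value used), chosen because it yields numbers close to those quoted.

```python
import math

GAS_CONSTANT = 8.314462618      # J / (mol K)
FARADAY_CONSTANT = 96485.33212  # C / mol


def nernst_potential_mv(conc_out_mm, conc_in_mm, temperature_k=293.15, valence=1):
    """Nernst reversal potential in millivolts.

    temperature_k defaults to 20 degrees C, which is an assumption and
    not a value taken from the article.
    """
    thermal_voltage = GAS_CONSTANT * temperature_k / (valence * FARADAY_CONSTANT)
    return 1000.0 * thermal_voltage * math.log(conc_out_mm / conc_in_mm)


# Extracellular sodium of 120 mM, as reported in reference [26].
e_na_reported = nernst_potential_mv(120.0, 1.35)   # roughly 113 mV
e_na_corrected = nernst_potential_mv(120.0, 13.5)  # roughly 55 mV
```

A tenfold change in the intracellular concentration shifts the reversal potential by (RT/F) ln 10, about 58 mV at this temperature, which accounts for the gap between the two values.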

      Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes - specialized, highly active cells that facilitate electro-sensing in weakly electric fish-may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium and efflux of potassium ions into these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps-a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

      Weaknesses:

      The model for action potential generation simplifies ion dynamics by considering only sodium and potassium currents, excluding other ions like calcium. The ion channels considered are assumed to be static, without any dynamic regulation such as post-translational modifications. For instance, a sodium-dependent potassium pump could modulate potassium leak and spike amplitude (Markham et al., 2013).

      This work considers only the sodium-potassium (NaK) pumps to restore ion gradients. However, in many cells, several other ion pumps, exchangers, and symporters are simultaneously present and actively participate in restoring ion gradients. When sodium currents dominate action potentials, and thus when NaK pumps play a critical role, such as the case in Eigenmannia virescens, the present study is valid. However, since other biological processes may find different solutions to address the pump's non-electroneutral nature, the generalizability of the results in this work to other fast-spiking cell types is limited. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      We thank the reviewer for the detailed summary and the updated identified strengths and weaknesses. The current article indeed focuses on and isolates the interplay between sodium currents, potassium currents, and sodium-potassium pump currents. As discussed in section 5.1, in excitable cells where these currents are the main players in action-potential generation, the results presented in this article are applicable. The contribution of post-translational effects of ion channels, other ionic currents, and other active transporters and pumps, could be exciting avenues for further studies.

      Reviewer #2 (Recommendations for the authors):

      Thank you for addressing my comments.

      All the figures are now consistent. The color schema used is clear.

      The methods and discussions expansions improve the paper.

      Including the model assumptions and simplifications is appreciated.

      Including internal references is helpful.

      The equations are clear, and the references have been fixed.

      I am content with the changes. I have updated my review accordingly.

      We thank the reviewer for their initial constructive comments that led to the significant improvement of the article.

      Page : 3 Line : 113 Author : Unknown Author 07/24/2025 

      Although this is technically correct, the article is about electrocommunication signals and does not focus on sensing.

      Page : 3 Line : 153 Author : Unknown Author 07/24/2025

      electrocommunication

      Page : 4 Line : 164 Author : Unknown Author 07/24/2025 

      Judging from the cited article, I think this should be a sodium-dependent potassium current.

    2. Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes - specialized, highly active cells that facilitate electro-sensing in weakly electric fish-may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium and efflux of potassium ions into these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps-a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

    3. Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na+/K+-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na+/K+-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na+/K+-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na+/K+-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Comments on revisions:

      The revised manuscript is notably improved.

    4. eLife Assessment

      This important study provides new insights into the lesser-known effects of the sodium-potassium pump on how nerve cells process signals, particularly in highly active cells like those of weakly electric fish. The computational methods used to establish the claims in this work are compelling and can be used as a starting point for further studies.

    1. eLife Assessment

      This important study presents a sequence-based method for predicting drug-interacting residues in intrinsically disordered proteins (IDPs), addressing a significant challenge in understanding small-molecule:IDP interactions. The findings have solid support through examples underscoring the role of aromatic interactions. While predicted binding sites remain coarse, validation was done on a total of 10 IDPs at varying depths. The method builds on the authors' previous work and, with ad hoc modifications, is poised to benefit this emerging field.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed we did not apply the helix boost portion of SeqDYN to DIRseq, and now state as such (p. 4, second last paragraph). We now also compare DIRseq with several alternative models, as summarized in new Table S2.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei at al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We now compare predictions of these various parameter sets, and report the results in Table S2.  In short, among all the tested parameter sets, DIRseq has the best performance as measured by (1) strong correlations between prediction scores and CSPs and (2) high true positives and low false positives (p. 7-9).
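
The two performance measures named above can be sketched generically as follows. This is not the authors' evaluation code: the thresholds, the rule for calling a residue positive, and the toy numbers are all hypothetical, since the response does not specify how true and false positives were defined.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between per-residue prediction scores and CSPs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def confusion_counts(scores, csps, score_cutoff, csp_cutoff):
    """Count true/false positives and negatives, assuming (hypothetically)
    that a residue is 'predicted' when its score reaches score_cutoff and
    'observed' when its CSP reaches csp_cutoff."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, csp in zip(scores, csps):
        predicted = score >= score_cutoff
        observed = csp >= csp_cutoff
        if predicted and observed:
            counts["tp"] += 1
        elif predicted:
            counts["fp"] += 1
        elif observed:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Toy values for six residues; these are made up, not data from the study.
toy_scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7]
toy_csps = [0.01, 0.02, 0.08, 0.07, 0.06, 0.01]
toy_counts = confusion_counts(toy_scores, toy_csps, score_cutoff=0.5, csp_cutoff=0.05)
toy_r = pearson_r(toy_scores, toy_csps)
```

Reporting both measures is useful because a correlation summarises agreement across the whole profile, while the counts depend on where the cutoffs are placed.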

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of if the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We now add that these tweaks were motivated by observations from MD simulations of drug interactions with α-syn (ref 13). As already noted in the response to the preceding comment, we now also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the author's assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific length scale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we now add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available (p. 12-13). To illustrate this point, we use drug size as a simple example, which can be modeled by making the b parameter dependent on drug molecule size.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We now cite nine studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

      We add citations to both compound optimization and mechanism of action.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the sequences of the IDPs in the case studies with the 45 IDPs in training the SeqDYN model to make sure that they are not included in the training dataset or are highly homologous.

      Please note that the data used for training SeqDYN were R2 rates, which are independent of the property being studied here, i.e., drug interacting residues. Therefore whether the IDPs studied here were in the training set for SeqDYN is immaterial.

      (2) The authors manually tuned four parameters in SeqDYN to develop the model for predicting drug-interacting residues without giving strict testing or explanations. More explanations, testing of more values, and ablation testing should be given.

      As responded above, we now both expand the explanation and present more test results.

      (3) The authors changed the q values of L, I, and M to the value of V. What are the results if these values are not changed?

      These results are shown in Table S2 (entry named SeqDYN_orig).

      (4) Only one b value is chosen based on the assumption that a drug molecule interacts with 3-4 residues at a time. However, the number of interacting residues is related to the size of the drug molecule. Adjusting the b value with the size of the ligand may provide improvement. It is better to test the influence of adjusting b values. At least, this should be discussed.

      Good point! We now state that b potentially can be adjusted according to ligand size (p. 12-13). In addition, we also show the effect of varying b on the prediction results (Table S2; p. 8, last paragraph).

      (5) The authors add 12 Q to eliminate end effects. However, explanations on why 12 Qs are chosen should be given. How about other numbers of Q or using other residues (e.g., the commonly used residues in making links, like GS/PS or A)?

      As we already explained, “Gln was selected because its 𝑞 value is at the middle of the 20 𝑞 values.” (p. 5, second paragraph). Also, 12 Qs are sufficient to remove any end effects; a higher number of Qs does not make any difference.
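
      The padding step discussed in this exchange can be sketched as follows. This is a minimal illustration, not the DIRseq implementation: the functional form in `neighbor_score`, the toy `Q_TOY` values, and the value of `b` are all assumptions chosen only to show how flanking a sequence with 12 Gln residues on each terminus (padding both termini is itself assumed here) and then trimming the flanks gives terminal residues a neighbor environment comparable to that of interior residues.

```python
from typing import Dict, List

GLN_PAD = 12  # number of Gln residues per terminus, as stated in the response


def pad_with_gln(seq: str, n_pad: int = GLN_PAD) -> str:
    """Flank the sequence with n_pad Gln (Q) residues on each terminus."""
    flank = "Q" * n_pad
    return flank + seq + flank


def neighbor_score(seq: str, q: Dict[str, float], b: float) -> List[float]:
    """Hypothetical per-residue score (an assumption, not the published model).

    Each residue's score is a product of factors contributed by all other
    residues; a factor approaches 1 as the sequence separation s grows.
    """
    scores = []
    for n in range(len(seq)):
        value = 1.0
        for m, aa in enumerate(seq):
            if m == n:
                continue
            s = abs(m - n)
            value *= 1.0 + (q[aa] - 1.0) / (1.0 + b * s * s)
        scores.append(value)
    return scores


def padded_score(seq: str, q: Dict[str, float], b: float,
                 n_pad: int = GLN_PAD) -> List[float]:
    """Score the Gln-padded sequence, then drop the values for the flanks."""
    full = neighbor_score(pad_with_gln(seq, n_pad), q, b)
    return full[n_pad:n_pad + len(seq)]


# Toy parameters, purely illustrative (not fitted values).
Q_TOY = {"Q": 1.05, "W": 1.30, "G": 0.90, "K": 1.00}
B_TOY = 0.3

if __name__ == "__main__":
    example = "WGKQW"
    print("unpadded:", neighbor_score(example, Q_TOY, B_TOY))
    print("padded:  ", padded_score(example, Q_TOY, B_TOY))
```

      In this toy setting, the first and last residues of the unpadded sequence have neighbors on one side only, so their scores are systematically shifted relative to interior residues; with the flanks in place, that shift largely disappears, and adding more than a dozen flanking residues changes little because distant neighbors contribute factors close to 1.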

      Reviewer #2 (Recommendations for the authors):

      (1) The authors make reference to the "C-terminal IDR" in cMyc, but the region they note is found in the bHLH DNA binding domain (which falls from residue ~370-420).

      We now clarify that this region is disordered on its own but forms a helix-loop-helix structure upon heterodimerization with Max (p. 11, last paragraph).

      (2) Given the fact that X-seq names are typically associated with sequencing-based methods, it's perhaps confusing to name this method DIRseq?

      We appreciate the reviewer’s point, but by now the preprint posted in bioRxiv is in wide circulation, and the DIRseq web server has been up for several months, so changing its name would cause a great deal of confusion.

      (3) I'd encourage the authors just to spell out "drug interacting residues" and retain an IDR acronym for IDRs. Acronyms rarely make writing clearer, and asking folks to constantly flip between IDR and DIR is asking a lot of an audience (in this reviewer's opinion, anyway).

      The reviewer makes a good point; we now spell out “drug-interacting residues”.

      (4) The assumption here is that CSPs result from direct drug:IDR interactions. However, CSPs result from a change in the residue chemical environment, which could in principle be an indirect effect (e.g., in the unbound state, residues A and B interact; in the bound state, residue A is now free, such that it experiences a CSP despite not engaging directly). While I recognize such assumptions are commonly made, it behoves the authors to explicitly make this point so the reader understands the relationship between CSPs and binding.

      We did add caveats of CSP in Introduction (p. 3, second paragraph).

      (5) On the figures, please label which protein is which figure, as well as provide a legend for the annotations on the figures (red line, blue bar, cyan region, etc.)

      We now label protein names in Fig. 1. Annotations of display items are already explained in the captions of Figs. 2 and 3; we now add them to the Fig. 4 caption.

      (6) abstract: "These successes augur well for deciphering the sequence code for IDP-drug binding." - This is not grammatically correct, even if augur were changed to agree. Suggest rewriting.

      “Augur well” means to be a good sign (for something). We use this phrase here in this meaning.

      (6) page 5: "we raised the 𝑞 value of Asp to be the same as that of Glu" → suggested "increased" instead of raised.

      We have made the suggested change.

      (7) The authors should consider releasing the source code (it is available via the .js implementation on the server, but this is not very transferable/shareable, so I'd encourage the authors to provide a stand-alone implementation that's explicitly shareable).

      We have now added a link for the user to download the source code.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

    4. Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      Comments on revised version:

      I'm satisfied with the authors' response and the public review does not need further changes.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      The authors examine the effect of cell-free chromatin particles (cfChPs) derived from human serum or from dying human cells on mouse cells in culture and propose that these cfChPs can serve as vehicles for cell-to-cell active transfer of foreign genetic elements. The work presented in this paper is intriguing and potentially important, but it is incomplete. At this stage, the claim that horizontal gene transfer can occur via cfChPs is not well supported because it is only based on evidence from one type of methodological approach (immunofluorescence and fluorescent in situ hybridization (FISH)) and is not validated by whole genome sequencing.

      We disagree with the eLife assessment that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate technology. Rather, eLife should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells.

      The reviewer is mistaken. We do not claim that the internalized cfChPs are incorporated into the nucleus. We show throughout the paper that the cfChPs perform their novel functions autonomously outside the genome without being incorporated into the nucleus. This is clearly seen in all our chromatin fibre images, metaphase spreads and our video abstract. Occasionally, when the cfChPs fluorescent signals overlie the chromosomes, we have been careful to state that the cfChPs are associated with the chromosomes without implying that they have integrated.

      These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Again the reviewer makes the same mistake. We do not claim that the internalized cfChPs are incorporated into the chromosomes. We have addressed this issue above.

      We have a feeling that the reviewer has not understood our work – which is the discovery of “satellite genomes” which function autonomously outside the nuclear genome.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed on Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer has raised a related issue below and we have responded to both of them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for taking my comments and those of the other reviewer into account and for adding new material to this new version of the manuscript. Among other modifications/additions, they now mention that they think that NIH3T3 cells treated with cfChPs die out after 250 passages because of genomic instability which might be caused by horizontal transfer of cfChPs DNA into the genome of treated cells (pp. 45-46, lines 725-731). However, no definitive formal proof of genomic instability and horizontal transfer is provided.

      We mention that the NIH3T3 cells treated with cfChPs die out after 250 passages in response to the reviewer’s earlier comment “Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism”.

      We have agreed with the reviewer and have simply speculated that the cells may die because of extreme genomic instability. We have left it as a speculation without diverting our paper in a different direction to prove genomic instability.

      The authors now refer to an earlier study they conducted in which they Illumina-sequenced NIH3T3 cells treated with cfChPs (pp. 48, lines. 781-792). This study revealed the presence of human DNA in the mouse cell culture. However, it is unclear to me how the authors can conclude that the human DNA was inside mouse cells (rather than persisting in the culture medium as cfChPs) and it is also unclear how this supports horizontal transfer of human DNA into the genome of mouse cells. Horizontal transfer implies integration of human DNA into mouse DNA, through the formation of phosphodiester bonds between human nucleotides and mouse nucleotides. The previous Illumina-sequencing study and the current study do not show that such integration has occurred. I might be wrong but I tend to think that DNA FISH signals showing that human DNA lies next to mouse DNA does not necessarily imply that human DNA has integrated into mouse DNA. Perhaps such signals could result from interactions at the protein level between human cfChPs and mouse chromatin?

      With due respect, our earlier genome sequencing study that the reviewer refers to was done on two single cell clones developed following treatment with cfChPs. So, the question of cfChPs lurking in the culture medium does not arise.

      The authors should be commended for doing so many FISH experiments. But in my opinion, and as already mentioned in my earlier review of this work, horizontal transfer of human DNA into mouse DNA should first be demonstrated by strong DNA sequencing evidence (multiple long and short reads supporting human/mouse breakpoints; discarding technical DNA chimeras) and only then eventually confirmed by FISH.

      As mentioned earlier, we disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Regarding my comment on the quantity of human cfChPs that has been used for the experiments, the authors replied that they chose this quantity because it worked in a previous study. Could they perhaps explain why they chose this quantity in the earlier study? Is there any biological reason to choose 10 ng and not more or less? Is 10 ng realistic biologically? Could it be that 10 ng is orders of magnitude higher than the quantity of cfChPs normally circulating in multicellular organisms and that this could explain, at least in part, the results obtained in this study?

      The reviewer again raises the same issue, which we have already addressed in our revised manuscript. To quote “We chose to use 10ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and activation of apoptotic pathways using this concentration of cfChPs (Mittra I et. al., 2015)”.

      It is also mentioned in the response that RNA-seq has been performed on mouse cells treated with cfChPs, and that this confirms human-mouse fusion (genomic integration). Since these results are not included in the manuscript, I cannot judge how robust they are and whether they reflect a biological process rather than technical issues (technical chimeras formed during the RNA-seq protocol is a well-known artifact). In any case, I do not think that genomic integration can be demonstrated through RNA-seq as junction between human and mouse RNA could occur at the RNA level (i.e. after transcription). RNA-seq could however show whether human-mouse chimeras that have been validated by DNA-sequencing are expressed or not.

      We did perform transcriptome sequencing as suggested earlier by the reviewer, but realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript.

      Given these comments, I believe that most of the weaknesses I mentioned in my review of the first version of this work still hold true.

      An important modification is that the work has been repeated in other cell lines, hence I removed this criticism from my earlier review.

      Additional changes made

      (1) We have now rewritten the “Abstract” to 250 words to fit eLife’s instructions. (It was not possible to reduce the word count further.)

      (2) We have provided Video 1 as a separate file instead of a link.

      (3) Some of Figure Supplements (which were stand-alone) are now given as main figures. We have re-arranged Figures and Figure Supplements in accordance with eLife’s instructions.

      (4) We have now provided a list of the various cell lines used in this study, their tissue origin and procurement source in Supplementary File 3.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 1-4)”.

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This begs the question of the relevance of this study for living organisms.

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 6)”.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~250 passages the cfChPs treated NIH3T3 cells begin to die out, apparently because their genomes have become too unstable for survival. This point will be highlighted in the revised version (pp. 45-46, lines 725-731).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We agree. We have removed the term “function” wherever we felt we had used it inappropriately.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We agree with the reviewer’s viewpoint. We have replaced the term “predatory genome” with a more realistic term “satellite genome” in the title and throughout the manuscript. We have also thoroughly revised the discussion section and elaborated on the potential role of LINE-1 and Alu elements carried by the concatemers in mammalian evolution. (pp. 46-47, lines 743-756).

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      As mentioned above, we have revised the “discussion” section taking into account the issues raised by the reviewer and highlighted the potential role of cfChPs in evolution by acting as vehicles of transposable elements.

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      As mentioned above, we have replaced the term “predatory genome” with “satellite genome” and revised the “discussion” section taking into account the issues raised by the reviewer.

      Reviewer #1 (Recommendations for the authors):

      (1) I strongly recommend validating the findings of this study using other approaches. Whole genome sequencing using both short and long reads should be used to validate the presence of human DNA in the mouse cell line, as well as its integration into the mouse genome and concatemerization. Breakpoints between mouse and human DNA can be searched in individual reads. Finding these breakpoints in multiple reads from two or more sequencing technologies would strengthen their biological origin. Illumina and ONT sequencing are now routinely performed by many labs, such that this validation should be straightforward. In addition to validating the findings of the current study, it would allow performance of an in-depth characterization of the rearrangements undergone by both human cfChPs and the mouse genome after internalization of cfChPs, including identification of human TE copies integrated through bona fide transposition events into the mouse genome. New copies of LINE and Alu TEs should be flanked by target site duplications. LINE copies should be frequently 5' truncated, as observed in many studies of somatic transposition in human cells.

      (2) Furthermore, should the high level of cell-to-cell HGT detected in this study occur on a regular basis within multicellular organisms, validating it through a reanalysis of whole genome sequencing data available in public databases should be relatively easy. One would expect to find a high number of structural variants that for some reason have so far gone under the radar.

      (3) Short and long-read RNA-seq should be performed to validate the expression of human cfChPs in mouse cells. I would also recommend performing ChIP-seq on routinely targeted histone marks to validate the chromatin state of human cfChPs in mouse cells.

      (4) The claim that fused human proteins are produced in mouse cells after exposing them to human cfChPs should be validated using mass spectrometry.

      The reviewer has suggested a plethora of techniques to validate our findings. Clearly, it is neither possible to undertake all of them nor to incorporate them into the manuscript. However, as suggested by the reviewer, we did conduct transcriptome sequencing of cfChPs treated NIH3T3 cells and were able to detect the presence of human-human fusion sequences (representing concatemerisation) as well as human-mouse fusion sequences (representing genomic integration). However, we realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript. However, to address the reviewer’s concerns we have now referred to results of our earlier whole genome sequencing study of NIH3T3 cells similarly treated with cfChPs wherein we had conclusively detected the presence of human DNA and human Alu sequences in the treated mouse cells. These findings have now been added as an independent paragraph (pp. 48, lines. 781-792).

      (5) It is unclear from what is shown in the paper (increase in FISH signal intensity using Alu and L1 probes) if the increase in TE copy number is due to bona fide transposition or to amplification of cfChPs as a whole, through mechanisms other than transposition. It is also unclear whether human TEs end up being integrated into the neighboring mouse genome. This should be validated by whole genome sequencing.

      Our results suggest that TEs amplify and increase their copy number due to their association with DNA polymerase and their ability to synthesize DNA (Figure 14a and b). Our study design cannot demonstrate transposition, which would require real-time imaging.

      The possibility of incorporation of TEs into the mouse genome is supported by our earlier genome sequencing work, referred to above, wherein we detected multiple human Alu sequences in the mouse genome (pp. 48, lines. 781-792).

      (6) In order to be able to generalize the findings of this study, I strongly encourage the authors to repeat their experiments using other cell types.

      We thank the reviewer for this suggestion. We have now used four different cell lines derived from four different species and demonstrated that horizontal transfer of cfChPs occurs in all of them, suggesting that it is a universal phenomenon (pp. 37, lines 560-572; Supplementary Fig. S14a-d).

      We have also mentioned this in the abstract (pp. 3, lines 52-54).

      (7) Since the results obtained when using cfChPs isolated from healthy individuals are identical to those shown when using cfChPs from cancer sera, I wonder why the authors chose to focus mainly on results from cancer-derived cfChPs and not on those from healthy sera.

      Most of the experiments were conducted using cfChPs isolated from cancer patients because of our special interest in cancer, and because our earlier results (Mittra et al., 2015) had shown that cfChPs isolated from cancer patients had significantly greater activity in terms of DNA damage and activation of apoptotic pathways than those isolated from healthy individuals. We have now incorporated the above justification (pp. 6, lines 124-128).

      (8) Line 125: how was the 10-ng quantity (of human cfChPs added to the mouse cell culture) chosen and how does it compare to the quantity of cfChPs normally circulating in multicellular organisms?

      We chose to use 10 ng based on our earlier report in which we had obtained robust biological effects, such as activation of DDR and apoptotic pathways, using this concentration of cfChPs (Mittra et al., 2015). We have now incorporated the justification for using this dose in our manuscript (pp. 51-52, lines 867-870).

      (9) Could the authors explain why they repeated several of their experiments in metaphase spreads, in addition to interphase?

      We conducted experiments on metaphase spreads in addition to those on chromatin fibres because of the current heightened interest in extrachromosomal DNA in cancer, studies of which have largely been based on metaphase spreads. We were interested to see how the cfChP concatemers might relate to the characteristics of cancer extrachromosomal DNA and whether the latter in fact represent cfChP concatemers acquired from surrounding dying cancer cells. We have now mentioned this on pp. 7, lines 150-155.

      (10) Regarding negative controls consisting in checking whether human probes cross-react with mouse DNA or proteins, I suggest that the stringency of washes (temperature, reagents) should be clearly stated in the manuscript, such that the reader can easily see that it was identical for controls and positive experiments.

      We were fully aware of these issues and were careful to ensure that washing steps were conducted meticulously. The careful washing steps have been repeatedly emphasized under the section on “Immunofluorescence and FISH” (pp. 54-55, lines. 922-944).

      (11) I am not an expert in Immuno-FISH and FISH with ribosomal probes but it can be expected that ribosomal RNA and RNA polymerase are quite conserved (and thus highly similar) between humans and mice. A more detailed explanation of how these probes were designed to avoid cross-reactivity would be welcome.

      We were aware of this issue and conducted a negative control experiment to ensure that the human ribosomal RNA probe and RNA polymerase antibody did not cross-react with their mouse counterparts. Please see Supplementary Fig. S4c.

      (12) Finally, I could not understand why the cfChPs internalized by neighboring cells are called predatory genomes. I could not find any justification for this term in the manuscript.

      We agree, and this criticism has also been made by Reviewer #2. We have now replaced the term “predatory” genomes with “satellite” genomes.

      Reviewer #2 (Recommendations for the authors):

      (1) P2 L34: The term "role" seems to imply "what something is supposed to do" (similar to "function"). Perhaps "impact" would be more neutral. Additionally, "poorly defined" is vague-do you mean "unknown"?

      We thank the reviewer for this suggestion. We have now rephrased the sentence to read “Horizontal gene transfer (HGT) plays an important evolutionary role in prokaryotes, but it is thought to be less frequent in mammals.” (pp. 2, lines. 26-27).

      (2) P2 L35: It seems that the dash should come after "human blood."

      Thank you, we have changed the position of the dash (pp. 2, line. 29).

      (3) P2 L37: Must we assume these structures have a function? Could they not simply be side effects of other processes?

      We think this is a matter of semantics, especially since we show that cfChPs once inside the cell perform many functions such as replication, DNA synthesis, RNA synthesis, protein synthesis etc. We, therefore, think the word “function” is not inappropriate.

      (4) Abstract: After reading the abstract, I am unclear on the concept of a "predatory genome." Based on the summarized results, it seems one cannot conclude that these elements provide any adaptive value to the genome.

      We agree. We have now replaced the term “predatory” genomes with a more realistic term viz. “satellite” genomes.

      (5) Video abstract: The video abstract does not currently stand on its own and needs more context to be self-explanatory.

      Thank you for pointing this out. We have now created a new and much more professional video with more context which we hope will meet with the reviewer’s approval.

      (6) P4 L67: Again, I am uncertain that HGT should be said to have "a role" in mammals, although it clearly has implications and consequences. Perhaps "role" here is intended to mean "consequence"?

      We have now changed the sentence to read as follows “However, defining the occurrence of HGT in mammals has been a challenge” (pp. 4, line. 73).

      (7) P6 L111: The phrase "to obtain a new perspective about the process of evolution" is unclear. What exactly is meant by this statement?

      We have replaced this sentence altogether which now reads “The results of these experiments are presented in this article which may help to throw new light on mammalian evolution, ageing and cancer” (pp. 5-6, lines 116-118).

      (8) P38 L588: The term "predatory genome" has not been defined, making it difficult to assess its relevance.

      This issue has been addressed above.

      (9) P39 L604: The statement "transposable elements are not inherent to the cell" suggests that some TEs could originate externally, but this does not rule out that others are intrinsic. In other words, TEs are still inherent to the cell.

      This part of the discussion section has been rewritten and the above sentence has been deleted.

      (10) P39 L609: The phrase "may have evolutionary functions by acting as transposable elements" is unclear. Perhaps it is meant that these structures may serve as vehicles for TEs?

      This sentence has disappeared altogether in the revised discussion section.

      (11) P41 L643: "Thus, we hypothesize ... extensively modified to act as foreign genetic elements." This sentence is unclear. Are the authors referring to evolutionary changes in mammals in general (which overlooks the role of standard mutational processes)? Or is it being proposed that structural mutations (including TE integrations) could be mediated by cfChPs in addition to other mutational mechanisms?

      We have replaced this sentence which now reads “Thus, “within-self” HGT may occur in mammals on a massive scale via the medium of cfChP concatemers that have undergone extensive and complex modifications resulting in their behaviour as “foreign” genetic elements” (pp. 47, lines 763-766).

      (12) P41 L150: The paragraph beginning with "It has been proposed that extreme environmental..." transitions too abruptly from HGT to adaptation. Is it being proposed that cfChPs are evolutionary processes selected for their adaptive potential? This idea is far too speculative at this stage and requires clarification.

      We agree. This paragraph has been removed.

      (13) P43 L681: This summary appears overly speculative and unclear, particularly as the concept of a "predatory genome" remains undefined and thus cannot be justified. It suggests that cfChPs represent an alternative lifestyle for the entire genome, although alternative explanations seem far more plausible at this point.

      We have now replaced the term “predatory” genome with “satellite” genome. The relevant part of the summary section has also been partially revised (pp. 49-50, lines 817-831).

      Changes independent of reviewers’ comments.

      We have made the following additions / modifications.

      (1) The abstract has been modified and its “conclusion” section has been rewritten.

      (2) Section 1.14 has been newly added together with accompanying Figures 15 a,b and c.

      (3) The “Discussion” section has been greatly modified and parts of it have been rewritten.

    1. eLife Assessment

      This fundamental study reveals that aging in yeast leads to chromosome mis-segregation due to asymmetric partitioning of chromosomes, driven by disruption of the nuclear pore complex and pre-mRNA leakage. The findings are convincingly supported by carefully-designed experimental data with a combination of genetic, molecular biology and cell biology approaches.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodeling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability.

      Strengths:

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging.

      Weaknesses:

      The authors have satisfactorily addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors make the interesting discovery of increased chromosome non-disjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron-containing mRNAs that encode proteins involved in resolving sister chromatid separation during mitosis are unspliced in this age-associated defect and thus lead to the non-disjunction problem.

      Strengths:

      The discovery of age-associated chromosome non-disjunction is interesting, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past.

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications.

      Weaknesses:

      The authors have addressed my major concerns with experimentation or clarification.

    4. Reviewer #3 (Public review):

      Summary:

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodeling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing less ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i.

      Strengths:

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained.

      Weaknesses:

      My main concerns have been thoroughly addressed by the authors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodelling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability. 

      Strengths: 

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging. 

      We thank the reviewer for this very positive assessment of our work.

      Weaknesses: 

      Further analysis of yeast aging data from microfluidic experiments will provide important information about the dynamic features and prevalence of the key aging phenotypes, e.g. pre-mRNA leakage and chromosome loss, reported in this work. 

      We thank the reviewer for bringing up this point, which we have addressed in the revised version of the manuscript. In short, chromosome loss is an abrupt, late event in the lifespan of the cells. To examine its prevalence, we have quantified the combined loss frequency of two chromosomes when both are labelled in the same cell. Whereas single chromosomes are lost at a frequency of 10-15% per cell, less than 5% of the cells lose both at the same time. Thus, the different chromosomes are lost largely but not fully independently of each other. Based on these data, and on the fact that yeast cells have 16 chromosomes, we estimate that about half of the cells lose at least one chromosome in their final cell cycle.
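
      As a rough consistency check on these numbers, the sketch below works out what a fully independent model would predict. It is an illustrative calculation only, not the authors' analysis: the function names are hypothetical, and it assumes that every chromosome is lost with the same probability, which the response does not state.

```python
def joint_loss_if_independent(p: float) -> float:
    """Expected frequency of losing two specific chromosomes in the same cell,
    assuming each is lost independently with probability p (an assumption)."""
    return p * p


def at_least_one_lost(p: float, n_chromosomes: int = 16) -> float:
    """Probability that a cell loses at least one of n_chromosomes,
    assuming independent losses with the same per-chromosome probability p."""
    return 1.0 - (1.0 - p) ** n_chromosomes


# Per-chromosome loss frequencies quoted in the response (10-15% per cell).
for p in (0.10, 0.15):
    print(
        f"p={p:.2f}: joint loss of two chromosomes = {joint_loss_if_independent(p):.4f}, "
        f"at least one of 16 lost = {at_least_one_lost(p):.3f}"
    )
```

      Under full independence, the joint loss of two given chromosomes would be roughly 1-2% and the fraction of cells losing at least one chromosome would be high (roughly 81-93%). These values are best read as an upper bound: any positive correlation between losses, which the response describes as "largely but not fully" independent, lowers the fraction of cells that lose at least one chromosome, in the direction of the more conservative estimate given above.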

      We also tried to estimate the prevalence of the pre-mRNA leakage phenotype, based on the increased mCherry to GFP ratio observed between 0 and 24 hours of aging for 146 individual cells. For this analysis, we compared the mCherry/GFP ratio at 0 and 24h for the same individual cell. This analysis indicates that 81% of the cells show a fold change strictly above 1 as they age. Furthermore, the data appear to be unimodal. Thus, we can conservatively conclude that a majority of the cells show pre-mRNA leakage at 24 hours. Since not all cells are at the end of their life at that time, this is possibly an underestimate.

      In addition, a discussion would be needed to clarify the relationship between "chromosome loss" in this study and "genomic missegregation" reported previously in yeast aging. 

      Genomic mis-segregation is characterized by the entry of both SPBs and all the chromosomes into the daughter cell compartment (PMID: 31714209).  We have observed these events in our movies as well.  However, the chromosome loss phenotype that we are focusing on affects only some chromosomes (as discussed above) and takes place under proper elongation of the spindle, with one SPB remaining in the mother cell whereas the other one goes to the bud, as shown in the manuscript’s Figure 2.  In our movies, chromosome loss is at least three-fold more frequent (for a single chromosome) than full genome mis-segregation (Sup Fig 1A-B). Furthermore, whereas chromosome loss is alleviated by the removal of the introns of MCM21, NBL1 and GLC7, genomic mis-segregation is not (Sup Fig 1B).  Thus, genomic mis-segregation mentioned by the reviewer is a process distinct from the chromosome loss that we report.  This discussion and the relevant data have been added to the manuscript.

      We thank the reviewer for bringing up the possible confusion between these two phenotypes, allowing us to clarify this point.

      Reviewer #2 (Public review): 

      Summary: 

      The authors make the interesting discovery of increased chromosome non-disjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron-containing mRNAs that encode proteins involved in resolving sister chromatid separation during mitosis are unspliced in this age-associated defect and thus lead to the non-disjunction problem.

      Strengths: The discovery of age-associated chromosome non-disjunction is interesting, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past.

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications. 

      We thank the reviewer for this assessment of our work. To avoid confusion, we would like to stress, however, that our data do not show that splicing per se is defective in old cells. In fact, we specifically show that the cells are unlikely to have a splicing defect (last figure of the original and the revised version of the manuscript). Our data specifically show that unspliced mRNAs tend to leak out of the nucleus of old cells.

      Weaknesses: 

      The biggest weakness is "connecting all the dots" of causality and linking the splicing defect to chromosome disjunction. I commend the authors for making a valiant effort in this regard, but there are many caveats to this interpretation. While the "triple intron" removal suppressed the non-disjunction defect in aged cells, this could simply be a kinetic fix, where a slowdown in the relevant aspects of mitosis could give the cell time to resolve the syntelic attachment of the chromatids.

      The possibility that intron-removal leads to a kinetic fix is an interesting idea that we have now considered.  In the revised manuscript, we now provide measurements of mitotic duration in the “triple intron” mutant compared to wild type cells and the duration of their last cell cycle (See supplementary figure 3A-D). There is no evidence that removing these introns slows down mitosis.  Thus, the kinetic fix hypothesis is unlikely to explain our observation about the effect of intron removal.

      To this point, I note that the intron-less version of GLC7, which affects the most dramatic suppression of the three genes, is reported by one of the authors to have a slow growth rate (Parenteau et al, 2008 - https://doi.org/10.1091/mbc.e07-12-1254)

      The reviewer is right: removing the intron of GLC7 reduces the expression levels of the gene product (PMID: 16816425) to about 50% of the original value and causes a slow growth phenotype. However, the cells revert fairly rapidly through duplication of the GLC7-∆i gene (see supplementary Figure 3E-F). As a consequence, neither the GLC7-∆i nor the 3x∆i mutant strains show noticeable growth phenotypes by spot assays. We now document these findings in supplementary Figure 3.

      Lastly, the Herculean effort to perform FISH of the introns in the cytoplasm is quite literally at the statistical limit of this assay. The data were not as robust as the other assays employed through this study. The data show either "no" signal for the young cells or a signal of 0, 1, or 2 FISH foci in the aged cells. In a Poisson distribution, which this follows, it is improbable to distinguish between these differences. 

      This is correct: this experiment was not the easiest in the manuscript. However, despite the limitations of the assay, the data presented in Figure 7B are very clear. 300 cells aged by MEP were analysed, divided into three cohorts of 100 each, and the distribution of foci (nuclear vs cytoplasmic) in these aged cells was compared to the distribution in three cohorts of young cells. For all 3 aged cohorts, over 70% of the visible foci were cytoplasmic, while in the young cells, this figure was around 3%. A t-test was conducted to compare these frequencies between young and old cells (Figure 7B). The difference is highly significant. Therefore, we are clearly not at the statistical limit.

      What the reviewer refers to is supplementary Figure 4, where we were simply asking i) whether the signal is lost in cells lacking the intron of GLC7 (the answer is unambiguously yes) and ii) what the general number of dots per cell is in young and old wild type cells (without distinguishing between nuclear and cytoplasmic). The information to be taken from this last quantification is indeed that there is no clearly distinguishable difference between these two populations of cells, as the reviewer rightly concludes. In other words, the reason why there are more dots in the cytoplasm of the old cells in Figure 7B is not that the old cells have many more dots in general (see supplementary Figure 4C). We hope that these clarifications help in understanding the data better. We have edited the manuscript to avoid confusion.

      Reviewer #3 (Public review): 

      Summary: 

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodelling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing less ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i. 

      Strengths: 

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained. 

      We thank the reviewer for their very positive assessment of our work.

      Weaknesses:  

      In some cases, controls for experiments were not presented or were depicted in other figures. 

      We are sorry about this confusion. We have improved our presentation of the controls, bringing them back each time they are relevant. We have also added those that were missing (such as those mentioned by reviewer 2, see above). Note that the frequency of centromeric plasmid loss at 0h in Figure 1C is not meaningful and therefore not presented. Since the cells were grown on selective medium before loading onto the ageing chip, we cannot report a plasmid loss frequency here. The ageing experiments themselves were subsequently conducted in full medium, to allow for centromeric plasmid loss without killing the cell. We explain this in the materials and methods section.

      High variability was seen in chromosome loss data, leading to large error bars. 

      We thank the reviewer for this comment. The variance in those two figures (3A and 5D) comes from the suboptimal plotting of this data. This is now corrected as follows.  We divided the available data into 4 cohorts and then plotted the average loss frequency across these cohorts for the indicated age groups.  This filters out much of the noise and improves the statistical resolution.
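
      The cohort-averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual analysis code: the function name is hypothetical, cells are assigned to cohorts as consecutive blocks (the response does not say how cohorts were formed), and the example outcomes are placeholders rather than data from the study.

```python
from statistics import mean, stdev


def cohort_loss_frequencies(lost_flags, n_cohorts=4):
    """Split per-cell outcomes (True = chromosome lost) into n_cohorts
    consecutive, near-equal groups and return each group's loss frequency."""
    n = len(lost_flags)
    if n < n_cohorts:
        raise ValueError("need at least one cell per cohort")
    # Integer boundaries guarantee every cohort is non-empty when n >= n_cohorts.
    bounds = [i * n // n_cohorts for i in range(n_cohorts + 1)]
    return [sum(lost_flags[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


# Placeholder outcomes for one age group (hypothetical, for illustration only).
example = [True, False, False, False, True, True, False, False]
freqs = cohort_loss_frequencies(example)
print("per-cohort frequencies:", freqs)
print("mean across cohorts:", mean(freqs), "SD across cohorts:", stdev(freqs))
```

      Plotting the mean and spread of the per-cohort frequencies, rather than treating each cell as a separate point, is what reduces the apparent noise; the same function would be applied once per age group.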

      The text could have been more polished. 

      Thank you for this comment.  We have gone through the manuscript again in detail.

      Reviewer #1 (Recommendations for the authors):

      (1) A previous study (PMID: 31714209) showed that aging yeast cells undergo genomic missegregation in which material was abnormally segregated to the daughter cells, leading to cell cycle arrest. After that, the missegregation is either corrected by returning aberrantly segregated genetic material to the mother cells so that they can resume cell cycles, or if not corrected, the mother cells will terminally exit the cell cycle and eventually die. That paper also showed that this age-dependent genomic missegregation is related to rDNA instability. Is the chromosome loss in this work related to the genomic missegregation reported before? Is it partially reversible like genomic missegregation? Are all the chromosomes lost in one cell division, like in the case of genomic missegregation? Some additional characterization and a discussion would be helpful.

      As mentioned above, indeed the phenotype of full genome mis-segregation described by Crane et al. (2019) is observable in our data as well. At 24h ~3% of the cells segregate both SPBs to the bud, as they previously described (Supp Figure 1A and B).  This phenomenon is clearly distinct from asymmetric chromosome partition, where cells undergo anaphase, separate the SPBs and segregate one to the mother cell and one to the bud (Figure 2A).  Also, asymmetric chromosome partitioning affects only a subset of the chromosomes (see below), not the entire genome. Finally, unlike asymmetric chromosome partitioning, the frequency of genome mis-segregation in ageing was not alleviated by intron removal (Supp Figure 1B). Thus, these two processes are clearly distinct and driven by different mechanisms. Note that asymmetric chromosome partitioning appears 3 to 5 times more frequently than genomic mis-segregation.

      Further supporting the notion that these two processes are distinct, chromosome loss seals the end of the life of the cell, as we reported, indicating that this is not a reversible event. Also, it does not involve all chromosomes at once. In cells that contain labelled versions of both chromosome II and IV at the same time, the frequency of losing both chromosomes is less than 5%, whereas each chromosome is lost in 10-15% of the cells (Figure 1C). Thus, most cells lose one and keep the other. Furthermore, this indicates that there are many more cells losing at least one chromosome than the 15% that lose chromosome IV, for example: probably 50% or more. Thus, chromosome loss by asymmetric segregation is much more frequent than the partly transient transfer of the entire nucleus to the bud.

      (2) What percentage of aging WT cells undergo pre-mRNA leakage (using the GFP/mCherry reporter) during their entire lifespan? Is it a sporadic, reversible process or an accumulative, one-way deterioration? Previous studies (PMID: 32675375; PMID: 24332850; PMID: 36194205; PMID: 31291577) showed that only a fraction of yeast cells age with rDNA instability and ERC accumulation, as indicated by excessive rRNA transcription and nucleolar enlargement. Are they the same fraction of aging cells that undergo pre-mRNA leakage and chromosome loss? This information will indicate the prevalence of the key aging phenotypes reported in this work and should be readily obtainable from microfluidic experiments. In addition, a careful discussion would be helpful. 

      Pre-mRNA leakage is relatively widespread in the population, but it is difficult to put a precise number on it. Analysis of how the mCherry/GFP ratio changes in 146 individual cells between 0 and 24 hours of imaging in our microfluidics platform indicates that ~80% show an increase and 50% of the cells show an increase above 1.5-fold. Therefore, the frequencies of pre-mRNA leakage and chromosome loss are probably similar.  This would be in the same range as the frequency of aging by ERC accumulation (mode 1) estimated by PMID: 32675375.  We have modified the discussion to account for these considerations.

      Reviewer #2 (Recommendations for the authors)

      The manuscript could use a bit of editing in places - please go through it once more. 

      Editing suggestions: 

      Line 80 – irrespective

      Corrected.

      Line 97 - these are not "rates" but frequencies. Please correct this error throughout. 

      Replaced “rate” with “frequency” throughout the manuscript and the figures, when pertaining to chromosome loss.

      Line 328 - increase in chromosome... 

      Corrected.

      Line 379 - tampering 

      Reviewer #3 (Recommendations for the authors):

      Specific Feedback to Authors 

      (a) Major Points 

      (i) While the proposed connection between ERC-mediated nuclear basket removal and erroneous error correction was clearly stated, this connection is correlative and was not directly tested. Specifically, although mutants impacting ERC levels were tested for missegregation, it was not directly tested if increased missegregation levels occurred due to ERC tethering to the NPC and subsequent nuclear basket removal. It is possible that the increased ERCs may be driving missegregation via a different pathway. Authors should consider experiments to strengthen this idea, such as looking at chromosome loss frequency in a sir2∆ 3x∆i double mutant, or a sir2∆ sgf73∆ double mutant. 

      This connection is addressed in the original version of the manuscript, where we show that preventing attachment of ERCs to the NPC, by removing the linker protein Sgf73, alleviates chromosome loss.  The link is further substantiated by the fact that removing the basket on its own promotes chromosome loss and that in both cases, namely during normal aging, i.e., upon ERC accumulation, and upon basket removal, the mechanism of chromosome loss is the same.  In both cases, it depends on the introns of the GLC7, MCM21 and NBL1 genes.

      However, we acknowledge that the mutants tested have pleiotropic effects, making interpretation somewhat difficult, even when examining chromosome loss in multiple mutants that affect ERC formation and NPC remodelling, as we have done.  As recommended by the reviewer, we have characterized the phenotype of the sir2∆ 3x∆i mutant strain. Intron removal in the sir2∆ mutant cells largely rescued the elevated chromosome loss frequency of these cells and slightly extended their replicative lifespan (Figure 6D-E). We conclude that intron removal can remedy the chromosome loss phenotype of the sir2∆ mutant. Although clearly significant, the effect on the replicative lifespan was not very strong, likely because the sir2∆ mutation affects other ageing processes.

      Touching on this question, we added a new set of experiments asking whether any accumulating DNA circle causes chromosome loss in an intron-dependent manner.  Thus, we have introduced a noncentromeric replicative plasmid in wild type and 3x∆i mutant strains carrying the labelled version of chromosome II (Figure 6A-C).  These studies show that these cells age much faster than wild type cells, as expected, and lose chromosomes at a higher frequency than non-transformed cells.  Finally, the effect is at least in part alleviated by removing the introns of NBL1, MCM21 and GLC7.

      Therefore, after adding this new and more direct test of the role of DNA circles in chromosome loss, we confidently conclude that ERC-mediated basket removal is the trigger of chromosome loss in old cells.

      (b) Minor Points 

      (i) In Figure 1C, the text (lines 91-92) argues that chromosome loss happens abruptly as cells age; however the data only show loss at young and old time points, not an intermediate, which leaves open the possibility that chromosome loss is occurring gradually. While cells that lost chromosomes should fail to divide further, we don't know if these events happened and were simply excluded.

      We agree with the reviewer that, formally, the conclusion drawn in lines 91-92 (of the original manuscript), namely that chromosome loss takes place abruptly as cells age, cannot be drawn from Figure 1C alone but only from subsequent observations. However, since chromosome loss is lethal in haploid cells, as we mention in the text and the reviewer notes as well, it is difficult to envision how cells could lose chromosomes before the end of their lifespan; the loss frequency must therefore increase abruptly as the cells reach that point.  This is now underlined in the revised version of the manuscript. Accordingly, the frequency of chromosome loss per age group, which is depicted in Figure 3A, shows that wild type cells that have budded fewer than 10 times show no chromosome loss. The chromosome loss frequency starts to ramp up only past that point. Therefore, chromosome loss does not increase linearly with age.

      Additionally, cells that lost a minichromosome should not arrest. We suggest that the interpretation of these data should be softened in the text, or that chromosome loss fraction could be more effectively portrayed as a Kaplan-Meier survival curve depicting cells that have not lost chromosomes, if these data are easily available. Or, chromosome loss at an intermediate time point could be depicted. 

      Since we cannot visualize more than 2 chromosomes at a time, it is not possible to plot the Kaplan-Meier curve of cells that have not lost chromosomes. However, as mentioned above, the chromosome loss frequencies at intermediate time points are depicted in Figure 3A and Figure 4B and show that the loss frequency increases with age.

      (ii) Also regarding Figure 1, it would be helpful to expound on the purpose of the minichromosomes, as well as how the Ubi-GFP minichromosome is constructed. 

      We now explain why we tested the loss of the minichromosome, namely, as a means to test whether the centromere is necessary and sufficient to drive the loss of the genetic material linked to it, i.e., chromosomes, in old cells.  Concerning the Ubi-GFP minichromosome, the Materials and methods section is now updated and reports the plasmid construction, the backbone used and the primers; the plasmid sequence is available in the supplementary data.

      The purpose of the minichromosome initially appears to be the engineering of an eccDNA (ERC) with a CEN to demonstrate distinct behaviour, but it is unclear whether this was actually conducted or if the minichromosomes are simply CEN plasmids and/or if this was the intended goal. Furthermore, lines 102-103 state that the presence of a centromere was necessary and sufficient for minichromosome loss. However, since no constructs lacking a centromere were tested, necessity cannot be concluded. Please clarify this in the text and include experimental details to help readers understand what was tested. 

      We apologize for having been too brief here. The behaviour of the CEN-less version of this plasmid has been characterized in detail in previous studies (Shcheprova et al., 2008; Denoth-Lippuner 2014; Meinema et al 2022). Here we focused on the behaviour of the CEN+ version of an otherwise identical plasmid.  We now clarify in the text that this plasmid is retained in the mother cell when CEN-less and cite the relevant literature. 

      (iii) It is unclear how cells at 0-3 budding events were identified in assays using the microfluidics platform. Can the authors clarify the known "age" of the cells once captured, i.e. how do the authors know how many divisions a cell has undergone prior to capture? 

      The reviewer is right; we do not know the exact age of these cells.  However, in any asynchronous population of yeast cells, which is what we start from, 50% of the cells are newborn daughters, 25% have budded once, 12.5% have budded twice, 6.25% have budded three times, and so on.  Therefore, at the time of loading, about 94% of the cells have budded between 0 and 3 times.  For this reason, we refer to this population as cells aged 0-3 CBE. We acknowledge that this is an approximation, but it remains a relatively safe one.  
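
      The age structure quoted above is a geometric series. The following is a minimal sketch of that arithmetic, assuming an idealized steady-state population in which the fraction of cells with k completed budding events is 2^-(k+1); the function names are illustrative.

```python
def fraction_with_age(k: int) -> float:
    """Fraction of an idealized asynchronous population with exactly k completed buddings."""
    if k < 0:
        raise ValueError("age must be non-negative")
    return 0.5 ** (k + 1)


def fraction_up_to(max_age: int) -> float:
    """Cumulative fraction of cells that have budded between 0 and max_age times."""
    return sum(fraction_with_age(k) for k in range(max_age + 1))


if __name__ == "__main__":
    for k in range(4):
        print(f"budded {k} times: {fraction_with_age(k):.4f}")
    print(f"budded 0 to 3 times: {fraction_up_to(3):.4f}")
```

      The cumulative fraction for ages 0 to 3 is 1 - 2^-4 = 93.75%, so the remaining 6.25% of loaded cells are older than the assigned age class.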

      (iv) While the schematic in Figure 2D is generally helpful, a different depiction of the old and new SPBs would be beneficial: in cases where the new SPB and TetR-GFP are depicted as colocalized, it is difficult to see that the red is fainter for the new SPB. 

      We have corrected this issue by completely separating the SPB and the Chromosome signals in the Figure 2D.

      (v) In Figure 2F, the grey colour of the 12h Ipl1-321 data bar did not have high enough contrast when the manuscript was printed; we would recommend changing this to a darker shade. 

      We have corrected this issue by using a darker shade of grey.

      (vi) In Figure 3A, 'Budding' is misspelled on X-axis label  

      We have corrected this error.

      (vii) In Figure 4, the authors should clarify the differences between the analyses in panels B and C. The distinction is not immediately clear and may be difficult to grasp upon initial reading. 

      We have clarified this in the main text as well as in the figure legend.

      (viii) In Figure 5, It would aid comparisons to depict the 3x∆i only as well on panels B, D, and E. 

      We have added 3x∆i data to Figures 5, 6 and 8.

      (ix) In Figure 6D, it is unclear why there was an appreciable level of unspliced RNA in the wild-type and sir2∆ young cells. Additionally, it is unclear why there is so much signal observed in the Merge image for the old wild-type cell, especially regarding the apparent bright spot. Is that nuclear signal? Please clarify. 

      The pre-mRNA processing reporter is not very efficiently spliced. It was selected as such during design (Sorenson et al 2014; DOI: 10.1261/rna.042663.113) to provide sensitivity. As for the bright spot occurring, translation of the unspliced reporter produces the N-terminal part of a ribosomal protein, a fraction of which forms some sort of nuclear aggregate in a fraction of the population. 

      (x) In Figure 6E, why does the sir2∆ exhibit higher mCherry/GFP than the wild-type and fob1∆ at "young age"? Is this due to disrupted proteostasis in the sir2∆, or a different pleiotropic effect of sir2∆? Please comment on this observation in the text.

      Indeed, as we have stated in the text, the sir2∆ mutation already perturbs pre-mRNA processing in young cells. We do not know the reason for this, but it most probably reflects the pleiotropic function of Sir2. Following the reviewer’s request, we now state this in the text. For example, Sir2 may regulate the acetylation state of the basket itself.  The genetic interactions observed between sir2∆ and quite a few nucleoporin mutations seem to support this possibility. 

      (xi) Throughout, the authors switch between depicting aging in Completed Budding Events versus hours, which made it difficult to compare data across figures

      Ideally, all the data in this manuscript should be plotted according to the CBE age of the cell. To ensure that the major findings are plotted in such a way, we have done so for ~3000 combined cells and thousands of replicative divisions in Figures 3 and 5-7. All the measurements of chromosome loss at a specific CBE had to be done manually, due to the absence of algorithms that would be able to accurately detect chromosome loss and replicative age. Therefore, doing this for the entirety of our dataset, encompassing well over 50 ageing chips and tens of thousands of cells, is not easily doable at this stage. 

      (xii) Typo on line 12 (Sindle Pole Body) 

      We have corrected this error.

      (xiii) The phrase should be 'chromosome partitioning' rather than 'chromosome partition' throughout; for example, line 17 

      Replaced “chromosome partition” with “chromosome partitioning” throughout the text.

      (xiv) There are inconsistencies between plural and singular references throughout sentences-example, lines 35-37, and lines 44-45. 

      We carefully combed through the manuscript again and hope that we caught all inconsistencies.

    1. eLife Assessment

      This important study of artificial selection in microbial communities shows that the possibility of selecting a desired fraction of slow and fast-growing types is impacted by their initial fractions. The evidence, which relies on mathematical analysis and simulations of a stochastic model, is compelling. It highlights the tension between selection at the strain and the community level. This study should be of interest to researchers interested in ecology, both theoretical and experimental.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate with a simple stochastic model that the initial composition of the community is important in achieving a target frequency during the artificial selection of a community.

      Strengths:

      To my knowledge, the intra-collective selection during artificial selection has not been seriously theoretically considered. However, in many cases, the species dynamics during the incubation of each selection cycle is important and relevant to the outcome of the artificial selection experiment. Stochasticity from birth and death (demographic stochasticity) plays a big role in these species' abundance dynamics. This work uses a simple framework to tackle this idea meticulously.

      This work may or may not be related to hysteresis (path dependency). If this is true, maybe it would be nice to have a discussion paragraph talking about how this may be the case. Then, this work would even attract the interest of people studying dynamical systems.

      Weaknesses:

      (1) Connecting structure and function.

      In typical artificial selection literature, most of them select the community based on collective function. Here in this paper, the authors are selecting a target composition. Although there is a schematic cartoon illustrating the relationship between collective function (y-axis) and the community composition in the main figure 1, there is no explicit explanation or justification of what may be the origin of this relationship. I think giving the readers a naïve idea about how this structure-function relationship arises in the introduction section would help. This is because the conclusion of this paper is that the intra-collective selection makes it hard to artificially select for a community that has an intermediate frequency of f (or s). If there is really evidence or theoretical derivation from this framework that indeed the highest function comes from the intermediate frequency of f, then the impact of this paper would increase because the conclusions of this stochastic model could allude to the reasons for the prevalent failures of artificial selection in literature.

      (2) Explain intra-collective and inter-collective selection better for readers.

      The abstract, the introduction, and the result section use these terms or intra-collective and inter-collective selection without much explanation. For the wide readership of eLife, a clear definition in the beginning would help the audience grasp the importance of this paper, because these concepts are at the core of this work.

      (3) Achievable target frequency strongly depending on the degree of demographic stochasticity.

      I would expect that the experimentalists would find these results interesting and would want to consider these results during their artificial selection experiments. The main figure 4 indicates that the Newborn size N0 is a very important factor to consider during the artificial selection experiment. This would be equivalent to how much bottleneck you impose on the artificial selection process in every iteration step (i.e., the ratio of serial dilution experiment). However, with a low population size, all target frequencies can be achieved, and therefore in these regimes, the initial frequency now does not matter much. It would be great for the authors to provide what the N0 parameter actually means during the artificial selection experiments. Maybe relative to some other parameter in the model. I know this could be very hard. But without this, the main result of this paper (initial frequency matters) cannot be taken advantage of by the experimentalists.

      (4) Consideration of environmental stochasticity.

      The success (gold area of Figure 2d) in this framework mainly depends on the size of the demographic stochasticity (birth-only model) during the intra-collective selection. However, during experiments, a lot of environmental stochasticity appears to be occurring during artificial selection. This may be out of the scope of this study. But it would definitely be exciting to see how much environmental stochasticity relative to the demographic stochasticity (variation in the Gaussian distribution of F and S) matters in succeeding in achieving the target composition from artificial selection.

      (5) Assumption about mutation rates

      If setting the mutation rates to zero does not change the result of the simulations and the conclusion, what is the purpose of having the mutation rates \mu? Also, is the unidirectional (S -> F -> FF) mutation realistic? I didn't quite understand how the mutations could fit into the story of this paper.

      (6) Minor points

      In Figure 3b, it is not clear to me how the frequency difference for the Intra-collective and the Inter-collective selection is computed.

      In Figure 5b, the gold region (success) near the FF is not visible. Maybe increase the size of the figure or have an inset for zoom-in. Why is the region not as big as the bottom gold region?

      Comments on revisions:

      I thank the authors for addressing many points raised by the reviewers. Overall, the readability of the manuscript has improved with more context provided around why they were solving this specific problem. However, I've found many of the responses to be too terse. It would have been nicer if there had been more discussion and description of the thought process that led up to the conclusions they made for each comment or question. Instead, many of the responses only showed the screenshot of the text they added.

      Most of my comments or questions were answered. Below are my comments on some of the authors' responses.

      (2) Explain intra-collective and inter-collective selection better for readers.

      In the Abstract and Introduction, you've added more sentences about the intra-collective or inter-collective selection. However, these are either making analogies to the waterfall or just describing the result of the intra/inter-collective selection. I would still appreciate a proper definition of those terms, which is paramount for readers to understand the entire paper.

      (4) Consideration of environmental stochasticity.

      I think providing the reason 'why' the paper focuses on demographic stochasticity and not environmental stochasticity will greatly justify the paper's work. For example, citing papers that actually performed artificial selection and pointing out that your model captures the stochasticity from those kinds of experiments would be great.

      (5) Assumption about mutation rates.

      It would be great if you could add a citation in the added sentence to support your claim: "This scenario is encountered in biotechnology: .....".

    3. Reviewer #3 (Public review):

      The authors address the process of community evolution under collective-level selection for a prescribed community composition. They mostly consider communities composed of two types that reproduce at different rates, and that can mutate one into the other. Due to such difference in 'fitness' and to the absence of density dependence, within-collective selection is expected to always favour the fastest grower, but collective-level selection can oppose this tendency, to a certain extent at least. By approximating the stochastic within-generation dynamics and solving it analytically, the authors show that not only high frequencies of fast growers can be reproducibly achieved, aligned with their fitness advantage. Small target frequencies can also be maintained, provided that the initial proportion of fast growers is sufficiently small. In this regime, similar to the 'stochastic corrector' model, variation upon which selection acts is maintained by a combination of demographic stochasticity and of sampling at reproduction. These two regions of achievable target compositions are separated by a gap, encompassing intermediate frequencies that are only achievable when the bottleneck size is small enough or the number of communities is (disproportionately) large.

      A similar conclusion, that stochastic fluctuations can maintain the system over evolutionary time far from the prevalence of the faster-growing type, is then confirmed by analyzing a three-species community, suggesting that the qualitative conclusions of this study are generalizable to more complex communities.

      I expect that these results will be of broad interest to the community of researchers who strive to improve community-level selection but are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space of such embedded populations. The realization that not all target collective functions can be as easily achieved and that they should be adapted to the initial conditions and the selection protocol is also a sobering message for designing concrete applications.

      A major strength of this work is that the qualitative behaviour of the system is captured by an analytically solvable approximation so that the extent of the 'forbidden region' can be directly and generically related to the parameters of the selection protocol.

      The phenomenon the authors characterize is ecological in nature, though it is maintained even when switching between types is possible. Calling this dynamics community evolution reflects a widespread ambiguity in the field, not ascribable just to this work.

      Although different types compete for being represented in the next generation's propagules, within-generation ecology is here representative of exponential growth. As species interactions are commonly manifest in lab serial dilution experiments, it would be interesting if future work explores the extent of the robustness of these results to density-dependent demography.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Common comments

      (1) Significance of zero mutation rate

      Reviewers asked why we included mutation rate even though setting mutation rate to zero doesn’t change results. We think that including non-zero mutation rate makes our results more generalisable, and thus is a strength rather than weakness. To better motivate this choice, we have added a sentence to the beginning of Results:

      (2) Writing the mu=0 case first

      Reviewers suggested that we should first focus on the mu=0 case, and then generalize the result. The suggestion is certainly good. However, given the large amount of work involved in a re-organization, we have decided to adhere to our current narrative. That said, we now only include equations where mu=0 in the main text, and have moved the case of nonzero mutation rate to Supplementary Information.

      (3) Making equations more accessible

      We have taken three steps to make equations more readable.

      ● Equations in the main text correspond to the case of zero-mutation rate.

      ● The original section on equation derivation is now in a box in the main text so that readers have the choice of skipping it but interested readers can still get a gist of where equations came from.

      ● We have provided a much more detailed interpretation of the equation (see page 10).

      (4) Validity of the Gaussian approximation

      Reviewers raised concerns about the validity of the Gaussian approximation of the F frequency 𝑓(𝜏). The fact that our calculations closely match simulations suggests that this approximation is reasonable. Still, we added a discussion about the validity of this approximation in Box 1.

      We also added a figure to SI covering various cases of initial S and F sizes. This figure shows that when either initial S or initial F is small, the distribution of 𝑓(𝜏) is not normal. However, if initial S and F are both on the order of hundreds, then the distribution of 𝑓(𝜏) is approximately Gaussian.
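
      The kind of check described above can be sketched as a small simulation. The sketch assumes a pure-birth (Yule) process without mutation, in which S cells divide at rate r and F cells at rate r + ω; the function names and the parameter values used in the demo (r = 0.5, ω = 0.03, τ = 4.8) are illustrative assumptions, not values taken from the manuscript.

```python
import math
import random
import statistics


def yule_size(n0: int, rate: float, tau: float, rng: random.Random) -> int:
    """Size at time tau of a pure-birth population founded by n0 cells.

    Each founder's lineage size is geometric on {1, 2, ...} with success
    probability p = exp(-rate * tau); lineages are independent.
    """
    if rate * tau <= 0:
        raise ValueError("rate * tau must be positive")
    log_q = math.log1p(-math.exp(-rate * tau))  # log(1 - p), strictly negative
    total = 0
    for _ in range(n0):
        u = 1.0 - rng.random()  # uniform on (0, 1], keeps log(u) finite
        total += 1 + int(math.log(u) / log_q)  # inverse-transform geometric draw
    return total


def sample_f(s0, f0, r, omega, tau, trials, seed=0):
    """Simulated F frequencies f(tau) for collectives founded by s0 S and f0 F cells."""
    if s0 + f0 <= 0:
        raise ValueError("a collective needs at least one founder cell")
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        s = yule_size(s0, r, tau, rng)
        f = yule_size(f0, r + omega, tau, rng)
        samples.append(f / (s + f))
    return samples


def skewness(xs):
    """Population skewness; close to 0 for a Gaussian sample (requires non-constant xs)."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)


if __name__ == "__main__":
    for s0, f0 in [(200, 200), (398, 2)]:
        fs = sample_f(s0, f0, r=0.5, omega=0.03, tau=4.8, trials=1000, seed=1)
        print(f"S0={s0}, F0={f0}: mean f = {statistics.fmean(fs):.3f}, "
              f"skewness = {skewness(fs):.2f}")
```

      Skewness is only one rough indicator of departure from normality; a quantile-quantile comparison or a formal normality test would be a stricter check.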

      Public Reviews:

      Summary:

      The authors demonstrate with a simple stochastic model that the initial composition of the community is important in achieving a target frequency during the artificial selection of a community.

      Strengths:

      To my knowledge, the intra-collective selection during artificial selection has not been seriously theoretically considered. However, in many cases, the species dynamics during the incubation of each selection cycle are important and relevant to the outcome of the artificial selection experiment. Stochasticity from birth and death (demographic stochasticity) plays a big role in these species' abundance dynamics. This work uses a simple framework to tackle this idea meticulously.

      This work may or may not be related to hysteresis (path dependency). If this is true, maybe it would be nice to have a discussion paragraph talking about how this may be the case. Then, this work would even attract the interest of people studying dynamical systems.

      We have added this clarification in the main text:

      “Note that here, selection outcome is path-dependent in the sense of being sensitive to initial conditions. This phenomenon is distinct from hysteresis where path-dependence results from whether a tuning parameter is increased or decreased.”

      Weaknesses:

      (1) Connecting structure and function

      In typical artificial selection literature, most of them select the community based on collective function. Here in this paper, the authors are selecting a target composition. Although there is a schematic cartoon illustrating the relationship between collective function (y-axis) and the community composition in the main Figure 1, there is no explicit explanation or justification of what may be the origin of this relationship. I think giving the readers a naïve idea about how this structure-function relationship arises in the introduction section would help. This is because the conclusion of this paper is that the intra-collective selection makes it hard to artificially select a community that has an intermediate frequency of f (or s). If there is really evidence or theoretical derivation from this framework that indeed the highest function comes from the intermediate frequency of f, then the impact of this paper would increase because the conclusions of this stochastic model could allude to the reasons for the prevalent failures of artificial selection in literature.

      We have added this to the Introduction: “This is a common quest: whenever a collective function depends on both populations, collective function is maximised, by definition, at an intermediate frequency (e.g. too little of either population will hamper function [23]).”

      (2) Explain intra-collective and inter-collective selection better for readers.

      The abstract, the introduction, and the result section use these terms or intra-collective and inter-collective selection without much explanation. For the wide readership of eLife, a clear definition in the beginning would help the audience grasp the importance of this paper, because these concepts are at the core of this work.

      This is a great point. We have added in Abstract:

      “Such collective selection is dictated by two opposing forces: during collective maturation, intra-collective selection acts like a waterfall, relentlessly driving the S-frequency to lower values, while during collective reproduction, inter-collective selection resembles a rafter striving to reach the target frequency. Due to this model structure, maintaining a target frequency requires the continued action of inter-collective selection.”

      and in Introduction

      “A selection cycle consists of three stages (Fig. 1). During collective maturation, intra-collective selection favors fast-growing individuals within a collective. At the end of maturation, inter-collective selection acts on collectives and favors those achieving the target composition. Finally during collective reproduction, offspring collectives sample stochastically from the parents, a process dominated by genetic drift.”
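
      The three stages quoted above can be sketched as a loop. This is a deliberately simplified illustration, not the model of the manuscript: maturation is treated as deterministic exponential growth (the manuscript uses stochastic birth), mutation is omitted, and all names and default parameter values are hypothetical.

```python
import math
import random


def mature(f0: float, r: float, omega: float, tau: float) -> float:
    """Intra-collective selection: F frequency after growth for time tau.

    S grows at rate r and F at rate r + omega, so f rises whenever 0 < f0 < 1.
    """
    s = (1.0 - f0) * math.exp(r * tau)
    f = f0 * math.exp((r + omega) * tau)
    return f / (s + f)


def sample_newborn(f_parent: float, n0: int, rng: random.Random) -> float:
    """Collective reproduction: a Newborn draws n0 cells from the parent (binomial)."""
    n_fast = sum(rng.random() < f_parent for _ in range(n0))
    return n_fast / n0


def run_cycles(f_init, f_target, n0, n_collectives, cycles,
               r=0.5, omega=0.03, tau=4.8, seed=0):
    """Return the F frequency of the selected Adult collective in each cycle."""
    rng = random.Random(seed)
    newborns = [sample_newborn(f_init, n0, rng) for _ in range(n_collectives)]
    selected = []
    for _ in range(cycles):
        adults = [mature(f, r, omega, tau) for f in newborns]
        # Inter-collective selection: keep the Adult closest to the target.
        best = min(adults, key=lambda f: abs(f - f_target))
        selected.append(best)
        newborns = [sample_newborn(best, n0, rng) for _ in range(n_collectives)]
    return selected


if __name__ == "__main__":
    history = run_cycles(f_init=0.1, f_target=0.1, n0=100,
                         n_collectives=20, cycles=15, seed=3)
    print(" ".join(f"{f:.3f}" for f in history))
```

      In this sketch the only source of variation among collectives is sampling at reproduction, so inter-collective selection can counteract the upward drift in f only when some Newborn happens to receive fewer F cells than its parent frequency would suggest.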

      (3) Achievable target frequency strongly depending on the degree of demographic stochasticity.

      I would expect that the experimentalists would find these results interesting and would want to consider these results during their artificial selection experiments. The main Figure 4 indicates that the Newborn size N0 is a very important factor to consider during the artificial selection experiment. This would be equivalent to how much bottleneck is imposed on the artificial selection process in every iteration step (i.e., the ratio of serial dilution experiment). However, with a low population size, all target frequencies can be achieved, and therefore in these regimes, the initial frequency now does not matter much. It would be great for the authors to provide what the N0 parameter actually means during the artificial selection experiments. Maybe relative to some other parameter in the model. I know this could be very hard. But without this, the main result of this paper (initial frequency matters) cannot be taken advantage of by the experimentalists.

      We have added to SI an analytical approximation for N0˘, the Newborn size below which all target frequencies can be achieved.

      Also, we have added lines indicating N0˘ in Fig. 4a.

      (4) Consideration of environmental stochasticity.

      The success (gold area of Figure 2d) in this framework mainly depends on the size of the demographic stochasticity (birth-only model) during the intra-collective selection. However, during experiments, a lot of environmental stochasticity appears to be occurring during artificial selection. This may be out of the scope of this study. But it would definitely be exciting to see how much environmental stochasticity relative to the demographic stochasticity (variation in the Gaussian distribution of F and S) matters in succeeding in achieving the target composition from artificial selection.

      You are correct that our work considers only demographic stochasticity.

      Indeed, considering other types of stochasticity will be an exciting future research direction. We added in the main text:

      “Overall our model considers mutational stochasticity, as well as demographic stochasticity in terms of stochastic birth and stochastic sampling of a parent collective by offspring collectives. Other types of stochasticity, such as environmental stochasticity and measurement noise, are not considered and require future research.”

      (5) Assumption about mutation rates

      If setting the mutation rates to zero does not change the result of the simulations and the conclusion, what is the purpose of having the mutation rates \mu? Also, is the unidirectional (S -> F -> FF) mutation realistic? I didn't quite understand how the mutations could fit into the story of this paper.

      This is a great point. We have added this to the beginning of Results to better motivate our study:

      “We will start with a complete model where S mutates to F at a nonzero mutation rate µ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations. This scenario is encountered in biotechnology: an engineered pathway will slow down growth, and breaking the pathway (and thus faster growth) is much easier than the other way around. When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates.”

      See answer on common question 1.

      (6) Minor points

      In Figure 3b, it is not clear to me how the frequency difference for the Intra-collective and the Inter-collective selection is computed.

      We added a description to the caption of Fig. 3b.

      In Figure 5b, the gold region (success) near the FF is not visible. Maybe increase the size of the figure or have an inset for zoom-in. Why is the region not as big as the bottom gold region?

      We increased the resolution of Fig 5b so that the gold region near FF is more visible.

      We have added Fig 5c and the following explanation to the main text:

      “From numerical simulations, we identified two accessible regions: a small region near FF and a band region spanning from S to F (gold in Fig. 5b i). Intuitively, the rate at which FF grows faster than S+F is greater than the rate at which F grows faster than S (see section VIII in Supplementary Information). Thus, the problem can initially be reduced to a two-population problem (i.e. FF versus F+S; Fig. 5c left), and then expanded to a three-population problem (Fig. 5c right).”

      Recommendations For The Authors

      Since the conclusion of the model greatly depends on the noise (variation) of F and S in the Gaussian distribution, it would be nice to have a plot where the y-axis is the variation in terms of frequency and the x-axis is the s_0 or f_0 (frequency). In the plot, I would love to see how the variation in the frequency depends on the initial frequency of S and F. Maybe this is just trivial.

      In the SI, we added Fig6a, as per your request. Previous Fig6 became Fig6b.

      Reviewer #2 (Public review):

      The authors provide an analytical framework to model the artificial selection of the composition of communities composed of strains growing at different rates. Their approach takes into account the competition between the targeted selection at the level of the meta-community and the selection that automatically favors fast-growing cells within each replicate community. Their main finding is a tipping point or path-dependence effect, whereby compositions dominated by slow-growing types can only be reached by community-level selection if the community does not start and never crosses into a range of compositions dominated by fast growers during the dynamics.

      These results seem to us both technically correct and interesting. We commend the authors on their efforts to make their work reproducible even when it comes to calculations via extensive appendices, though perhaps a table of contents and a short description of these appendices at the start of SI would help navigate them.

      Thank you for the suggestion. We have added a paragraph at the beginning of SI.

      The main limitation in the current form of the article is that it could clarify how its assumptions and findings differ from and improve upon the rest of the literature:

      -  Many studies discuss the interplay between community-level evolution and species- or strain-level evolution. But "evolution" can be a mix of various forces, including selection, drift/randomness, and mutation/innovation.

      - This work's specificity is that it focuses strictly on constant community-level selection versus constant strain-level selection, all other forces being negligible (neither stochasticity nor innovation/mutation matter at either level, as we try to clarify now).

      Note that intra-collective selection is not strictly “constant” in the sense that selection favoring F is the strongest at intermediate F frequency (Fig 3). However, we think that you mean that intra- and inter-collective selection are present in every cycle, and this is correct for our case, and for community selection in general.

      - Regarding constant community-level selection, it is only briefly noted that "once a target frequency is achieved, inter-collective selection is always required to maintain that frequency due to the fitness difference between the two types" [pg. 3 §2]. In other words, action from the selector is required indefinitely to maintain the community in the desired state. This assumption is found in a fraction of the literature, but is still worth clarifying from the start as it can inform the practical applicability of the results.

      This is a good point. We have added to abstract:

      “Such collective selection is dictated by two opposing forces: during collective maturation, intra-collective selection acts like a waterfall, relentlessly driving the S-frequency to lower values, while during collective reproduction, inter-collective selection resembles a rafter striving to reach the target frequency. Due to this model structure, maintaining a target frequency requires the continued action of inter-collective selection.”

      - More importantly, strain-level evolution also boils down here to pure selection with a constant target, which is less usual in the relevant literature. Here, (1) drift from limited population sizes is very small, with no meaningful counterbalancing of selection, (2) pure exponential regime with constant fitness, no interactions, no density- or frequency-dependence, (3) there is no innovation in the sense that available types are unchanging through time (no evolution of traits such as growth rate or interactions) and (4) all the results presented seem unchanged when mutation rate mu = 0 (as noted in Appendix III), meaning that the conclusions are not "about" mutation in any meaningful way.

      With regard to point (1), Figure 4a (reproduced below) shows how Newborn size affects the region of achievable targets. Indeed at large Newborn size (e.g. 5000 and above), no target frequency is achievable (since drift is too small to generate sufficient inter-community variation and consequently all communities are dominated by fast-growing F). However at Newborn size of for example 1000, there are two regions of accessible target frequencies. At smaller Newborn size, all target frequencies become achievable due to drift becoming sufficiently strong.
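      The dependence of drift on Newborn size described above can be illustrated with a minimal sketch. It assumes that each Newborn collective is formed by binomial sampling of N<sub>0</sub> cells from the parent, as the review text states elsewhere; the function names `binomial_sd` and `sample_newborn_frequencies`, the parent frequency of 0.3, and the Newborn sizes are illustrative choices, not values taken from the manuscript.

```python
import random
import statistics


def binomial_sd(parent_f, n0):
    """Theoretical standard deviation of the Newborn F frequency
    when n0 cells are drawn binomially from a parent with F frequency parent_f."""
    return (parent_f * (1.0 - parent_f) / n0) ** 0.5


def sample_newborn_frequencies(parent_f, n0, g, seed=0):
    """Draw g Newborn collectives of n0 cells each and return their F frequencies."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(g):
        # Each sampled cell is an F cell with probability parent_f.
        n_fast = sum(1 for _ in range(n0) if rng.random() < parent_f)
        freqs.append(n_fast / n0)
    return freqs


if __name__ == "__main__":
    # Illustrative Newborn sizes (hypothetical values chosen for this sketch).
    for n0 in (100, 1000, 5000):
        freqs = sample_newborn_frequencies(parent_f=0.3, n0=n0, g=200)
        print(n0, round(binomial_sd(0.3, n0), 4), round(statistics.pstdev(freqs), 4))
```

      Under this assumption the spread among Newborn collectives shrinks as 1/√N<sub>0</sub>, which is consistent with the statement that large Newborn sizes leave too little inter-collective variation for selection to act on.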

      With regard to points (2) and (3), we have added to Introduction

      “To enable the derivation of an analytical expression, we have made the following simplifications.

      First, growth is always exponential, without complications such as resource limitation, ecological interactions between the two populations, or density-dependent growth. Thus, the exponential growth equation can be used. Second, we consider only two populations (genotypes or species): the fast-growing F population with size F and the slow-growing S population with size S. We do not consider a spectrum of mutants or species, since with more than two populations, an analytical solution becomes very difficult.”

      With regard to point (4), we view this as a strength rather than weakness. We have added the following to the beginning of Results and Discussions:

      “We will start with a complete model where S mutates to F at a nonzero mutation rate µ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations.”

      “When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates.”

      See Point 1 of Common comments.

      - Furthermore, the choice of mutation mechanism is peculiar, as it happens only from slow to fast grower: more commonly, one assumes random non-directional mutations, rather than purely directional ones from less fit to fitter (which is more of a "Lamarckian" idea). Given that mutation does not seem to matter here, this choice might create unnecessary opposition from some readers or could be considered as just one possibility among others.

      We have added the following justification:

      “This scenario is encountered in biotechnology: an engineered pathway will slow down growth, and breaking the pathway (and thus faster growth) is much easier than the other way around.”

      It would be helpful to have all these points stated clearly so that it becomes easy to see where this article stands in an abundant literature and contributes to our understanding of multi-level evolution, and why it may have different conclusions or focus than others tackling very similar questions.

      Finally, a microbial context is given to the study, but the assumptions and results are in no way truly tied to that context, so it should be clear that this is just for flavor.

      We have deleted “microbial” from the title, and revised our abstract:

      Recommendations For The Authors

      (1) More details concerning our main remark above:

      - The paragraph discussing refs [24, 33] is not very clear in how they most importantly differ from this study. Our impression is that the resource aspect is not very important for instance, and the main difference is that these other works assume that strains can change in their traits.

      We are fairly sure that resource depletion is important in the Rainey group’s study, as the attractor only evolved after both strains grew fast enough to deplete resources by the end of maturation. Indeed, evolution occurred in interaction coefficients which dictate the competition between strains for resources.

      Regardless, you raised an excellent point. As discussed earlier, we have added the following:

      “To enable the derivation of an analytical expression, we have made the following simplifications.

      First, growth is always exponential, without complications such as resource limitation, ecological interactions between the two populations, or density-dependent growth. Thus, the exponential growth equation can be used. Second, we consider only two populations (genotypes or species): the fast-growing F population with size F and the slow-growing S population with size S. We do not consider a spectrum of mutants or species, since with more than two populations, an analytical solution becomes very difficult.”

      - We would advise the main text to focus on mu = 0, and only say in discussion that results can be generalized.

      Your suggestion is certainly good. However, given the large amount of work involved in a reorganisation, we have decided to adhere to our current narrative. As discussed earlier, we have nonetheless added this at the beginning of Results to help orient readers:

      “We will start with a complete model where S mutates to F at a nonzero mutation rate µ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations.”

      “When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates.”

      (2) We think the material on pg. 5 "Intra-collective evolution is the fastest at intermediate F frequencies, creating the "waterfall" phenomenon", although interesting, could be presented in a different way. The mathematical details on how to find the probability distribution of the maximum of independent random variables (including Equation 1) will probably be skipped by most of the readers (for experienced theoreticians, it is standard content; for experimentalists, it is not the most relevant), as such I would recommend displacing them to SM and report only the important results.

      This is an excellent suggestion. We have put a sketch of our calculations in a box in the main text to help orient interested readers. As before, details are in SI.

      Similarly, Equations 2, 3, and 4 are hard to read given the large amount of parameters and the low amount of simplification. Although exploring the effect of the different parameters through Figures 3 and 4 is useful, I think the role of the equations should be reconsidered:

      i. Is it possible to rewrite them in terms of effective variables in a more concise way?

      See Point 3 of Common comments.

      ii. Is it possible to present extreme/particular cases in which they are easier to interpret?

      We have focused on the case where the mutation rate is zero. This makes the mathematical expressions much simpler (see above).
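      As a hedged illustration of why the zero-mutation case is simpler, the sketch below computes the deterministic change in F frequency over one maturation period under pure exponential growth. It assumes that F grows faster than S by a rate difference ω over a maturation time τ, so that the F/S ratio is multiplied by exp(ωτ); the function name `mature_frequency` and the numerical values are hypothetical, and stochastic effects are ignored.

```python
import math


def mature_frequency(f0, omega, tau):
    """F frequency after maturation, starting from Newborn F frequency f0.

    Assumes exponential growth, no mutation, and a growth-rate advantage
    omega of F over S acting for a maturation time tau.
    """
    w = math.exp(omega * tau)  # fold-change in the F/S ratio during maturation
    return f0 * w / (1.0 - f0 + f0 * w)


if __name__ == "__main__":
    # The increase in F frequency is largest at intermediate starting frequencies.
    for f0 in (0.05, 0.5, 0.95):
        print(f0, round(mature_frequency(f0, omega=1.0, tau=1.0) - f0, 4))
```

      In this sketch ω and τ enter only through their product, and the per-cycle increase in F frequency peaks at intermediate frequencies, in line with the remark above that intra-collective selection is strongest there.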

      (3) Is it possible to explain more in detail why the distribution of f_k+1 conditional to f_k^* is well approximated by a Gaussian? Also, have you explored to what extent the results would change if this were not true (in light of the few universal classes for the maximum of independent variables)?

      Despite the appeal to the CLT and the histograms in the Appendix suggesting that the distribution looks a bit like a Gaussian at a certain scale, fluctuations on that scale are not necessarily what is relevant for the results - a rapid (and maybe wrong) attempt at a characteristic function calculation suggests that in your case, one does not obtain convergence to Gaussians unless we renormalize by S(t=0) and F(t=0), so it seems there is a justification missing in the text as is for the validity of this approximation (or that it is simply assumed).

      See point 4 of Common comments.

      Reviewer #3 (Public Reviews):

      The authors address the process of community evolution under collective-level selection for a prescribed community composition. They mostly consider communities composed of two types that reproduce at different rates, and that can mutate one into the other. Due to such differences in 'fitness' and to the absence of density dependence, within-collective selection is expected to always favour the fastest grower, but the collective-level selection can oppose this tendency, to a certain extent at least. By approximating the stochastic within-generation dynamics and solving it analytically, the authors show that not only high frequencies of fast growers can be reproducibly achieved, aligned with their fitness advantage. Small target frequencies can also be maintained, provided that the initial proportion of fast growers is sufficiently small. In this regime, similar to the 'stochastic corrector' model, variation upon which selection acts is maintained by a combination of demographic stochasticity and of sampling at reproduction. These two regions of achievable target compositions are separated by a gap, encompassing intermediate frequencies that are only achievable when the bottleneck size is small enough or the number of communities is (disproportionately) larger.

      A similar conclusion, that stochastic fluctuations can maintain the system over evolutionary time far from the prevalence of the faster-growing type, is then confirmed by analyzing a three-species community, suggesting that the qualitative conclusions of this study are generalizable to more complex communities.

      I expect that these results will be of broad interest to the community of researchers who strive to improve community-level selection, but are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space of such embedded populations. The realization that not all target collective functions can be as easily achieved and that they should be adapted to the initial conditions and the selection protocol is also a sobering message for designing concrete applications.

      A major strength of this work is that the qualitative behaviour of the system is captured by an analytically solvable approximation so that the extent of the 'forbidden region' can be directly and generically related to the parameters of the selection protocol.

      Thanks so much for these positive comments.

      I however found the description of the results too succinct and I think that more could be done to unpack the mathematical results in a way that is understandable to a broader audience. Moreover, the phenomenon the authors characterize is of purely ecological nature. Here, mutations of the growth rate are, in my understanding, neither necessary (non-trivial equilibria can be maintained also when \mu =0) nor sufficient (community-level selection is necessary to keep the system far from the absorbing state) for the phenomenon described. Calling this dynamics community evolution reflects a widespread ambiguity, and is not ascribable just to this work. I find that here the authors have the opportunity to make their message clearer by focusing on the case where the 'mutation' rate \mu vanishes (Equations 39 & 40 of the SI) - which is more easily interpretable, at least in some limits - while they may leave the more general equations 3 & 4 in the SI.

      See points 1-4 of Common comments.

      Combined with an analysis of the deterministic equations, that capture the possibility of maintaining high frequencies of fast growers, the authors could elucidate the dynamics that are induced by the presence of a second level of selection, and speculate on what would be the result of real open-ended evolution (not encompassed by the simple 'switch mutations' generally considered in evolutionary game theory), for instance discussing the invasibility (or not) of mutant types with slightly different growth rates.

      Indeed, evolution is not restricted to two types. However, our main goal here is to derive an analytical expression, and it was difficult for even two types. For three-type collectives, we had to resort to simulations. Investigating the case where fitness effects of mutations are continuously distributed is beyond the scope of this study.

      The single most important model hypothesis that I would have liked to be discussed further is that the two types do not interact. Species interactions are not only essential to achieve inheritance of composition in the course of evolution but are generally expected to play a key role even on ecological time scales. I hope the authors plan to look at this in future work.

      In our system, the S and F do interact in a competitive fashion: even though S and F are not competing for nutrients (which are always in excess), they are competing for space. This is because a fixed number of cells are transferred to the next cycle. Thus, the presence of F will for example reduce the chance of S being propagated. We have added this clarification to our main text:

      “Note that even though S and F do not compete for nutrients, they compete for space: because the total number of cells transferred to the next cycle is fixed, an overabundance of one population will reduce the likelihood of the other being propagated.”

      Recommendations For The Authors

      I felt the authors could put some additional effort into making their theoretical results meaningful for a population of readers who, though not as highly mathematically educated as they are, can nonetheless appreciate the implications of simple relations or scaling. Below, you find some suggestions:

      (1) In order to make it clear that there is a 'natural' high-frequency equilibrium that can be reached even in the absence of selection, the authors could examine first the dynamics of the deterministic system in the absence of mutations, and use its equilibria to elucidate the combined role of the 'fitness' difference \omega and of the generation duration \tau in setting its value. The fact that these parameters always occur in combination (when there are no mutations) is a general and notable feature of the stochastic model as well. Moreover, this model would justify why you only focus on decreasing the frequency in the new generation.

      Note that the ‘natural’ high-frequency equilibrium in the absence of collective selection is when fast grower F becomes fixed in the population. Following your suggestion, we have introduced two parameters R<sub>τ</sub> and W<sub>τ</sub> to reflect the coupling between ‘fitness’ and ‘generation duration’:

      (2) Since the phenomenon described in the paper is essentially ecological in nature (as the author states, it does not change significantly if the 'mutation rate' \mu is set to zero), I would put in the main text Equations 39 & 40 of the SI in order to improve intelligibility.

      See Point 2 at the beginning of this letter.

      These equations can be discussed in some detail, especially in the limit of small f^*_k, where I think it is worth discussing the different dependence of the mean and the variance of the frequency distribution on the system's parameters.

      This is a great suggestion. We have added the following:

      “In the limit of small f<sub>k</sub><sup>*</sup>, Equation (3) becomes f while Equation (4) becomes . Thus, both Newborn size (N<sub>0</sub>) and fold-change in F/S during maturation (W<sub>τ</sub>) are important determinants of selection progress.”

      (3) I would have appreciated an explanation in words of what are the main conceptual steps involved in attaining Equation 2, the underlying hypotheses (notably on community size and distributions), and the expected limits of validity of the approximation.

      See points 3 and 4 at the beginning of this letter.

      (4) I think that some care needs to be put into explaining where extreme value statistics is used, and why is the median of the conditional distribution the most appropriate statistics to look at for characterizing the evolutionary trajectory (which seems to me mostly reliant on extreme values).

      Great point! We added an explanation of why we use the median value in Box 1, and also added Figure 7 in the SI to explain it.

      Showing in a figure the different distributions you are considering (for instance, plotting the conditional distribution for one generation in the trajectories displayed in Figure 2) would be useful to understand what information \bar f provides on a sequence of collective generations, where in principle there may be memory effects.

      Thanks for this suggestion. We have added to the Fig. 2d panel an illustration of the shape and position of the F frequency distributions at each step in the first two selection cycles.

      (5) Similarly, I do not understand why selecting the 5% best communities should push the system's evolution towards the high-frequency solution, instead of just slowing down the improvement (unless you are considering the average composition of the top best communities - which should be justified). I think that such sensitivity to the selection intensity should be appropriately referenced and discussed in the main text, as it is a parameter that experimenters are naturally led to manipulate.

      In the main text, we have added this explanation:

      “In contrast with findings from an earlier study [23], choosing top 1 is more effective than the less stringent “choosing top 5%”. In the earlier study, variation in the collective trait is partly due to nonheritable factors such as random fluctuations in Newborn biomass. In that context, a less stringent selection criterion proved more effective, as it helped retain collectives with favorable genotypes that might have exhibited suboptimal collective traits due to unfavorable nonheritable factors. However, since this study excludes nonheritable variations in collective traits, selecting the top 1 collective is more effective than selecting the top 5% (see Fig. 11 in Supplementary Information).”

      (6) Equation 1 could be explained in simpler terms as the product between the probability that one collective reaches the transmitted value times the probability that all others do worse than that. The current formulation is unclear, perhaps just a matter of English formulation.

      We have revised our description to state:

      “Equation (1) can be described as the product between two terms related to probability: (i) describes the probability density that any one of the g Adult collectives achieves f given f<sub>k</sub><sup>*</sup>, and (ii) describes the probability that all other g – 1 collectives achieve frequencies above f and are thus not selected.”
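      The product structure described here is the standard density of the smallest of g independent draws, and it can be sketched numerically. The example assumes, as the reviews discuss, that the Adult F frequencies are approximately Gaussian and that the selected collective is the one with the lowest F frequency; the function names `min_density` and `min_median` and all parameter values are hypothetical and are not the manuscript's Equation (1) itself.

```python
from statistics import NormalDist


def min_density(f, g, mean, sd):
    """Density at f of the smallest of g i.i.d. Gaussian frequencies.

    Factor g * pdf(f): any one of the g collectives lands at f.
    Factor (1 - cdf(f)) ** (g - 1): the other g - 1 collectives all lie above f.
    """
    dist = NormalDist(mean, sd)
    return g * dist.pdf(f) * (1.0 - dist.cdf(f)) ** (g - 1)


def min_median(g, mean, sd):
    """Median of the smallest of g i.i.d. Gaussian frequencies.

    Solves 1 - (1 - cdf(f)) ** g = 1/2, i.e. cdf(f) = 1 - 2 ** (-1 / g).
    """
    return NormalDist(mean, sd).inv_cdf(1.0 - 0.5 ** (1.0 / g))


if __name__ == "__main__":
    # Hypothetical Adult distribution: mean 0.5, standard deviation 0.05.
    for g in (1, 10, 100):
        print(g, round(min_median(g, mean=0.5, sd=0.05), 4))
```

      With more collectives per cycle, the median of the selected frequency moves further below the mean of the Adult distribution, which is one way to see why the number of collectives affects which targets are reachable.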

      (7) I think that the discussion of the dependence of the boundaries of the 'waterfall' region with the difference in growth rate \omega is important and missing, especially if one wants to consider open-ended evolution of the growth rate - which can occur at steps of different magnitude.

      We added a new chapter and figure in supplementary information on the threshold values when \omega varies. As expected, smaller \omega enlarges the success area.

      We have also added a new figure panel to show how maturation time affects selection efficacy.

      (8) Notations are a bit confusing and could be improved. First of all, in most equations in the main text and SI, what is initially introduced as \omega appears as s. This is confusing because the letter s is also used for the frequency of the slow type.

      The letter S is used to denote an attribute of cells (S cells), the type of cells (Equations 1-3 of the SI) and the number of these cells in the population, sometimes with different meanings in the same sentence. This is confusing, and I suggest referring to slow cells or fast cells instead (or at least to S-cells and F-cells), and keeping S and F as variables for the number of cells of the two types.

      All typos related to the notation have been fixed. We use S and F as types, and S and F (italic) as population numbers.

      (9) On page 3, when introducing the sampling of newborns as ruled by a binomial distribution, the information that you are just transmitting one collective is needed, while it is conveyed later.

      We have added this emphasis:

      “At the end of a cycle, a single Adult with the highest function (with F frequency f closest to the target frequency) is chosen to reproduce g Newborn collectives each with N<sub>0</sub> cells (‘Selection’ and ‘Reproduction’ in Fig. 1).”

      (10) I found that the abstract talks too early about the 'waterfall' phenomenon. As this is a concept introduced here, I suggest the authors first explain what it is, then use the term. It is a useful metaphor, but it should not obscure the more formal achievements of the paper.

      We feel that the “waterfall” analogy offers a gentle helping hand to orient those who have not thought much about the phenomenon. We view the abstract as an opportunity to attract readership, and thus the more accessible the better.

      (11) In the SI there are numerous typos and English language issues. I suggest the authors read carefully through it, and add line numbers to the next version so that more detailed feedback is possible.

      Thank you for going through SI. We have gone through the SI, and fixed problems.

    1. eLife Assessment

      In this important quantitative study of HIV-1 evolution in humans and rhesus macaques, selection coefficients are inferred at scale over the HIV genome. Selection coefficients are similar in humans and macaques, providing compelling evidence that these coefficients are representative of the fitness landscapes of these viruses within hosts. This work will be of interest to the community working on quantitative evolution and fitness landscape inference, and the finding that rapid fitness gains in the HIV population predict bNAb emergence has significant implications for HIV vaccine design.

    2. Reviewer #1 (Public review):

      Summary:

      The present work studies the coevolution of HIV-1 and the immune response in clinical patient data. Using the Marginal Path Likelihood (MPL) framework, they infer selection coefficients for HIV mutations from time-series data of virus sequences as they evolve in a given patient.

      Strengths:

      The authors analyze data from two human patients, consisting of HIV population sequence samples at various points in time during the infection. They inferred selection coefficients from the observed changes in sequence abundance using MPL. Most beneficial mutations appear in viral envelope proteins. The authors also analyze SHIV samples in rhesus macaques, and find selection coefficients that are compatible with those found in the corresponding human samples.

      The manuscript is well written and organized.

      Comments on revisions:

      In their revised version the authors have addressed most of these points satisfactorily.

    3. Reviewer #2 (Public review):

      This paper combines a biological topic of interest with the demonstration of important theoretical/methodological advances. Fitness inference is the foundation of the quantitative analysis of adapting systems. It is a hard and important problem and this paper highlights a compelling approach (MPL) first presented in (1) and refined in (2), roughly summarized in equation 3.

      The authors find that positive selection shapes the variable regions of env in shared patterns across two patient donors. The patterns of positive selection are interesting in and of themselves, they confirm the intuition that hyper-variation in env is the result of immune evasion rather than a broadly neutral landscape (flatness). They show that the immune evasion patterns due to CD8 T and naive B-cell selection are shared across patients. Furthermore, they suggest that a particular evolutionary history (larger flux to high fitness states) is associated with bNAb emergence. Mimicking this evolutionary pattern in vaccine design may help us elicit bNAbs in patients in the future.

      The fitness landscape of env in multiple hosts is immensely valuable, especially because of how often SHIV has been used as a proxy for HIV. Reversion-to-consensus selection is a known pattern in post-infection HIV populations, but its strength is nicely quantified here. Agreement between SHIV and HIV evolution is shown. They find selection is larger for autologous antibodies than the bNAbs themselves (perhaps bNAbs are just too small a component of the host response to drive the bulk of selection?), and that big fitness increases precede antibody breadth in rhesus macaques, suggesting that this fitness increase is the immune challenge required to draw forth a bNAb. All of this is of high interest to HIV researchers.

      (1) Sohail, M. S., Louie, R. H., McKay, M. R. & Barton, J. P. MPL resolves genetic linkage in fitness inference from complex evolutionary histories. Nature Biotechnology 39, 472-479 (2021).

      (2) Shimagaki, K. & Barton, J. P. Bézier interpolation improves the inference of dynamical models from data. Physical Review E 107, 024116 (2023).

      Strength of evidence:

      Equation 3 is a beautiful and intuitive tool that accounts for linkage and can be solved precisely even in the presence of detailed mutational and selection models. They have addressed my earlier concerns about how incomplete observation of frequencies biases fitness inference at rare sites.

      Whether fitness increases occurred before or after the appearance of the bNAb remains incompletely known. bNAb detection is different from bNAb presence, and the possibility that fitness increases occurred after the bNAbs appeared remains. Still, their conclusion is plausible and fits in with the other observations, which form a coherent and compelling picture.

      Overall this is a convincing paper. It is a valuable introduction to a practical method of fitness inference at the scale of the entire env gene and how this information can be leveraged to learn some interesting biology.

    4. Reviewer #3 (Public review):

      Summary:

      Shimagaki et al. investigate the virus-antibody coevolutionary processes that drive the development of broadly neutralizing antibodies (bnAbs). The study's primary goal is to characterize the evolutionary dynamics of HIV-1 within hosts that accompany the emergence of bnAbs, with a particular focus on inferring the landscape of selective pressures shaping viral evolution. To assess the generality of these evolutionary patterns, the study extends its analysis to rhesus macaques (RMs) infected with simian-human immunodeficiency viruses (SHIV) incorporating HIV-1 Env proteins derived from two human individuals.

      Strengths:

      A key strength of the study is its rigorous assessment of the similarity in evolutionary trajectories between humans and macaques. This cross-species comparison is particularly compelling, as it quantitatively establishes a shared pattern of viral evolution using a sophisticated inference method. The finding that similar selective pressures operate in both species adds robustness to the study's conclusions and suggests broader biological relevance. In the revised version, the authors included a simple but clear explanation of the statistical method for inferring the model's parameters in the main text. Moreover, I find the potential implications of the methodology, which were absent from the original submission, very interesting.

      Conclusions:

      Overall, the study presents a compelling analysis of HIV-1 evolution and its parallels in SHIV-infected macaques.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present work studies the coevolution of HIV-1 and the immune response in clinical patient data. Using the Marginal Path Likelihood (MPL) framework, they infer selection coefficients for HIV mutations from time-series data of virus sequences as they evolve in a given patient.

      Strengths:

      The authors analyze data from two human patients, consisting of HIV population sequence samples at various points in time during the infection. They infer selection coefficients from the observed changes in sequence abundance using MPL. Most beneficial mutations appear in viral envelop proteins. The authors also analyze SHIV samples in rhesus macaques, and find selection coefficients that are compatible with those found in the corresponding human samples.

      Weaknesses:

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis.

      As suggested, we have now addressed this limitation by inferring epistatic fitness landscapes for CH505, CH848, SHIV.CH505, and SHIV.CH848. Indeed, the computational burden of the epistasis inference procedure was one constraint that motivated us to consider only additive fitness in the previous version of our paper. The original approach developed by Sohail et al. (2022) tested only sequences with <50 sites due to this limitation, far smaller than the ones we consider. Beyond this computational constraint, we also believed that 1) an additive fitness model may suffice to capture local fitness landscapes, and practically, 2) epistatic interactions are more challenging to validate than the effects of individual mutations, making the interpretation of the model more complex.

      However, after performing the analyses described in this paper, we developed a new approach for identifying epistatic interactions that can scale to much longer sequences (Shimagaki et al., Genetics, in press). We therefore applied this method to infer an epistatic fitness landscape for the HIV and SHIV data sets that we studied. As in that work, we focused on short-range (<50 bp) interactions which we could more confidently estimate from data. We have added a section in the SI describing the epistatic fitness model and our analysis. 

      Overall, we found substantial agreement between the epistatic and purely additive models in terms of the estimated fitness effects of individual mutations (new Supplementary Fig. 8) and overall fitness (Supplementary Fig. 9). Consistent with our prior work, we did not find substantial evidence for very strong epistatic interactions (Supplementary Fig. 10). This does not necessarily mean that strong epistatic interactions do not exist; rather, this shows that strong interactions don’t substantially improve the fit of the model to data, and thus many are regularized toward zero. While the biological validation of epistatic interactions is challenging, we found that the largest epistatic interactions, which we defined as the top 1% of all short-range interactions, were modestly but significantly enriched in the CD4 binding site, V1 and V5 regions for CH505 and in the CD4 binding site, V4, and V5 for CH848. In addition, mutation pairs N280S/V281A and E275K/V281G, which confer resistance to CH235, ranked in the top 15% of all epistatic interactions in CH505.

      We have now included an additional section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which discusses our epistatic analyses (page 6, lines 415-464), along with the above Supplementary Figures and a technical section in the SI summarizing the epistasis inference approach.

      Although the evolution of broadly neutralizing antibodies (bnAbs) is a motivating question in the introduction and discussion sections (and the title), the relevance of the analysis and results to better understanding how bnAbs arise is not clear. The only result presented in direct connection to bnAbs is Figure 6.

      It is true that, while bnAb development is a major motivator of our study, our analysis focuses on HIV-1 and does not directly consider antibody evolution. We have now brought attention to this point as a limitation directly in the Discussion. Following the suggestion below in the “Recommendations for the authors,” we have edited our manuscript to place more emphasis on viral fitness and somewhat reduce the emphasis on bnAbs, though this remains an important motivating factor. Specifically, the Abstract now begins

      Human immunodeficiency virus (HIV)-1 evolves within individual hosts to escape adaptive immune responses while maintaining its capacity for replication. Coevolution between the HIV-1 and the immune system generates extraordinary viral genetic diversity. In some individuals, this process also results in the development of broadly neutralizing antibodies (bnAbs) that can neutralize many viral variants, a key focus of HIV-1 vaccine design. However, a general understanding of the forces that shape virus-immune coevolution within and across hosts remains incomplete. Here we performed a quantitative study of HIV-1 evolution in humans and rhesus macaques, including individuals who developed bnAbs.

      We have similarly modified the Discussion to focus first on viral fitness. In response to comments from Reviewer 3, we have also more clearly articulated how our work might contribute to the understanding of bnAb development in the Discussion.

      Questions or suggestions for further discussion:

      I list here a number of points for which I believe the paper would benefit if additional discussion/results were included.

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis. In Sohail et al (2022) MBE 39(10), p. msac199  (https://doi.org/10.1093/molbev/msac199) an extension of MPL is developed allowing one to infer epistasis. Can the authors comment on why this was not attempted here?

      I presume one possible reason is that epistasis inference requires considerably more computational effort (and more data). However, since the authors find most beneficial mutations occurring in Env, perhaps restricting the analysis to Env genes only (e.g. the trimer shown in Figure 2) can lead to tractable inference of epistasis within this segment (instead of the full genome).

      As described above, we have now addressed this comment by inferring epistatic fitness landscapes for the data sets that we consider. Our overall results using the epistatic fitness model are consistent with the ones that we previously obtained with an additive model.

      Do the authors find correlations in the inferred selection coefficients of the two samples CH505 and CH848? I could not find any discussion of this in the manuscript. Only correlations between Humans and RM are discussed.

      To address this question, we compared the fitness values and individual selection coefficients across CH505 and CH848 data sets. We found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. We found only 199 common mutations between HIV-1 amino acid sequences from CH505 and CH848 out of 868 and 1,406 total mutations, respectively. Thus, we were not surprised to find no strong relationship between fitness estimates from CH505 and CH848 data sets. 

      Reviewer #2 (Public review):

      Summary:

      This paper combines a biological topic of interest with the demonstration of important theoretical/methodological advances. Fitness inference is the foundation of the quantitative analysis of adapting systems. It is a hard and important problem and this paper highlights a compelling approach (MPL) first presented in (1) and refined in (2), roughly summarized in equation 12.

      (1) Sohail, M. S., Louie, R. H., McKay, M. R. & Barton, J. P. Mpl resolves genetic linkage in fitness inference from complex evolutionary histories. Nature biotechnology 39, 472-479 (2021).

      (2) Shimagaki, K. & Barton, J. P. Bézier interpolation improves the inference of dynamical models from data. Physical Review E 107, 024116 (2023).

      The authors find that positive selection shapes the variable regions of env in shared patterns across two patient donors. The patterns of positive selection are interesting in and of themselves; they confirm the intuition that hyper-variation in env is the result of immune evasion rather than a broadly neutral landscape (flatness). They show that the immune evasion patterns due to CD8 T and naive B-cell selection are shared across patients. Furthermore, they suggest that a particular evolutionary history (larger flux to high fitness states) is associated with bNAb emergence. Mimicking this evolutionary pattern in vaccine design may help us elicit bNAbs in patients in the future.

      There is a lot of information to be found in the full fitness landscape of env. The enormous strength of reversion-to-consensus is a known pattern of HIV post-infection populations, but it is nicely quantified here. Agreement between SHIV and HIV evolution is shown. They find selection is larger for autologous antibodies than the bNAbs themselves (perhaps bNAbs are just too small a component of the host response to drive the bulk of selection?), and that big fitness increases precede antibody breadth in rhesus macaques, suggesting that this fitness increase is the immune challenge required to draw forth a bNAb. This is all of high interest to HIV researchers.

      Strength of evidence:

      One limitation is, of course, that the fitness model is constant in time when the immune challenge is variable and changing. This simplification may complicate some interpretations.

      We agree that this is a limitation of our current approach. In prior work, we have found that the constant fitness effects of mutations that we infer typically reflect the time-averaged fitness effect when the selection changes over time (Gao and Barton, PNAS 2025; Lee et al., Nat Commun 2025). It could be difficult, however, to capture changes in selection that fluctuate rapidly with underlying immune responses. We have added a new paragraph in the Discussion that more clearly sets out some of the limitations of our analysis, including our assumption of constant selection coefficients.

      There are additional methodological and technical limitations that should be considered in the interpretation of our results. Most notably, we assume that the viral fitness landscape is static in time. While we do not expect selection for effective replication (“intrinsic” fitness) to change substantially over time, pressure for immune escape could vary along with the immune responses that drive them. In prior work, we have found that constant selection coefficients typically reflect the average fitness effect of a mutation when its true contribution to fitness is time-varying [42,43]. This may not adequately describe mutational effects that undergo large or rapid shifts in time. Future work should also examine temporal patterns in selection for individual mutations.

      Equation 12 in the methods is really a beautiful tool because it is so simple, but accounts for linkage and can be solved precisely even in the presence of detailed mutational and selection models. However, the reliance on incomplete observations of the frequency leads to complications that must be carefully (re)addressed here.

      For instance, the consistent finding of strong selection in hypervariable regions is biologically intuitive but so striking, that I worry that it might be the result of a bias for selection in high entropy regions. 

      Thank you for this suggestion. We agree that it is important to carefully interrogate these results. To assess the effects of general sequence variability on inferred selection, we first computed a position-specific entropy measure, $H_i$, for each site $i$. We first defined the time-dependent entropy $H_i(t) = -\sum_a x_i(a, t) \log x_i(a, t)$, where $x_i(a, t)$ represents the frequency of amino acid/nucleotide $a$ at position $i$ and time $t$, at each sample time. We then computed $H_i$ as the average of $H_i(t)$ across all sample times. A new Supplementary Fig. 1 plots the entropy against the inferred selection coefficients. Although some sequence variation must be observed in order for us to infer that a mutation is beneficial, we did not find a systematic bias toward larger (more beneficial) selection coefficients at more variable sites. Overall, we found only a modest correlation between inferred selection coefficients and entropy (Pearson’s r = 0.33 and 0.29 for CH505 and CH848, respectively), which appears to be partly driven by the tendency for mutations inferred to be significantly deleterious to occur at sites with low entropy. In addition to the new Supplementary Figure, we have added a reference to this analysis in the main text:

      To test whether our results might be biased by overall sequence variability, we examined the relationship between our inferred selection coefficients and entropy, a common measure of sequence variability. Overall, we found only a modest correlation between selection and entropy, suggesting that the signs of selection that we observe are not due to increased sequence variability alone (Supplementary Fig. 1).
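
      The entropy measure described in this response can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy sequences are hypothetical, and the data layout (one list of equal-length sequences per sample time) is an assumption made for the example.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (natural log) of the residues observed at one site and one time."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mean_site_entropy(samples_by_time, site):
    """Average the per-time entropy H_i(t) of one site over all sample times."""
    entropies = [
        site_entropy([seq[site] for seq in seqs]) for seqs in samples_by_time
    ]
    return sum(entropies) / len(entropies)


# Hypothetical toy data: two sample times, three sequences of length 2 each.
samples = [["AC", "AC", "AC"], ["AC", "GC", "GC"]]
h_site0 = mean_site_entropy(samples, 0)  # variable at the second time point
h_site1 = mean_site_entropy(samples, 1)  # never varies
```

      In this toy case the invariant site has zero entropy, while the variable site averages a zero-entropy time point with one where the two residues have frequencies 1/3 and 2/3.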

      Mutational and covariance terms in equation 12 might be underestimated, due to finite sampling effect in highly diverse populations. Sampling effects lead to zeros in x(t) when actual frequency zeros might be rare at the population sizes of HIV viral loads and mutation rates. Both mutational flux and C underestimation will bias selection upward in eq. 12. 

      The prior papers (1) and (2) seem to show robustness to finite sampling effects, but, again, more care needs to be shown that this robustness transfers to the amino acid inference under these conditions. That synonymous sites are rarely selected for in the nucleotide level is a good sign, and it may be a matter of simply fully explaining the amino-acid level model.

      As above, we agree that these tests are important. To assess the robustness of our results to finite sampling, we performed bootstrap sampling on the viral sequences and inferred selection coefficients using the resampled sequences. Specifically, we resampled the same number of sequences as in the original data at each time point and repeated this for all time points across all HIV-1 and SHIV data sets. A new Supplementary Fig. 11 shows a typical comparison of the original selection coefficients vs. those obtained through bootstrap resampling. Overall, we observe a high degree of consistency between the selection coefficients in each case, which is surely aided by the long time series in these data sets. As pointed out by the reviewer, uncertainty in low-frequency mutations is a particular concern, though the effects on inferred selection are mitigated by regularization. 

      We have added a section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which includes this analysis:

      Finite sampling of sequence data could also affect our analyses. To further test the robustness of our results, we inferred selection coefficients using bootstrap resampling, where we resample sequences from the original ensemble, maintaining the same number of sequences for each time point and subject. The selection coefficients from the bootstrap samples are consistent with the original data (see Supplementary Fig. 11), with Pearson’s r values of around 0.85 for HIV-1 data sets and 0.95 for SHIV data sets, respectively.
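
      The resampling step described here might look like the sketch below. It is illustrative only: the helper names and toy data are hypothetical, the MPL inference itself is not shown, and `pearson_r` is included only to indicate how original and bootstrap estimates could be compared.

```python
import random


def bootstrap_resample(samples_by_time, rng):
    """Resample sequences with replacement, keeping each time point's sample size."""
    return [rng.choices(seqs, k=len(seqs)) for seqs in samples_by_time]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: two sample times with different numbers of sequences.
original = [["AC", "AC", "GC"], ["GC", "GC", "GT", "AC"]]
resampled = bootstrap_resample(original, random.Random(0))
```

      In practice one would run the inference on each resampled data set and then correlate the resulting selection coefficients with those from the original data.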

      Uncertainty propagates to the later parts of the paper, e.g., HIV and SIV shared patterns might be the result of shared biases in the method application. However, this worry does not extend to the apples-to-apples comparison of fitness trajectories across individuals (Figures 5 and 6), which I think are robust (for these sample sizes).

      One way to address this uncertainty is to compare the fitness values and individual selection coefficients across CH505 and CH848 data sets, which was also requested by Reviewer 1. Overall, we found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. This suggests that similarities between HIV-1 and SHIV landscapes are not solely determined by potential biases in the inference approach. We have now added a reference to this point in the main text:

      In contrast, the inferred fitness landscapes of CH505 and CH848, which share few mutations in common, are poorly correlated (Supplementary Fig. 6). This suggests that the similarities between viral fitness values in humans and RMs are not artifacts of the model, but rather stem from similarities in underlying evolutionary drivers.

      The timing evidence is slightly weakened by the fact that bNAb detection is different from bNAb presence and the possibility that fitness increases occurred after the bNAbs appeared remains. Still, their conclusion is plausible and fits in with the other observations which form a coherent and compelling picture.

      Yes, we agree that this is a limitation of our analysis — bNAbs may have been present at low levels before they were detected, and we cannot definitively reject selection by bNAbs. Nonetheless, in at least one case (RM5695), rapid fitness gains were substantially separated in time from bNAb detection (roughly 2 weeks after infection vs. 16 weeks, respectively). We have now added this point in a new paragraph in the Discussion:

      While we found a strong relationship between viral fitness dynamics and the emergence of bnAbs, it may not be true that the former stimulates the latter. For example, bnAbs may have been present within each host before they were experimentally detected. Rapid viral fitness gains within hosts that developed broad antibody responses could then have been driven by undetected bnAb lineages. However, we did not find strong selection for known bnAb resistance mutations, and in at least one case (RM5695), rapid fitness gains (roughly 2 weeks after infection) substantially preceded bnAb detection (16 weeks). Still, given the limited size of the data set that we studied, it is unclear the extent to which our results will transfer to larger and broader data sets.

      Overall this is a convincing paper, part of a larger admirable project of accurately inferring complete fitness landscapes.

      Reviewer #3 (Public review):

      Summary:

      Shimagaki et al. investigate the virus-antibody coevolutionary processes that drive the development of broadly neutralizing antibodies (bnAbs). The study's primary goal is to characterize the evolutionary dynamics of HIV-1 within hosts that accompany the emergence of bnAbs, with a particular focus on inferring the landscape of selective pressures shaping viral evolution. To assess the generality of these evolutionary patterns, the study extends its analysis to rhesus macaques (RMs) infected with simian-human immunodeficiency viruses (SHIV) incorporating HIV-1 Env proteins derived from two human individuals.

      Strengths:

      A key strength of the study is its rigorous assessment of the similarity in evolutionary trajectories between humans and macaques. This cross-species comparison is particularly compelling, as it quantitatively establishes a shared pattern of viral evolution using a sophisticated inference method. The finding that similar selective pressures operate in both species adds robustness to the study's conclusions and suggests broader biological relevance.

      Weaknesses:

      However, the study has some limitations. The most significant weakness is that the authors do not sufficiently discuss the implications of the observed similarities. While the identification of shared evolutionary patterns (e.g., Figure 5) is intriguing, the study would benefit from a more explicit discussion of what these findings mean, for instance, in the context of HIV vaccine design, immunotherapy, or fundamental viral-host interactions. Even speculative interpretations could provide valuable insights into the broader significance of these results.

      Thank you for this suggestion. We have now clarified the potential implications of our work in several areas. While speculative, one possible application is in vaccine design: it may be beneficial to design sequential immunogens to mimic the patterns of viral evolution associated with rapid fitness gains. This “population-based” design principle is different from typical approaches, which have focused on molecular details of virus surface proteins. 

      We have extended our discussion of our results in the context of viral evolution within and across hosts and related host species. Overall, our work suggests that there may be relatively few paths to significantly higher viral fitness in vivo. Evolutionary “contingencies” such as shifting immune pressure or epistatic interactions could influence the direction of evolution, but not so dramatically that the dynamics that we see in different hosts are not comparable. We have also connected our work more broadly to the literature in evolutionary parallelism in HIV-1 in different contexts.

      A secondary, albeit less critical, limitation is the placement of methodological details in the Supplementary Information. While it is understandable that the authors focus on results in the main text - especially since the methodology is not novel and has been previously described in earlier publications - some readers might benefit from a more thorough presentation of the method within the main paper.

      We have now modified the main text to add a new section, “Model overview,” that lays out the key steps of our approach. While we reserve technical details for the Methods, we believe that this new section provides more intuition about how our results were obtained (including a discussion of the important Eq. 12, now Eq. 3 in the main text) and our underlying assumptions.

      Conclusions:

      Overall, the study presents a compelling analysis of HIV-1 evolution and its parallels in SHIV-infected macaques. While the quantitative comparison between species is a notable contribution, a deeper discussion of its broader implications would strengthen the paper's impact.

      Reviewer #1 (Recommendations for the authors):

      I suggest de-emphasizing bnAbs and focusing on selection landscape inference, which seems to be the actual focus of the paper.

      While we do not directly study antibody development in this work, bnAb development is certainly an important motivating factor. As described in the responses above, we have now modified the Abstract and Discussion to place relatively more emphasis on fitness comparisons and relatively less focus on bnAb development.

      Reviewer #2 (Recommendations for the authors):

      Please make sure that the MPL method is defined in this paper and its limitations are at least partially repeated.

      As noted in responses above, we have now included more methodological details in the main text of the paper, which we hope will make the intuition and assumptions involved in our analysis clearer.

      I'd like the code to better show or describe the model, I could not figure out the model details by looking at the code. It seems mostly just to be csv exporting for use with preexisting MPL code. A longer code readme would be helpful.

      We have now updated the README on GitHub to include a conceptual overview of our inference approach, which references how each step is implemented in the code.

      Reviewer #3 (Recommendations for the authors):

      Try to give some more details (not necessarily giving the full mathematical derivation) on the statistical method utilized.

      As noted above, we have now expanded our discussion of the statistical methods and assumptions in the main text.

      Figures 3 and 4 are somewhat 'messy'. Although I do not have a constructive suggestion here, I feel that with a little more effort maybe the authors could come up with something more clean.

      It is true that the mutation frequency dynamics are somewhat “choppy” and difficult to follow intuitively. To attempt to make these figures easier to parse visually, we have increased the transparency on the lines and added exponential smoothing to the mutation frequencies, resulting in smoother trajectories. The trajectories without smoothing are retained in Supplementary Fig. 3. Here we also note that this smoothing is for visual purposes only; we use the original frequency trajectories for inference, rather than the smoothed ones.
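
      For readers unfamiliar with the technique, simple exponential smoothing of a frequency trajectory can be sketched as follows. The response does not state the exact form or smoothing parameter used for the figures, so the recursion and the value of `alpha` below are assumptions for illustration.

```python
def exponential_smoothing(values, alpha=0.3):
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)  # the first point is kept unchanged
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical "choppy" mutation frequency trajectory.
raw = [0.0, 1.0, 0.0, 1.0]
smooth = exponential_smoothing(raw, alpha=0.5)
```

      The function returns a new list and leaves `raw` untouched, which mirrors the point made above that smoothed trajectories serve display purposes while the original frequencies are kept for inference.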

    1. eLife Assessment

      This valuable study characterizes the emergence of the membrane-associated periodic cytoskeleton (MPS) in the axons of human motor neurons derived from induced pluripotent stem cells. Super-resolution imaging of beta-II spectrin provides convincing evidence for the patterned assembly of spectrin-poor gaps and spectrin-rich MPS in the medial region of the axons and its enhancement by the kinase inhibitor staurosporine. The data advocates against gap formation by cytoskeleton disassembly in a continuous MPS. Instead, a continuous MPS may result from nascent MPS patches and their maturation, a model that would benefit from live imaging for validation.

    2. Reviewer #1 (Public review):

      Summary:

      Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BII-spectrin, along with Alpha-spectrin, forms horizontal linkers between 180nm spaced F-actin rings in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

    4. Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge on MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al use human IPS-derived motor neurons, a highly relevant neuron type, to further the current knowledge on MPS biology.

      (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

    5. Author response:

      Reviewer #1 (Public review)

      Summary:

      Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      We thank the reviewer for the detailed and accurate description of the data shown and its relevance to further our understanding of MPS assembly mechanism and function.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      We will consider the inclusion of live imaging experiments using the expression of C-terminus-tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we will explore how to develop these experiments to generate data for inclusion in a revised submission.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      We do not rule out the presence of “nano beads” in these axons. It was recently suggested that the normal morphology of axons indeed resembles “pearls-on-a-string” (Griswold et al., 2025), with “nano beads” separated by thin tubular "connectors" (also referred to as NSVs, for non-synaptic varicosities). However, it is unlikely that the gap-patch pattern of beta2-spectrin can be attributed to such a morphology, given that we used formaldehyde as fixative, and Griswold and colleagues show that aldehyde-based fixatives do not preserve NSVs. We are able to see scattered axonal enlargements (“micro beads”), as we described in distal portions in Fig. 1C(C2) and E. However, the number, appearance and staining of these are not compatible with the gap-patch pattern in beta2-spectrin. Moreover, we would have expected to see these NSVs in our extensive STED imaging, yet we did not. We will discuss this further in the resubmission.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      Our stainings are for the tubulin protein isoforms beta-III and alpha-II. That is, they label microtubules, but free tubulin as well. The slight decrease in tubulin intensity within gaps is indeed something to investigate, but we do not interpret it as “patchy microtubules”. If the Reviewer refers to Fig. 2C-D, the slight decrease in intensity is actually difficult to detect by the naked eye. To further support this, we will consider including stainings and quantitative analyses for microtubules in the resubmission. We are familiar with the use of permeabilizing conditions during fixation (in protocols known as “cytoskeletal fixation”) to label microtubules (and not free tubulin).

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      We agree that there is an apparent discrepancy. However, one has to take into account that these axons are still elongating even at 2 weeks in culture. Hence, at any time point there is a recently added axonal compartment with low beta2-spectrin and no MPS. Also, the dynamic evolution of the MPS has to take into account beta2-spectrin supply. If supply falls below a given threshold, more gaps are expected, given that the new, more distant parts of the axon have a lower supply of beta2-spectrin. To explore this formally, we are working on simulations of these multifactorial dynamic systems, which, together with key experimental observations, should enhance our understanding of overall MPS assembly in growing axons. However, the findings of this project will be the subject of another manuscript.

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      The results of the co-treatment with Latrunculin A and Staurosporine are indeed intriguing, and provide clear evidence that the gap-and-patch pattern arises from local assembly of the MPS, requiring new actin filaments. However, the fact that F-actin within the pre-formed MPS seems unaffected is not surprising. There are many different populations of F-actin in axons (i.e. MPS rings, longitudinal filaments, actin patches, actin trails). Latrunculin A affects filaments indirectly. The target of Latrunculin A is not actin filaments, but free monomers. It ultimately affects actin filaments because they keep losing monomers and, deprived of new monomers, get shorter and eventually disappear. The drastic decrease in F-actin in our axons reflects that. The fact that F-actin in the MPS is preserved only speaks to the fact that these filaments are stable: if they are not losing monomers in the time frame of the treatment, the filaments remain unaffected. We will support this with more observations and imaging and with a more extensive discussion summarizing the literature on the matter in the resubmission.

      On the other hand, the use of F-actin stabilizing drugs (like Jasplakinolide) would have a different effect. We will study how an experiment with these drugs could be informative of the process under investigation for the resubmission.

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

      We agree with the reviewer's interpretation. A virtue of our experimental model and our interpretations of the observations in fixed cells is that it gives rise to informative questions such as the ones posed by the reviewer. As stated above, we will consider the inclusion of live imaging experiments using the expression of C-terminus-tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we think we can provide the evidence suggested.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BII-spectrin, along with Alpha-spectrin, forms horizontal linkers between 180nm spaced F-actin rings in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      We thank the reviewer for the detailed and accurate description of the data shown.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

      We will consider the inclusion of live imaging experiments using the expression of tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we believe we can provide the evidence suggested. We do not rule out that axons carrying familial ALS mutations will show defects in MPS formation and/or stability when observed at longer culture times, or under culture conditions that promote neuronal aging (Guix et al., 2021). Thus, we will continue to work with these cells, but the goal of that project lies well beyond the primary message of the present manuscript, and we anticipate that the revised version will not include new data on this matter.

      Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge on MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al use human IPS-derived motor neurons, a highly relevant neuron type, to further the current knowledge on MPS biology.

      (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      We will further explore the inclusion of more measurements of other parameters and variables towards establishing whether these gaps-and-patches patterns are equivalent structures in control and staurosporine-treated cells. 

      (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

      As stated before regarding similar comments by other reviewers, we will consider the inclusion of live imaging experiments in the revised version of the manuscript.

      Nicolas Unsain, PhD, and Thomas Durcan, PhD.

      References

      Griswold, J.M., Bonilla-Quintana, M., Pepper, R. et al. Membrane mechanics dictate axonal pearls-on-a-string morphology and function. Nat Neurosci 28, 49–61 (2025). https://doi.org/10.1038/s41593-024-01813-1

      Guix F.X., Marrero Capitán A., Casadomé-Perales A., Palomares-Pérez I., López Del Castillo I., Miguel V., Goedeke L., Martín M.G., Lamas S., Peinado H., Fernández-Hernando C., Dotti C.G. Increased exosome secretion in neurons aging in vitro by NPC1-mediated endosomal cholesterol buildup. Life Sci Alliance. 2021 Jun 28;4(8):e202101055. doi: 10.26508/lsa.202101055. Print 2021 Aug.

    1. eLife Assessment

      The effort is timely and the paper carries valuable insights into the function of UTR mutations. There are still significant concerns about both the quality of the screen data, and its ability to detect significant changes in translation and their direction. Therefore, the ability of the screen to support the extensive downstream statistical analysis is limited and leaves the paper incomplete.

    2. Reviewer #1 (Public review):

      The authors describe a massively parallel reporter assay (MPRA) screen focused on identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      • The main issue remains that the screen appears to have largely failed, and the reasons for that remain unclear, which makes it difficult to judge how useful the resulting data are. The authors mention batch effects as a potential contributor. The authors start with a library that includes ~6,000 variants, which makes it a medium-size MPRA. But then, only 483 pairs of WT/mutated UTRs yield high-confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Fig. 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically relevant associations.

      • From the variants that had an effect, the authors go on to carry out some protein-level validations, and see some changes, but it is not clear if those changes are in the same direction as observed in the screen. In their rebuttal the authors explain that they largely cannot infer the directionality of changes from the screen, which further limits its utility.

      • It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozen translation-shifting variants.

      Comments on revisions:

      It appears that the authors have extracted the information they could from the problematic dataset they obtained. Repeating the experiments in a cleaner setting, obtaining data for the >6000 UTRs they intended will allow the authors to achieve the goals they set out to achieve in establishing the screen.

    3. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors describe a massively parallel reporter assay (MPRA) screen focused on identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      • The main issue remains that the screen appears to have largely failed, and the reasons for that remain unclear, which makes it difficult to judge how useful the resulting data are. The authors mention batch effects as a potential contributor. The authors start with a library that includes ~6,000 variants, which makes it a medium-size MPRA. But then, only 483 pairs of WT/mutated UTRs yield high-confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Fig. 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically relevant associations.

      • From the variants that had an effect, the authors go on to carry out some protein-level validations, and see some changes, but it is not clear if those changes are in the same direction as observed in the screen. In their rebuttal the authors explain that they largely cannot infer the directionality of changes from the screen, which further limits its utility.

      • It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozen translation-shifting variants.

      We recognize that RNA distribution within polysomes is inherently less stable than the associated protein components. This instability has been noted in previous studies, including those cited by the reviewer, which used RNA from bulk polysomes to infer the translatome without fractionation. Acknowledging this limitation, we purposely adopted a conservative strategy: (i) performing gross fractionation of polysomes, and (ii) collaborating with biostatisticians at the Institute of Statistical Science, Academia Sinica, to design a conservative yet optimized analysis pipeline that minimized batch effects.

      This approach proved robust: representative cases in Fig. 2B clearly demonstrate distinct distributions of reference and alternative alleles. From our high-confidence dataset, we applied a well-established statistical framework specifically designed to accommodate multiple influencing factors in relatively small datasets (The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman). We further conducted sensitivity analyses to select an optimal QC cutoff across a range of stringencies, ensuring maximal reliability of our results. We have therefore successfully shortlisted UTR variants that have a strong effect on translation.

      Building upon these conservative measures, we developed a predictive model for translation effects of UTR variants. Importantly, this model was validated not only with our internal test dataset but also with independent external datasets. In addition, the sequence features identified by the model were validated through reporter assays and in vivo CRISPR editing. These external and functional validations establish the generalizability and robustness of our approach.

      A more detailed analysis of the directionality of changes in translation efficiency is under active investigation. These results will be reported in a separate manuscript currently in preparation.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors describe a massively parallel reporter assay (MPRA) screen focused on identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      (1) The main issue is that it appears that the screen has largely failed, yet the reasons for that are unclear, which makes it difficult to interpret. The authors start with a library that includes approximately 6,000 variants, which makes it a medium-sized MPRA. But then, only 483 pairs of WT/mutated UTRs yield high-confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Figure 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically relevant associations.

      To make sure our final results are technically and statistically sound, we applied stringent selection criteria and cutoffs in our analytics workflow. First, from our RNA-seq dataset, we retained only the UTRs with at least 20 reads in a polysome profile across all three repeated experiments. Second, in the main analysis that followed, using a negative binomial generalized linear model (GLM), we further excluded the UTRs that displayed a batch effect, i.e. those whose batch-related main effect and interaction were significant. We believe these measures have safeguarded the filtered observations (UTRs) from the (potentially) high variation of our massively parallel translation assays and thus give high confidence to our results.
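
      The read-depth filter described above can be sketched as follows. This is only an illustration under stated assumptions: the UTR names and counts are hypothetical, and the sketch assumes that "at least 20 reads in a polysome profile" means at least 20 reads summed over the fractions of each of the three replicates, which the text does not spell out.

```python
MIN_READS = 20  # threshold quoted in the response above

def passes_read_filter(replicates, min_reads=MIN_READS):
    """Return True if every replicate has >= min_reads summed over fractions.

    replicates: list of dicts mapping fraction name -> read count,
    one dict per repeated experiment (assumed layout, not the authors' format).
    """
    return all(sum(rep.values()) >= min_reads for rep in replicates)

# Hypothetical counts for two UTRs across three replicates.
library = {
    "utr_A": [
        {"Mono": 30, "Light": 25, "Heavy": 10},
        {"Mono": 28, "Light": 22, "Heavy": 12},
        {"Mono": 35, "Light": 20, "Heavy": 9},
    ],
    "utr_B": [
        {"Mono": 30, "Light": 25, "Heavy": 10},
        {"Mono": 5, "Light": 4, "Heavy": 2},  # only 11 reads: fails the filter
        {"Mono": 35, "Light": 20, "Heavy": 9},
    ],
}

kept = [name for name, reps in library.items() if passes_read_filter(reps)]
print(kept)  # prints ['utr_A']
```

      Requiring the threshold in every replicate, rather than on average, is the stricter reading and is an assumption of this sketch.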

      Regarding the interpretation of Figure 2B, since we aimed to identify the UTRs whose genotype-by-fraction interaction term is significant in our generalized linear model, it is statistically conventional to double-check the interaction of the two variables using such a graph. For instance, in the top left panel of Figure 2B (5'UTR of ANK2:c.-39G>T), we can see that the read counts of WT samples consistently decreased from Mono to Light, whereas the read counts of mutant samples were roughly the same in the two fractions: the trend differs between WT and mutant. Thus, the distinct distribution patterns of the two genotypes across the three fractions in Figure 2B offer the readers a convincing visual supplement to our statistics from the GLM.

      In contrast to Figure 2B, the graphs of nonsignificant UTRs (shown below) reveal that the trends between the two genotypes are similar across the 'Mono and Light' and 'Light and Heavy' polysome fractions. Importantly, our analysis remains unaffected by differential expression levels between WT and mutant, as it specifically distinguishes polysome profiles with different distributions. This consistent trend further supports the lack of interaction between genotype and polysome fractions for these UTRs.

      Author response image 1.

      Examples of non-significant UTR pairs in massively parallel polysome profiling assays.
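
      To make the notion of a genotype-by-fraction interaction concrete, here is a minimal sketch. It is not the authors' negative binomial GLM: it only computes the log ratio of ratios between two fractions, which is the quantity an interaction coefficient corresponds to in a saturated log-link count model. The counts are hypothetical and chosen to mimic the pattern described for Figure 2B (WT drops from Mono to Light, mutant stays flat).

```python
from math import log

def interaction_log_ratio(wt, mut, frac_a, frac_b):
    """Difference between genotypes in the log fold change from frac_a to frac_b.

    A value near 0 means both genotypes follow the same trend (no interaction);
    a value far from 0 means the polysome distributions differ.
    """
    wt_change = log(wt[frac_b] / wt[frac_a])
    mut_change = log(mut[frac_b] / mut[frac_a])
    return mut_change - wt_change

# Hypothetical mean read counts per fraction.
wt = {"Mono": 400, "Light": 200, "Heavy": 100}
mut = {"Mono": 300, "Light": 300, "Heavy": 150}

shift = interaction_log_ratio(wt, mut, "Mono", "Light")
print(round(shift, 3))  # prints 0.693, i.e. log(2)

# Scaling one genotype's overall expression leaves the interaction unchanged,
# which is why differential expression alone does not produce a "shift".
mut_scaled = {fraction: 10 * count for fraction, count in mut.items()}
shift_scaled = interaction_log_ratio(wt, mut_scaled, "Mono", "Light")
print(abs(shift - shift_scaled) < 1e-12)  # prints True
```

      The sign of this quantity depends on which pair of fractions is compared, so by itself it does not establish a direction of change in protein output.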

      (2) From the variants that had an effect, the authors go on to carry out some protein-level validations and see some changes, but it is not clear if those changes are in the same direction as observed in the screen.

      To infer the directionality of translation efficiency from polysome profiles, a common approach involves pooling polysome fractions and comparing them with free or monosome fractions to identify 'translating' fractions. However, this method has two major potential pitfalls: (i) it sacrifices resolution and does not account for potential bias toward light or heavy polysomes, and (ii) it fails to account for discrepancies between polysome load and actual protein output (as discussed in https://doi.org/10.1016/j.celrep.2024.114098 and https://doi.org/10.1038/s41598-019-47424-w). Therefore, our analysis focused on the changes within polysome profiles themselves. 'Significant' candidates were identified based on a significant interaction between genotype and polysome distribution using a negative binomial generalized linear model, without presupposing the direction of change on protein output. 

      (3) The authors follow up on specific motifs and specific RBPs predicted to bind them, but it is unclear how many of the hits in the screen actually have these motifs, or how significant motifs can arise from such a small sample size.

      We calculated the Δmotif enrichment in significant UTRs versus nonsignificant UTRs using Fisher’s exact test. For example, the enrichment of the Δ‘AGGG’ motif in 3’ UTRs is shown below:

      Author response table 1.

      This test yields a P-value of 0.004167 by Fisher’s exact test. The P-values and Odds ratios of Δmotifs in relation to polysome shifting are included in Supplementary Table S4, and we will update the detailed motif information in the revised Supplementary Table S4.
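
      For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2×2 table can be sketched in pure Python as below. The counts in the example are hypothetical placeholders, not the values of Author response table 1, and the function names are illustrative; established statistics packages provide equivalent routines.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(p_value, 1.0)

def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite if b*c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)

# Hypothetical layout: rows = significant / nonsignificant UTRs,
# columns = motif changed by the variant / motif unchanged.
table = (8, 50, 20, 400)
print(odds_ratio(*table))
print(fisher_exact_two_sided(*table))
```

      The row and column layout is an assumption about how such a comparison would be tabulated.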

      (4) It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozens of translation-shifting variants.

      We understand the concern regarding the relatively small number of translation-shifting variants compared to the large number of features. To address this, we employed LASSO regression, which, according to The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, is particularly suitable for datasets where the number of features 𝑝 is much larger than the number of samples 𝑁. LASSO effectively performs feature selection by shrinking less important coefficients to zero, allowing us to build a robust and generalizable model despite the limited number of variants.
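
      The shrink-to-zero behaviour referred to above can be illustrated with a small coordinate-descent LASSO for linear regression. This is a sketch under stated assumptions, not the authors' model or pipeline: the toy design has more features (6) than samples (4), the objective is (1/2N)·||y − Xb||² + λ·||b||₁, and all names are illustrative.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=100, tol=1e-10):
    """Minimise (1/2N)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    resid = list(y)  # residual y - Xb, with b starting at zero
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * coef[j]) for i in range(n)) / n
            new_value = soft_threshold(rho, lam) / col_scale[j]
            delta = new_value - coef[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                coef[j] = new_value
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return coef

# Toy data: 4 samples, 6 features; only the first feature drives y (y = 2 * x0).
X = [
    [1, 1, 1, 1, 1, 1],
    [-1, 1, -1, 1, 1, 1],
    [1, -1, -1, 1, -1, 1],
    [-1, -1, 1, 1, -1, 1],
]
y = [2, -2, 2, -2]

print(lasso_coordinate_descent(X, y, lam=0.5))  # prints [1.5, 0.0, 0.0, 0.0, 0.0, 0.0]
print(lasso_coordinate_descent(X, y, lam=2.0))  # prints [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

      With a moderate penalty only the informative feature keeps a non-zero (shrunken) coefficient, and a large enough penalty removes every feature. In practice the penalty strength would typically be chosen by cross-validation.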

      (5) The lack of meaningful validation experiments altering the SNPs in the endogenous loci by genome editing limits the impact of the results.

      Following the reviewer’s suggestion, we assessed the endogenous mutant effect by generating CRISPR knock-in clones carrying the IRF6:c.-4609G>A variant. We showed that this G>A variant generates a deleterious upstream open reading frame, which dramatically reduced protein expression of the main open reading frame (Fig. 7B-D). The genome editing further demonstrated that the G>A variant reduced endogenous IRF6 protein expression to 23% or 44% in two independent clones. We have incorporated the genome editing results in the revised main text and the new Figure 7E&F:

      “To further validate the endogenous effect of the novel upstream ATG (uATG), we generated CRISPR knock-in clones carrying the IRF6:c.-4609G>A variant and examined its impact on gene expression. The introduction of the uATG reduced RNA levels to 88% and 37% of the wild-type in two independent clones (Fig. 7E), and protein levels to 44% and 23%, respectively (Fig. 7F), resulting in an overall reduction of translation efficiency to 50–62%.” (p.18)
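
      The 50–62% range quoted above is consistent with taking translation efficiency as protein level divided by RNA level, each relative to wild type. The short check below assumes that definition and assumes that "respectively" pairs the first RNA value with the first protein value; the clone labels are hypothetical.

```python
# Relative expression (wild type = 1.0), as quoted in the passage above.
clones = {
    "clone_1": {"rna": 0.88, "protein": 0.44},
    "clone_2": {"rna": 0.37, "protein": 0.23},
}

def relative_translation_efficiency(rna, protein):
    # Assumption: efficiency is protein output per unit of mRNA, relative to WT.
    return protein / rna

te = {name: relative_translation_efficiency(**levels) for name, levels in clones.items()}
for name, value in te.items():
    print(f"{name}: {value:.0%}")
# prints:
# clone_1: 50%
# clone_2: 62%
```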

      Reviewer #2 (Public Review):

      Summary:

      In their paper "Massively Parallel Polyribosome Profiling Reveals Translation Defects of Human Disease-Relevant UTR Mutations" the authors use massively parallel polysome profiling to determine the effects of 5' and 3' UTR SNPs (from dbSNP/ClinVar) on translational output. They show that some UTR SNPs cause a change in the polysome profile with respect to the wild-type and that pathogenic SNPs are enriched in the polysome-shifting group. They validate that some changes in polysome profiles are predictive of differences in translational output using transiently expressed luciferase reporters. Additionally, they identify sequence motifs enriched in the polysome-shifting group. They show that 2 enriched 5' UTR motifs increase the translation of a luciferase reporter in a protein-dependent manner, highlighting the use of their method to identify translational control elements.

      Strengths:

      This is a useful method and approach, as UTR variants have been more difficult to study than coding variants. Additionally, their evidence that pathogenic mutations are more likely to cause changes in polysome association is well supported.

      Weaknesses:

      The authors acknowledge that they "did not intend to immediately translate the altered polysome profile into an increase or decrease in translation efficiency, as the direction of the shift was not readily evident. Additionally, sedimentation in the sucrose gradient may have been partially affected by heavy particles other than ribosomes." However, shifted polysome distribution is used as a category for many downstream analyses. Without further clarity or subdivision, it is very difficult to interpret the results (for example in Figure 5A, is it surprising that the polysome shifting mutants decrease structure? Are the polysome "shifts" towards the untranslated or heavy fractions?)

      Our approach, combining polysome fractionation of the UTR library with negative binomial generalized linear model (GLM) analysis of RNA-seq data, systematically identifies variants that affect translational efficiency. The GLM is specifically designed to detect UTR pairs with significant interactions between genotype and polysome fractions, relying solely on changes in polysome profiles to identify variants that disrupt translation. Consequently, our analytical method does not determine the direction of translation alteration.

      Following the massively parallel polysome profiling, we sought to understand how these polysome-shifting variants influence the translation process. To do this, we examined their effects on RNA characteristics related to translation, such as RBP binding and RNA structure. In Figure 5A, we observed a notable trend in significant hits within 5’ UTRs—they tend to increase ΔG (weaker folding energy) in response to changes in polysome profiles, regardless of whether protein production increases or decreases (Fig. 3).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 3A - the claim that 5'UTR variants had a stronger effect than 3'UTR is based on the two UTRs with the strongest effect. It is unclear how these differences between 5' and 3'UTRs are significant.

      We carried out a Wilcoxon rank-sum test to compare the mut/WT fold change in translation efficiency between the 3’ and 5’ UTR variants. The results showed that the 5’ UTR variants exhibited a greater change in translation efficiency. We have inserted this result in the revised Figure 3C and refer to this figure in the main text: “Furthermore, we observed that 5’ UTR variants had a greater impact on translation activity relative to 3’ UTR variants (Fig. 3C).” (p. 12)
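
      For readers unfamiliar with the test, a minimal sketch of a two-sided Wilcoxon rank-sum test with the normal approximation is given below. The effect sizes are invented for illustration, the use of absolute log2 fold changes is an assumption about how "greater change" might be quantified, and the variance formula omits the tie correction.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (z, two-sided p) from the normal approximation, without tie correction."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under the null
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical |log2(mut/WT)| translation-efficiency changes.
five_prime_changes = [0.9, 1.2, 0.7, 1.5, 1.1]
three_prime_changes = [0.2, 0.4, 0.1, 0.5, 0.3, 0.6]

z_score, p_value = rank_sum_test(five_prime_changes, three_prime_changes)
print(f"z = {z_score:.2f}, two-sided p = {p_value:.4f}")
```

      With samples this small an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.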

      (2) Figures 2B and S1, S2 - what is the meaning of less signal for a light chain and a similar signal for a heavy chain? How can this situation, while being a significant difference between the profiles, lead to a biologically relevant difference in eventual protein output?

      Taking 3’UTR ACADSB:c.*4177G>A (bottom-left panel in Figure 2B) as an example: in the light polysome-containing fraction, WT transcripts have a lower read count (in units of log(CPM)) than transcripts carrying the mutant UTR, whereas the read counts of the two genotypes are approximately the same in the heavy polysome-containing fraction.

      In line with our reply to Reviewer 1’s major comment 1, we aimed to identify the UTRs whose interaction term between genotype and fraction is significant in our generalized linear model (GLM). That is, our targets are the UTR pairs whose WT and mutant show different trends across the fractions (Mono to Light and Light to Heavy). In Figure 2B, 3’UTR ACADSB:c.*4177G>A is a clear example of our significant hits, as the trends of the two genotypes across the three fractions are clearly distinct.

      It is widely known that an altered distribution across polysome fractions indicates a change in translational efficiency. Our GLM helped us identify the UTR pairs whose WT and mutant have different polysome profiling patterns and thus likely have distinct translational efficiencies. Nevertheless, since we only had a limited number of polysome fractions in our experiments, we further validated our significant hits and confirmed the direction of regulation using luciferase reporter assays.
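
      The idea of a genotype-by-fraction interaction can be made concrete with a difference-of-differences on log(CPM) values, as sketched below. This is a simplification for illustration, not the negative binomial GLM described in the response: the numbers are invented to mimic the pattern described for ACADSB, and the function and variable names are assumptions.

```python
FRACTIONS = ("mono", "light", "heavy")

def interaction_contrasts(wt_log_cpm, mut_log_cpm):
    """Difference between the mutant and WT change for each step between adjacent fractions."""
    contrasts = {}
    for lower, upper in zip(FRACTIONS, FRACTIONS[1:]):
        wt_step = wt_log_cpm[upper] - wt_log_cpm[lower]
        mut_step = mut_log_cpm[upper] - mut_log_cpm[lower]
        contrasts[f"{lower}->{upper}"] = mut_step - wt_step
    return contrasts

# Hypothetical log(CPM) values: WT is lower than mutant in the light fraction only.
wt = {"mono": 5.0, "light": 4.0, "heavy": 6.0}
mut = {"mono": 5.0, "light": 5.5, "heavy": 6.0}

shifted = interaction_contrasts(wt, mut)

# A mutant profile that is merely offset from WT gives parallel lines and zero contrasts.
parallel = interaction_contrasts(wt, {name: value + 1.0 for name, value in wt.items()})

print(shifted)
print(parallel)
```

      Non-zero contrasts correspond to non-parallel lines in a plot of log(CPM) against fraction. A contrast says that the profiles differ, not whether translation went up or down, which is why the direction needs separate validation.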

      (3) The paragraph starting with "Even with the high confidence dataset, we did not intend to immediately translate the altered polysome profile into an increase or decrease in translation efficiency" is confusing. The whole premise of the screen used by the authors is that polysome profiling is a useful proxy for estimating levels of translation, so claiming that it doesn't necessarily measure translation is counterintuitive.

      In line with our reply to the last question, our goal is to use the alteration of polysome profiling patterns as a proxy for a change in translational efficiency. However, due to the limited number of fractions in our experiment, we could not directly infer the direction of regulation, i.e. an increase or decrease in translational efficiency, for the statistically significant variants. That is why we refrained from drawing any conclusion about the direction of regulation for the significant hits and proceeded to validate them using luciferase reporter assays.

      (4) Figure S5A - this is normalized to the nucleotide distribution in 5' or 3'UTRs? Is this statistic being applied to 27 SNPs in 3'UTRs?

      To identify sequence features associated with altered polysome association, we systematically analyzed both significant and nonsignificant UTRs for nucleotide and motif-level changes. Fisher’s exact test was employed to evaluate whether specific nucleotide or motif alterations were enriched or depleted in polysome-shifting UTRs, compared to nonsignificant UTR pairs. For example, in the case of nucleotide C (see table below; also Table S4 and new Fig. S6A), only four significant 3’ UTRs involved a change in C, resulting in a significant depletion of this nucleotide change among polysome-shifting 3’ UTRs (odds ratio = 0.22, p = 0.0069). Expanding this approach to all 1-7 nt motifs, we identified multiple motif and nucleotide changes that were significantly associated with altered polysome association.

      Author response table 2.
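
      The enrichment test described above can be sketched as a 2x2 Fisher's exact test. In the example below only the count of four significant 3’ UTRs with a change in C comes from the response; the other three cells are hypothetical, so the output is not expected to reproduce the reported odds ratio or p-value. The function name and the two-sided convention (summing all tables no more probable than the observed one) are choices made for this sketch.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        table_probability(k)
        for k in range(lowest, highest + 1)
        if table_probability(k) <= observed * (1 + 1e-9)
    )
    return min(p_value, 1.0)

# Rows: polysome-shifting vs nonsignificant 3' UTRs; columns: C changed vs not changed.
sig_with_c, sig_without_c = 4, 20          # 4 is from the text; 20 is hypothetical
nonsig_with_c, nonsig_without_c = 40, 40   # both hypothetical

odds_ratio = (sig_with_c * nonsig_without_c) / (sig_without_c * nonsig_with_c)
p_value = fisher_exact_two_sided(sig_with_c, sig_without_c, nonsig_with_c, nonsig_without_c)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```

      An odds ratio below 1 indicates depletion of the nucleotide change among polysome-shifting UTRs. Repeating the test over all 1-7 nt motifs would call for multiple-testing correction, which is not shown here.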

      (5) "uATG in the 5' UTR was not identified by the model as a widespread feature explaining polysome shifting". Is this because of the method of ribosome profiling or because of the sequences in the library? Can having more sequences in the library specifically looking at 5'UTR give more power for such an effect to emerge?

      Our assay design accounted for the presence of upstream ATG codons and the strength of adjacent Kozak sequences. However, additional factors known to influence the function of upstream open reading frames (uORFs), such as the reading frame of the uORF relative to the main coding sequence and the use of non-ATG initiation codons, were not systematically included. As a result, the current assay may have limited sensitivity in detecting uORF-related regulatory effects. A dedicated design specifically tailored to uORF variants is likely to enhance the detection power and better capture their contribution to translational control.

      (6) Figure 7B - it is not clear whether the luciferase reporter and the GFP reporter in the library function in a similar manner; is it creating an out-of-frame or an in-frame uORF? Also, it is not clear if the differences are statistically significant.

      In the MPRA library, the IRF6 uORF is out of frame relative to the GFP coding sequence. To directly assess its translational impact, we employed a luciferase reporter assay by fusing luciferase downstream of the IRF6 uORF. These constructs revealed a significant reduction in protein production, as shown in Figures 3 and 7B–F. Although the clinically relevant IRF6 uORF is out-of-frame with the main ORF, we engineered an in-frame uORF variant to validate translation initiation at the upstream ATG (uATG) (Fig. 7B-D). The in-frame construct confirmed uATG usage and led to a significant reduction in luciferase protein expression. Together, these results support the conclusion that the IRF6:c.-4609G>A variant gives rise to an active uORF that suppresses translation of the main ORF.

      Reviewer #2 (Recommendations For The Authors):

      (1) It would be helpful for the authors to subcategorize their data in ways that they consider meaningful and interpretable (e.g. shifts from all monosome to heavy, all heavy to monosome/free, etc.) Relatedly, what do the authors think the functional meaning is when a given transcript has high mono/heavy occupancy but low light occupancy (like what is shown in Figure 2B for ANK2) in the polysome profiling experiment? It is not apparent why a transcript with a high ribosome occupancy (heavy) would also have light occupancy (light).

      From the amplicon sequencing data, we obtained read counts for each UTR variant across the monosome, light, and heavy polysome fractions. Notably, this approach does not preserve the original relative abundance of transcripts among the three fractions. That is, despite a greater abundance of mRNAs in the heavy polysome fraction, comparable numbers of sequencing reads were recovered from the monosome and light fractions. As a result, this method is not suitable for interpreting the global directionality of translational shifts but is well-suited for detecting relative differences in polysome association. Therefore, our experimental and analytical design—combining targeted amplicon sequencing with generalized linear modeling (GLM)—was optimized to identify UTR variants that alter polysome association, independently of absolute transcript abundance in each fraction.

      (2) The method put forward in Figure 2 would be more convincing if there was data showing reproducibility in the massively parallel reporter assay. Perhaps the mut/WT ratio for all transcripts can be plotted against each other and a statistical test of correlation can be performed.

      Thank you for pointing this out. To demonstrate the reproducibility of our massively parallel reporter assay, we have plotted scatter plots of the ratios of all transcripts (summing the monosome, light, and heavy fractions) across different batches using our high-confidence dataset. We calculated the Pearson correlation coefficients and corresponding p-values for these comparisons. The results show strong correlation between each batch, supporting the reproducibility of our assay. We have incorporated this analysis in the main text as well as Supplemental Figure 3: “Pearson correlation analysis revealed R coefficients ranging from 0.59 to 0.71 for the mut-to-WT transcript ratios across three independent experiments (Supplemental Fig. 3).”
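
      The reproducibility check described above can be sketched in two steps: compute a mut/WT ratio per variant from reads summed over the three fractions, then correlate those ratios between batches. The counts and ratios below are invented, the helper names are assumptions, and the p-value calculation is omitted.

```python
import math

def mut_to_wt_ratio(mut_counts, wt_counts):
    """Ratio of reads summed across the monosome, light and heavy fractions."""
    return sum(mut_counts) / sum(wt_counts)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# One hypothetical variant: reads in (monosome, light, heavy) for each genotype.
example_ratio = mut_to_wt_ratio([10, 20, 30], [20, 40, 60])

# Hypothetical mut/WT ratios for the same five variants in two batches.
batch_1 = [0.5, 0.8, 1.0, 1.3, 2.0]
batch_2 = [0.6, 0.7, 1.1, 1.2, 1.8]

r_value = pearson_r(batch_1, batch_2)
print(f"Pearson r between batches = {r_value:.2f}")
```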

      (3) The dots in Figure 2B indicate separate experiments, but the y-axis is log(counts). Values could be normalized (perhaps a ratio of mut/WT) for comparison between experiments.

      We aimed to compare UTR distribution across polysome fractions and recognized the importance of presenting the distribution patterns for both genotypes. This approach allows us to more clearly illustrate the differences or similarities in polysome association between the two genotypes.

      (4) When describing the 5' UTRs used for the validation experiments in Figure 3, more information about the 5' UTR sequence used is necessary. It is not clear how much or what part of the 5' UTRs were removed, or why this was necessary considering the same experiment was conducted using full-length UTRs.

      In the initial library design, technical limitations of bulk oligonucleotide synthesis constrained the UTRs to 155 nucleotides, comprising 115-nt of endogenous human UTR sequence flanked by 20-nt priming sites on both ends. Variants were centered at the 58th nucleotide within the 115-nt UTR sequence. When one flanking region of the native UTR was shorter than 57 nt, the variant was shifted accordingly toward the shorter arm to maintain the 115-nt UTR length (Fig. 2A).
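
      The windowing rule described above (57 nt on either side of the variant, shifted when one arm is too short) can be written out as follows. This is a sketch under stated assumptions: positions are 1-based, the function name is invented, and the case of a native UTR shorter than 115 nt is rejected because the response does not say how it was handled.

```python
WINDOW_LENGTH = 115   # native UTR sequence per oligo
FLANK_LENGTH = 57     # nucleotides on each side when the variant sits at position 58
PRIMER_LENGTH = 20    # priming site on each end
OLIGO_LENGTH = PRIMER_LENGTH + WINDOW_LENGTH + PRIMER_LENGTH

def variant_window(utr_length, variant_position):
    """Return (start, end, index_in_window), all 1-based and inclusive."""
    if utr_length < WINDOW_LENGTH:
        raise ValueError("UTRs shorter than the window are not covered by this sketch")
    if not 1 <= variant_position <= utr_length:
        raise ValueError("variant position lies outside the UTR")
    start = variant_position - FLANK_LENGTH
    start = max(1, start)                                # 5' arm too short: shift right
    start = min(start, utr_length - WINDOW_LENGTH + 1)   # 3' arm too short: shift left
    end = start + WINDOW_LENGTH - 1
    return start, end, variant_position - start + 1

centered = variant_window(300, 150)
near_start = variant_window(300, 10)
near_end = variant_window(300, 290)
print(centered, near_start, near_end, OLIGO_LENGTH)
```

      When both arms are long enough the variant lands at position 58 of the window; otherwise the window keeps its 115-nt length and the variant moves towards the shorter arm.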

      Given that endogenous UTRs in the human genome are often longer than 155 nt, we further evaluated the functional consequences of variants within full-length UTR sequences (Fig. 3B). While the mutant effects observed in the library setting were largely recapitulated, their magnitude was diminished in the full-length context, likely due to the increased sequence and structural complexity.

      To clarify the experimental design related to Figure 3, we modified the text as the following: “The variants significantly altering the polysome profile were then individually validated by means of high-sensitivity luciferase reporter assays (Fig. 3A). To that end, we resynthesized both the variant and corresponding wildtype alleles in the same library format - 115-nt native UTR segments centered on the variant and flanked by 20-nt priming sites. These UTRs were then cloned upstream (5’) or downstream (3’) of the firefly luciferase coding sequence, depending on their genomic location.” (p. 11)

      (5) The conclusions from inserting RBP-binding motifs into 5' UTRs and assaying translational output (Figure 4) would be strengthened by including luciferase reporters containing endogenous 5' UTRs containing these motifs, and versions where the motifs are disrupted.

      Several variants that altered translation efficiency were validated in their native sequence contexts, including 5’ UTR variants in DMD and NF1 that affect SRSF1/2 binding sites, as well as a 3’ UTR variant in AL049650.1 that impacts a KHSRP binding site (Fig. 3 and Supplemental Figs. S1 & S2). To address the functional relevance of these variants within their native regulatory landscapes, we have incorporated the following clarification into the text (p. 13): “This observation is consistent with additional findings where variants that create or disrupt specific RBP binding sites—such as SRSF1/2 (e.g., in DMD and NF1; Fig. 2 and Supplementary Fig. S4) and KHSRP (e.g., in AL049650.1; Fig. 2 and Supplementary Figs. S4 & S5)—led to significant changes in translation efficiency within their native UTR contexts.”

      (6) Figure 5C shows that 5' UTR SNPs that form an uAUG are associated with greater structural changes, but this does not "indicate" that "structure‐modifying UTR variants may control primary ORF translation partly by interfering with translation initiation from a uORF." The data presented in Figure 5 and luciferase/polysome data presented previously do not distinguish whether translation is occurring at an uAUG or canonical AUG. The statement quoted above is speculative and it should be clear that it is a hypothesis generated by the data and is not conclusive.

      We appreciate the reviewer’s suggestion. We have therefore modified our text to: “Therefore, while changes in uATG may not be common explanatory factors for polysome-shifting mutations, our results suggest that structure-modifying UTR variants may control primary ORF translation partly by interfering with translation initiation from a uORF.” (p. 14)

      Minor points/questions

      (1) The authors should clarify whether during library construction for massively parallel polysome profiling the 3' UTR constructs contain a common 5' UTR? Likewise, do the 5' UTR constructs contain a common 3' UTR? Perhaps the lack of a 5' UTR in the 3' UTR constructs, which is implied by Figure 2A, would influence differences seen between 3' UTR pairs (and likewise for 5' UTR pairs).

      There are short common 5’ UTRs appended to the 3’ UTR library, and likewise, a common short 3’ UTR is included in the 5’ UTR library. The common 5’ UTR comprises partial sequences from the CMV promoter and the plasmid backbone of pEGFP-N1 vector. The common 3’ UTR includes sequences from the pEGFP-N1 backbone and a short polyadenylation signal from HBA1 (hemoglobin subunit alpha 1). While we cannot entirely rule out potential crosstalk between 5’ and 3’ UTRs, the design ensures that all constructs are compared in a controlled and consistent context, enabling valid pairwise comparisons between variant and wildtype alleles.

      To clarify the library design, we have revised the main text to include this explanation: 

      “The entire library of UTR oligonucleotides (UTR library) was subsequently ligated upstream or downstream of an enhanced GFP (EGFP) coding region, along with a CMV promoter and a common UTR sequence on the opposite end. Cells transfected with the UTR library were treated with cycloheximide 14 hours post transfection and then subjected to polysome fractionation (see Methods).” (p.11) 

      “The variants significantly altering the polysome profile were then individually validated through high-sensitivity luciferase reporter assays (Fig. 3A). To this end, we resynthesized both the variant and corresponding wildtype alleles in the same library format - 115-nt native UTR segments centered on the variant and flanked by 20-nt priming sites. These UTRs were then cloned upstream (5’) or downstream (3’) of the firefly luciferase coding sequence, depending on their genomic location. As in the initial library design, the test UTR segments differ by only one nucleotide, while a shared short UTR fragment is present on the opposite end of the coding sequence to ensure consistency across constructs (Fig. 2A).” (p. 12)

      (2) The lines connecting the polysome distribution points make the plots appear busy and difficult to read, the data would be easier to interpret if they were removed.

      We employed a generalized linear model (GLM) to identify the variants that altered the polysome association of the corresponding transcripts. Statistically speaking, we were looking for variants that led to a significant interaction between genotype and polysome fraction. Displaying the lines as they are in our plots therefore offers readers a direct visualization of the interaction: non-parallel lines for the WT and Mut groups indicate an interaction between genotype and polysome fraction. Moreover, showing the lines from three batches of experiments also helps us ascertain the reproducibility of our experiments. Taken together, the lines make our plots more informative.

    1. eLife Assessment

      In their study, Cummings et al. provide a valuable advance in understanding the hierarchical regulation of tubulin polyglycylation, demonstrating that TTLL8 initiates monoglycylation which is a prerequisite for TTLL10-mediated polyglycylation. The evidence supporting these mechanistic insights is solid, relying on a compelling combination of purified biochemical assays, mass spectrometry, and microscopy. The work is further valued for revealing an unexpected crosstalk between polyglycylation and polyglutamylation that ensures a balanced post-translational modification landscape for proper cilia function.

    2. Reviewer #1 (Public review):

      Summary:

      In their current study, Cummings et al have approached this fundamental biochemical problem using a combination of purified enzyme-substrate reactions, MS/MS and microscopy in vitro to provide key insights into the hierarchy of generating polyglycylation in cilia and flagella. They first establish that TTLL8 is a monoglycylase, with the potential to add multiple mono glycine residues on both α- and β-tubulin. They then go on to establish that the monoglycylation is essential for TTLL10 binding and catalytic activity, which progressively reduces as the level of polyglycylation increases. This provides an interesting mechanism of how level of polyglycylation is regulated in the absence of a deglycylase. Finally, the authors also establish that for efficient TTLL10 activity, it is not just monoglycylation, but also polyglutamylation that is necessary, giving a key insight into how both these modifications interact with each other to ensure there is a balanced level of PTMs on the axonemes for efficient cilia function.

      Strengths:

      The manuscript is well written, and experiments are succinctly planned and outlined. The experiments support the conclusions the authors were hypothesising and provide some novel possible mechanistic insights into the whole process of regulation of tubulin glycylation in motile cilia.

      Weaknesses:

      There were some weaknesses in the initial submission of the manuscript, but the authors have addressed these in their revised version either by giving clear explanations in the text or through additional experiments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “In their current study, Cummings et al have approached this fundamental biochemical problem using a combination of purified enzyme-substrate reactions, MS/MS, and microscopy in vitro to provide key insights into the hierarchy of generating polyglycylation in cilia and flagella. They first establish that TTLL8 is a monoglycylase, with the potential to add multiple mono glycine residues on both α- and β-tubulin. They then go on to establish that monoglycylation is essential for TTLL10 binding and catalytic activity, which progressively reduces as the level of polyglycylation increases. This provides an interesting mechanism of how the level of polyglycylation is regulated in the absence of a deglycylase. Finally, the authors also establish that for efficient TTLL10 activity, it is not just monoglycylation, but also polyglutamylation that is necessary, giving a key insight into how both these modifications interact with each other to ensure there is a balanced level of PTMs on the axonemes for efficient cilia function.”

      Strengths: 

      The manuscript is well-written, and experiments are succinctly planned and outlined. The experiments support the conclusions the authors were hypothesising and provide some novel possible mechanistic insights into the whole process of regulation of tubulin glycylation in motile cilia.

      We thank the reviewer for their support of our study and recognition of its importance to understanding microtubule glycylation and its regulation.  

      “The initial part of the manuscript where the authors discuss the requirement of monoglycylation by TTLL8 is not new. This was established back in 2009 when Rogowski et al (2009) showed that polyglycylation of tubulin by TTLL10 occurs only when co-expressed in cells with TTLL3 or TTLL8. So, this part of the study adds very little new information to what was known.”

      Our study provides the first in vitro evidence with purified recombinant components that human TTLL8 is exclusively a monoglycylase (Figure 1) and that polyglycylation by TTLL10 requires previous priming with monoglycylation (Figure 2). Studies with purified recombinant components are the gold standard for establishing the activity of an enzyme, as cellular work can be obfuscated by the activity of other regulators. In our original submission we did cite the 2009 work by Rogowski, Gaertig and Janke (reference 15 in the original submission) as well as the 2009 work by Ikegami and Setou (reference 26 in the original submission), which established that TTLL10 polyglycylase activity requires co-expression with TTLL8 in cells. Specifically, we stated in our original submission and in the revised manuscript:

      “Cellular overexpression studies coupled with the use of antibodies that recognize mono- and polyglycylation indicate that TTLL8 is also a glycyl-initiase, while TTLL10 is a glycyl-elongase (15, 26). However, direct biochemical evidence with purified enzymes for segregated initiation and elongation activity for glycylases is still lacking, as is knowledge of their substrate specificity and regulation.”

      In addition to citing the Setou study, we now cite again the Rogowski, Gaertig and Janke 2009 study later in the manuscript when the cellular data are mentioned again.  Specifically, we state in the revised manuscript: 

      “This is consistent with cellular overexpression data which showed that polyglycylation signal was detected via antibody only in tubulin from cells that co-expressed TTLL8 and TTLL10, but not TTLL10 alone (15, 26).”

      “The study also fails to discuss the involvement of the other monoglycylase, TTLL3 in the entire study, which is a weakness as in vivo, in cells, both the monoglycylases act in concert and so, may play a role in regulating the activity of TTLL10.”

      We previously showed that purified recombinant TTLL3, like TTLL8, adds only monoglycines, with a preference for the β-tubulin tail (Garnham et al., PNAS 2017). Given that TTLL10 requires priming by monoglycylation, we expect that, similarly to TTLL8, TTLL3 will allow elongation of the initial monoglycines by TTLL10.

      “(1) From the mass spec data, it appears that the Xenopus laevis TTLL10 can add up to 18 residues. However, the numbers indicated in Figure 2E seem to suggest that it is a maximum of 2-3 residues only at a particular position. Does this mean that the 13-18 residues observed are a collection of multiple short-chain polyglycylations or are there positions that the authors observed where there were chains of longer than 3 glycine residues? This would be an interesting point to note as when it was discovered in Paramecium, the polyglycyl chains were reported to be up to 34 residues (Redeker et al., Science 1994). If the authors could test the TTLL10 from Paramecium to observe whether this is a consistent phenomenon across evolution or whether a biologically significant difference has developed, that would be interesting to know.”

      Figure 2E shows a subset of the modified tails that we identified, namely those where the post-translationally added glycines can be mapped to a specific position or range of positions. Additional species exist. We note that the mass spectra in Figure 2B are intact LC/MS, while those in Figure 2E are MS/MS. Tubulin tail peptides with larger numbers of glycines ionize less efficiently than those with shorter glycine chains, which reduces the sensitivity of detection of species with higher numbers of glycines. This effect is less pronounced when the mass spectra are obtained from the intact protein (Figure 2B). In summary, our data support the conclusion that TTLL10 elongates polyglycine chains at multiple positions in the tubulin tail (shown in Figure 2E); however, we cannot ascertain the maximum polyglycine chain length, only the total number of glycines added.

      Testing the enzyme from Paramecium is an interesting proposal but outside the scope of this manuscript. 

      “(2) While it is interesting to know that the TTLL10 binds to TTLL8-modified tubulin with a much higher affinity than unmodified tubulin, in vivo, the microtubules will be a mixture of both TTLL3- and TTLL8-modified tubulin. It would be good to see the binding of the enzyme to a tubulin that is modified by both TTLL3 and TTLL8, to test whether the two together have a greater influence on TTLL10 binding.”

      Our previous work showed that purified recombinant TTLL3 has purely monoglycylase activity, with a preference for β-tubulin (Garnham et al., PNAS 2017). The sites of monoglycylation by TTLL3 overlap with those introduced by TTLL8 on β-tubulin (the difference being mainly that TTLL3 is more selective towards β-tubulin and thus has lower activity on α-tubulin). TTLL8 introduces additional monoGlys on the α-tubulin tail. Therefore, it is unlikely that TTLL10 will respond differently to microtubules that carry similar numbers of Gly residues, regardless of whether they were introduced by TTLL8 alone or by TTLL3 and TTLL8. Our data show that TTLL10 binding increases with Gly number, but that the gains in affinity plateau as the density of glycine residues on the tails increases above a certain threshold, likely because one TTLL10 molecule recognizes one monoGly branch, and steric hindrance on the tubulin tail prevents recruitment of additional TTLL10 molecules.

      “(3) The authors have always increased the number of monoglycines in beta-tubulin more than in alpha-tubulin. Is there a rationale for this? Since TTLL8 is known to predominantly modify alpha-tubulin (Rogowski et al., 2009; Gadadhar et al., 2017) why did the authors not check for the increased binding of the TTLL10 on dimers where the number of monoglycines on alpha-tubulin is higher than 1.1? Especially when they themselves observe in their mass spec that even on alpha-tubulin there are 1, 2, and 3 glycines added. I would like to see what happens if the ratio is high alpha-G + low beta-G”

      As our spectra in Figure 1 show, TTLL8 robustly modifies both α- and β-tubulin in vitro but shows a slight preference for β-tubulin (Figure 1B). The work from the Janke group that the reviewer is referring to (Rogowski et al., 2009 and Gadadhar et al., 2017) did not use recombinant, purified enzymes with unmodified microtubules as substrates; it used axonemal tubulin, which carries many modifications. It is therefore possible that the α-tubulin preference observed in that system when TTLL8 is overexpressed is due to other factors that do not reflect the biochemical properties of the enzyme alone (for example, β-tubulin sites may be unavailable because they are already glutamylated). As can be seen from Figure 3D, the gain in affinity from adding further glycines is small compared with that from the initial monoglycine added to the α- and β-tubulin tails, likely reflecting that one tail cannot bind more than one TTLL10 at a time because of steric hindrance. Moreover, it is important to note that glutamylases and glycylases compete for the same sites on the tubulin tails, as we have shown, for example, for TTLL3 and TTLL7 (Garnham et al., 2017). The activity of these enzymes in vivo or with non-naïve substrates is therefore context dependent, which also influences which sites are available for TTLL10 to modify. In conclusion, by using recombinant enzymes and naïve tubulin we gain insight into the intrinsic properties of these enzymes and therefore provide a framework for the interpretation of in vitro and in vivo observations.

      “(4) I wonder why the authors did not use the human TTLL10 to test if this also shows similar binding to the glycylated tubulin despite the fact that it is enzymatically inactive. If it does, then it would be interesting to see the kinetics of binding of this enzyme to see if the fall off of the enzyme from the tubulin is solely driven by the level of polyglycylation only, or if it has any other mechanism involved as well.”

      Work with human recombinant TTLL10, a TTLL10 homolog that was proposed to be inactive, will be an interesting future direction but is outside the scope of this manuscript. We did note in our previous manuscript (Garnham et al., 2017, Figure S5) that the residues which are mutated in the human enzyme compared to other mammals are on the dorsal face of the enzyme, far away from the active site, raising the interesting question of how they inactivate the enzyme. We need to emphasize, however, that our work clearly shows that it is polyglycylation on the microtubules that reduces binding of TTLL10 to microtubules: experiments done in the absence of glycylating activity, i.e. with enzyme incubated with microtubules that were pre-modified with polyglycine chains but without glycine substrate (precluding any glycylation activity during the binding assay), show that binding decreases monotonically with the number of polyglycines on the microtubule (Figures 4A, B).

      (5) In Figure 5, the authors use monoglycylated tubulin that is either glutamylated or not to show that the activity of TTLL10 is enhanced by the extent of polyglutamylation present on the tubulin. However, there is no evidence of the enzyme binding to microtubules that are only glutamylated. It would be good to test this to determine if the binding is also dependent on both monoglycylation and glutamylation or is it only the enzyme activity.

      Figure 5E shows that TTLL10 binding increases with monoglycylation alone and that glutamylation is additive, and Figures 4A, B show that it is not the enzyme activity that affects the binding, but the glycylation state of the microtubule. We did not determine binding to microtubules that were only glutamylated, because TTLL10 would not be able to elongate polyglycine chains on those microtubules, even if it bound.

      (6) The level of polyglycylation used in Figure 5 is quite low. It would be good to see how the length of the polyglycine chain impacts TTLL10 activity in the presence of polyglutamylation, and whether this has any cooperative effect leading to longer chain polyglycylation than what is seen with only monoglycylated tubulin.

      We expect longer chain polyglycylation to have an inhibitory effect as we show in Figure 4. 

      “(7) In the overall study, the authors fail to discuss whether the activity of both the glycylases at different sites on tubulin is sequential, or modifications at different residues happen all at once. If the authors were to do a sequential time course of the modification followed by MS/MS analysis, they could get some indications about this.”

      As the data in Figure 3D show, adding more mono-Gly sites to a tubulin tail has only a muted effect on binding, indicating that the additional mono-Gly branches do not lead to more TTLL10 recruitment because of steric hindrance, i.e. multiple TTLL10 enzymes cannot be accommodated efficiently on the same tail at the same time. This is consistent with the overall dimensions of the enzyme and the position of its active site, which were modeled initially in our previous publication (Garnham et al., PNAS 2017). The site of TTLL10 action is pre-determined by the position of the mono-Gly branch introduced by TTLL3 or TTLL8. The length of the tubulin tail and the proximity of mono-Gly sites to each other preclude TTLL10 acting at multiple positions at once on the same tail.

      “(8) Do the modifications have any cooperative effect with respect to the sites of modification? Does modifying a particular site enhance the kinetics of modification of the other sites? Can the authors test this?”

      This would be an interesting line of future investigations.  

      “Minor points:

      (1’) The authors opine that the level of polyglycylation is regulated by the decreased binding of the TTLL10 to the polyglycylated tubulin. While this is an interesting argument, which could be a possibility based on the data they present, it would still not answer if this is a mechanism followed by TTLL10 of all species or not. If they could test the efficacy of TTLL10 from another species, to see the binding efficiency of that enzyme, it could potentially strengthen their argument of this possible mechanism.”

      The differences between the properties of TTLL10 from different organisms will be an interesting focus of future investigations, but are outside the scope of the present study. However, we would like to point out that the level of sequence conservation among TTLL10 homologs makes it unlikely that other TTLL10 enzymes do not follow a similar mechanism, albeit with possible differences in the extent of the response. We also note that we have shown that polyglycylation also inhibits binding to the microtubule of the severing enzyme katanin (Szczesna et al., Dev. Cell 2022). Therefore, these studies suggest that polyglycylation might be a more general mechanism for reducing microtubule binding affinity, since glycylation reduces the negative charge on the tubulin tails, which frequently interact with positively charged domains or interfaces in microtubule-associated proteins.

      “(2) The authors indicate that glycylases act on pre-glutamylated microtubules. However, in their assays, they use unmodified tubulin, which I would presume is also not glutamylated. If this is the case, how can they justify that the enzymes prefer pre-glutamylated microtubules? This is a bit unclear. Do they mean that their tubulin is already pre-glutamylated? Have they tested this?”

      The statement regarding the action of these enzymes on glutamylated microtubules refers to the in vivo situation, where polyglycylated microtubules appear in cilia biogenesis after the microtubules in the axoneme are already glutamylated. In vitro, by using microtubules that are only monoglycylated and microtubules that are both glutamylated and monoglycylated, we show that glutamylation further increases recruitment of TTLL10 to microtubules that are monoglycylated. Therefore, glutamylated microtubules will be polyglycylated preferentially over those that are not glutamylated.

      We state: “Axonemal microtubules are abundantly glutamylated. Glutamylation appears during cilia development first, followed by glycylation (12, 13), indicating that in this scenario glycylases act on pre-glutamylated microtubule substrates.”

      “(3) In continuation with the previous point, an immunoblot of their purified tubulin showing no reactivity to anti-glycylation or anti-glutamylation antibodies, which upon treatment with TTLL8 reacts to the anti-glycylation antibody would be confirmatory evidence to show that the isolated tubulin was indeed unmodified.”

      We have now included a Western blot of our TOG-purified tubulin as Figure S3 in our revised manuscript. This shows a faint signal with the pep-G1 antibody and a very strong signal after TTLL8 treatment. We are not sure whether the low signal with the pep-G1 antibody for the unmodified tubulin is due to low bona fide monoglycylation-specific signal or a low-affinity nonspecific interaction of this antibody (raised against monoglycylated tubulin tail peptides) with the unmodified tubulin. We note that this signal is clearly visible only when loading at least 0.2 micrograms of the purified tubulin. At this loading level the signal for the glycylated species is saturated. It is also important to note that we have not detected glycylated species in this tubulin either by LC-MS or MS/MS. Therefore, our data strongly indicate that the tubulin purified from tsA201 cells is not glycylated or has at most extremely low levels of glycylation. Importantly, this potential trace level of monoglycylated tubulin does not affect any of the conclusions in this study. The Western blot also shows no detectable signal with the polyglycylation antibody in the unmodified tubulin and a very strong, saturated signal after the tubulin was treated with both TTLL8 and TTLL10. We also added an additional Figure S8 that shows that the tsA201 tubulin does not give a detectable signal for glutamylation. Please see also Figure 3 from Vemu et al., Methods Enzymology 2017 where we also published a Western blot from our TOG-purified tubulin using anti-glutamylation antibodies.

      “(4) In their study, the authors have used polyglycylation of up to 10-13 residues. This brings me to my first point that in the case of Paramecium, the number was identified to be up to 34, which would mean that this enzyme has higher binding or catalytic activity. I would like to know the authors' perspective on this, as to what could potentially determine the difference in the activities of TTLL10 across species.”

      The Xenopus TTLL10 enzyme can add more glycines than the 10-13 range that we show here if the enzyme is incubated for longer periods. The fact that glycine numbers as high as 34 were detected in Paramecium does not necessarily mean that the Paramecium enzyme is more active since there is no equivalent data to compare it with from Xenopus. The only way to address potential species differences in enzyme specific activity is to purify enzymes from different species and compare their activity side-by-side.  

      (5) How was the completion of the reaction of monoglycylation and polyglycylation determined? If the enzymes were left for more than 20 minutes, did TTLL8/ TTLL10 add more glycines? What is the reason for using less tubulin (1:20 enzyme:tubulin molar ratio) for monoglycylation by TTLL8, and more tubulin (1:50 enzyme:tubulin molar ratio) for polyglycylation by TTLL10?

      Yes, if the enzymes were incubated longer, they added more glycines. The extent of glycylation was determined from the LC-MS and the incubation time was varied to obtain samples with fewer or more glycines.   The lower ratio used for TTLL10 is because of the higher specific activity of that enzyme compared to TTLL8.  

      (6) Figure S2 A, b2 ion is not indicated in the peptide sequence, while it is shown in the m/z graph.

      We thank the reviewer for the careful reading. We have corrected this in our MS/MS spectrum. 

      Reviewer #2 (Public review):

      “In their manuscript, Cummings et al. focus on the enzymatic activities of TTLL3, TTLL8, and TTLL10, which catalyze the glycylation of tubulin, a crucial posttranslational modification for cilia maintenance and motility. The experiments are beautifully performed, with meticulous attention to detail and the inclusion of appropriate controls, ensuring the reliability of the findings. The authors utilized in vitro reconstitution to demonstrate that TTLL8 functions exclusively as a glycyl initiase, adding monoglycines at multiple positions on both α- and β-tubulin tails. In contrast, TTLL10 acts solely as a tubulin glycyl elongase, extending existing glycine chains. A notable finding is the differential substrate recognition between TTLL glycylases and TTLL glutamylases, highlighting a broader substrate promiscuity in glycylases compared to the more selective glutamylases. This observation aligns with the greater diversification observed among glutamylases. The study reveals a hierarchical mechanism of enzyme recruitment to microtubules, where TTLL10 binding necessitates prior monoglycylation by TTLL8. This binding is progressively inhibited by increasing polyglycine chain length, suggesting a self-regulatory mechanism for polyglycine chain length control. Furthermore, TTLL10 recruitment is enhanced by TTLL6-mediated polyglutamylation, illustrating a complex interplay between different tubulin modifications. In addition, they uncover that polyglutamylation stimulates TTLL10 recruitment without necessarily increasing glycylation on the same tubulin dimer, due to the potential for TTLLs to interact with neighboring tubulin dimers. This mechanism could lead to an enrichment of glycylation on the same microtubule, contributing to the complexity of the tubulin code. The article also addresses a significant challenge in the field: the difficulty of generating microtubules with controlled posttranslational modifications for in vitro studies. By identifying the specific modification sites and the interplay between TTLL activities, the authors provide a valuable tool for creating differentially glycylated microtubules. This advancement will facilitate further studies on the effects of glycylation on microtubule-associated proteins and the broader implications of the tubulin code. In summary, this study substantially contributes to our knowledge of posttranslational enzymes and their regulation, offering new insights into the biochemical mechanisms underlying microtubule modifications. The rigorous experimental approach and the novel findings presented make this a pivotal addition to the field of cellular and molecular biology.”

      We thank the reviewer for their support of our work.

    1. eLife Assessment

      This study provides convincing evidence of coordinated spiking activity of neurons in the anterior cingulate cortex (ACC), and correlated activity in the CA1 subregion of the hippocampus, during observational learning. The authors also show coordinated ACC-CA1 neural activity during rest periods prior to the performance of the observationally learned task. The important findings significantly advance the field's understanding of neural mechanisms underlying social learning.

    2. Reviewer #1 (Public review):

      Summary:

      Mou and Ji investigate the relationship between firing rates in the anterior cingulate cortex (ACC) and CA1 neurons during observational learning. They found trajectory-selective responses in the ACC, coordinated activity between ACC and CA1 place cells for specific trajectories, and reactivation of these ensembles during sharp-wave ripples (SWRs), particularly during hippocampal replay events. The study is methodologically sound, the data are clearly presented, and the conclusions are well supported. The work is both novel and highly relevant to our understanding of social learning. Compared to the previous version of the paper, they have added substantial characterization of neuronal properties related to their firing during the task and replay events. I believe that the authors have therefore addressed most of my concerns and recommend the paper for publication as is.

      Strengths:

      The study is well designed, the data presented is very clear and the conclusions are appropriate regarding their results. The study is novel and of high relevance for the understanding of social learning.

      Weaknesses:

      All previous weaknesses have been addressed.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript, Xiang Mou and Daoyun Ji investigate how ACC neurons activated by observational learning communicate with the hippocampus. They assess this line of communication through a complex behavioral technique, in vivo electrophysiology, pharmacological approaches, and data analytical techniques. First, the authors find that observational performance is dependent on the ACC, and that the ACC possesses neurons that show side selectivity (trajectory related) both in the observation box, when shuttling to reward, and during subsequent maze running, when shuttling to the corresponding same side for reward. The side-selective activation appears stronger for correct trials compared to error trials, specifically during observation of Demo rats. They compare how the CA1 of the hippocampus encodes these two environments and find that ACC side-selective neurons show correlation with side-selective CA1 ensembles during maze behavior, water consumption, and sharp-wave ripples.

      Strengths:

      Overall, the paper provides strong evidence that ACC neurons are activated by observational learning and that this activation seems to be correlated with CA1 activity.

      Weaknesses:

      Concerns, however, surround the strength of evidence that links ACC and CA1 activity during observational learning. Only weak correlations between the two regions are shown, and it is unclear if the ACC may lead CA1 activity or vice versa. It is possible that these processes reflect two parallel pathways. Without manipulation of ACC, it is difficult to assess whether ACC activity influences hippocampal replay.

      Comments on revisions:

      Lines 361-362: R and P values do not match that of Figure 5C.

    4. Reviewer #3 (Public review):

      Summary:

      Mou and Ji investigated neuro-computational mechanisms behind observational spatial learning in rats and reported several signs of functional coupling between the ACC and CA1 at the single neuron level. Using multi-site tetrode recording, they found that ACC cells encoding a path in a maze were activated while a rat observed another rat taking that path. This activation was also correlated with the activation of CA1 cells encoding the same path and facilitated their replay during sharp-wave ripples (SWRs) before the recording rat ran on the maze by itself. These activity patterns were associated with correct path choice during self-running and were absent in control conditions where the recording rat did not learn the correct choice through observations. Based on these findings, the authors argue that ACC cells capture the critical information during observation to organize hippocampal cell activity for subsequent spatial decisions.

      Strengths:

      The authors used multiple outcome measures to build a strong case for path-specific spike coordination between ACC and CA1 cells. The analyses were conducted carefully, and proper control measures were used to establish the statistical significance of the detected effects. The authors also demonstrated tight correlations between the spike coordination patterns and the successful use of observed information for future decisions.

      Weaknesses:

      (1) As evidence for the activation of path information in the ACC during observation, the authors showed positive correlations between firing rates during observation and those during self-running. This argument will be solidified if the authors use a decoding approach to demonstrate the activation of path-selective ACC ensemble activity patterns during observation. This approach will also open the door to uncovering how the activation of ACC path representation is related to the moment-to-moment position of the demonstrator rat and whether it is coupled with the timing of SWRs.

      (2) The authors argued that the ACC biases the content of awake replay in CA1 during SWRs in the observation period. The reviewer wonders if a similar bias also occurs during SWRs in the self-run period (i.e., water consumption after the correct choice). This analysis will help test whether the biased replay occurs due to the need to convert observed information into future choices.

      (3) Although the authors demonstrated the necessity of the ACC for the task, it remains to be determined whether firing coordination between the ACC and CA1 during observation is necessary for the correct path choice during self-runs. Some discussion on this point, along with future direction, would be beneficial for readers.

      Comments on revisions:

      The authors fully addressed my recommendations. I do not have any further comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Minor:

      (1) In Figure 2, only the right- or left-selective neurons are presented for the comparison. It would be helpful to also compare these with the neurons that are not selective for either side, and perhaps to include them in the supplemental materials.

      We have included all non-selective neurons in Figure 2D and supplemental Figure 2B. Their differences in firing rate between left and right sides are quantified by their selective indices (SIs). 

      (2) The authors should provide controls of speed during NMDA infusion and vehicle.

      We have quantified and compared the duration of running laps, which is equivalent to speed.

      (3) In Figure 1d, the trend shows that even during NMDA infusion, the animals learn, as shown by a higher proportion of correct trials in the 3rd compared to the 1st trial.

      We thank the reviewer for pointing that out. We noticed that NMDA-lesioned ACC animals showed a trend of improved performance on the track, and we believe this is due to re-learning of the task, which we point out in the main text. However, we emphasize that, compared to the Vehicle control, the overall performance of NMDA-lesioned animals was significantly impaired.

      (4) Clarify the implications of the NMDA experiments, as it is not straightforward to interpret that an interplay between ACC-CA1 is involved in this task as per this experiment.

      Rather than stating the involvement of ACC-CA1 interplay, we use the results of NMDA lesion experiment to demonstrate that ACC is also required, besides CA1, for the task.

      (5) In Figure 4b, there seems to be a lag between CA1 and ACC correlations; the authors could provide a quantification of this temporal delay between CA1 and ACC.

      Figure 4B shows the cross-correlation between one example ACC cell and its associated CA1 ensembles on the left and opposite sides. There was a broad peak around time lag 0. Our further investigation did not identify a significant, systematic delay across ACC cells, which led us to quantify the correlation at time lag 0 in Figure 4C and D.
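
      For readers who want to see what such a lag analysis involves, the following is a minimal sketch, not the analysis code used in the study. It assumes that the ACC cell's spikes and the CA1 ensemble's spikes have already been binned into equal-length firing-rate vectors; the function names `pearson` and `cross_correlogram` and the choice of lag range are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cross_correlogram(acc, ca1, max_lag):
    """Correlate acc[t] with ca1[t + lag] for every lag in [-max_lag, max_lag].

    `acc` and `ca1` are binned firing rates of equal length (an assumption of
    this sketch). Lags are expressed in bins, not seconds.
    """
    n = len(acc)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = acc[:n - lag], ca1[lag:]
        else:
            a, b = acc[-lag:], ca1[:n + lag]
        result[lag] = pearson(a, b)
    return result
```

      In this sketch a peak at a positive lag would mean that the CA1 ensemble rate tends to follow the ACC cell, a peak at a negative lag the reverse, and the entry at lag 0 is the sort of zero-lag value that a summary across cells could be built from.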

      (6) The example correlation provided in 5c for the opposite, doesn't seem representative of the population trend as shown in 5d, since both the Same and the Opposite for the demo show a positive trend. It would be best to choose an example that represents the population better.

      Following the reviewer’s suggestion, we have replaced the original plot with another ACC cell in Figure 5C.

      (7) Almost the same can be applied to Figure 6.

      Following the reviewer’s suggestion, we have replaced the original plot with another ACC cell in Figure 6E.

      (8) The results in Figure 7 are convincing, in my opinion, as they show that the trend is lost for the opposite side (contrary to the coactivation shown in Figures 5 and 6 that showed the same trends for the same and opposite during Demo). Do the authors have any interpretation of this? Is it due to co-activity reflecting other task-relevant features different than the spatial trajectory being observed?

      The correlation on the opposite side between CA1 and ACC shown in Figure 5C-D and Figure 6E-F is likely due to a general interaction of CA1 activity around SWRs with prefrontal cortical areas including ACC, as shown in previous studies (Jadhav et al., 2016; Remondes and Wilson, 2015). We would like to point out that this correlation only quantifies the coactivation between CA1 ensemble firing rates and individual ACC cells’ firing rates. This raw correlation does not consider the content of spikes generated by CA1 ensembles, neglecting the sequential firing patterns of CA1 cells. The replay analysis in Fig. 7 examines the order of spikes generated by individual CA1 cells. The result in Fig. 7 shows that the sequential activation of CA1 place cells more accurately reflects the distinction between the same- and opposite-side trajectories. We consider the analysis in Fig. 7 to be more refined than those in Figs. 5 and 6.

      (9) For all the figures regarding SWR activities, the authors should provide average PSTH for CA1 as well as ACC, perhaps also examples of neurons that are selectively active during one side or the opposite side runs.

      Following the reviewer’s suggestion, we have added data to show PSTH for CA1 and ACC cells surrounding SWR peaks (Figure S5E, F). 
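
      As an illustration of what a peri-event histogram around SWR peaks computes, here is a minimal sketch under stated assumptions; it is not the code used to produce Figure S5E, F. Spike times and SWR peak times are assumed to be in seconds, and the function name `psth`, the window, and the bin size are hypothetical choices.

```python
def psth(spike_times, event_times, window=0.5, bin_size=0.05):
    """Mean firing rate (Hz) in bins covering [-window, +window) around events.

    For each event (e.g. an SWR peak), spikes are re-expressed relative to the
    event time, counted per bin, and the counts are averaged over events and
    divided by the bin width to give a rate.
    """
    n_bins = int(round(2 * window / bin_size))
    counts = [0] * n_bins
    for ev in event_times:
        for t in spike_times:
            rel = t - ev
            if -window <= rel < window:
                idx = int((rel + window) / bin_size)
                # Guard against floating-point rounding at the right edge.
                counts[min(idx, n_bins - 1)] += 1
    if not event_times:
        return [0.0] * n_bins
    return [c / (len(event_times) * bin_size) for c in counts]
```

      Averaging such histograms over all CA1 cells or all ACC cells would give a population PSTH; whether any enhancement falls inside or outside the ripple window can then be read from the bins nearest zero.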

      Reviewer #2 (Recommendations For The Authors):

      Below are additional notes for improvements.

      (1) Figure 1C. Unclear what Time 0 indicates.

      We specify it (OB's poke time) in the figure legend. 

      (2) Figure 2C. Unclear what the numbers above datapoints mean.

      Those numbers are selection indices (SIs), as specified in the legend. 

      (3) Figure 5: Line 374-375. Given the repetitive nature of the task, it is unclear whether SWRs are encoding upcoming or past spatial trajectories or whether they are encoding trajectories at all. The authors would need to show that SWRs-ACC communication is predictive of task outcome to claim it is specifically necessary for future outcomes rather than consolidating past trajectories.

      We agree with the reviewer and have made changes to reflect that the ACC-CA1 correlation in Fig. 5 is specific to the same side of their selectivity, not exactly to future trajectories. Regarding the repetitive nature of the task (same-side rule), we have specifically addressed the advantages and limitations of this task design in the discussion. Regarding the observer's own past vs. future trajectories, our past publication (Mou et al., 2022) shows that CA1 replay in SWRs is more likely to encode the correct, future trajectories.

      (4) Figure 7. It appears that the correlation was conducted between ACC activity and CA1 replays recorded at distinct time windows (delay period vs. water consumption). It is unclear how ACC activity could influence CA1 replays when they occur hundreds of milliseconds apart or even longer.

      We thank the reviewer for raising this important question. We have shown that the higher same-side ACC activity during observation continues during water consumption. However, our added data in Fig.S5E show that this enhancement did not occur precisely within SWRs. We thus propose a possibility that the overall enhanced activity of same-side ACC cells during water consumption provides an overall, background excitation boost to same-side CA1 cells to enhance their replay within SWRs. We have revised the discussion section to present this model. 

      (5) Abstract: lines 24-25; Discussion: lines 475-476. Based on the data, there is no certainty whether ACC biases or coordinates CA1 replays. The data simply show that they are correlated with one another.

      We have modified those sentences to clarify the non-causal nature of the interaction.

      Reviewer #3 (Recommendations For The Authors):

      Please see below for the list of minor corrections and suggestions:

      (1) Line 136-143: On the data shown in Figure 1D, I recommend using two-way mixed ANOVA with sessions as a within-subjects factor and groups as a between-subjects factor.

      We thank the reviewer for this point. We did indeed use two-way ANOVA for those comparisons, and we have now specified this in the text.

      (2) Line 219-228: I recommend expanding the explanation of two control conditions here. It was written in the method section, but the readers would appreciate the gist of these conditions in the result section. In particular, it was unclear how box SI was calculated in the Empty condition. Also, the plots of poke rates in the control conditions will be useful to show that rats did not learn the correct choice from observation in these control conditions.

      We have added more explanation of the two control conditions in the text. The quantifications of poke rates for Demo and two control conditions (Object, Empty) are provided in our previous publication (Mou et al., 2022).

      (3) Line 610: Please specify the number of three types of sessions each rat underwent and the order of these session types.

      We have revised the text in the Methods section to provide these numbers.

      (4) In Figure 2c legend, please specify what the number (e.g., -0.41) indicates.

      Those numbers are selection indices (SIs), as specified in the legend.

    1. eLife Assessment

      This valuable study introduces a data augmentation approach based on generative unsupervised models to address data imbalance in immune receptor modeling. Support for the findings is solid, showing that the use of generated data increases the performance of downstream supervised prediction tasks, e.g., TCR-peptide interaction prediction. However, the validation, mainly relying on synthetic data, could be completed, especially regarding unseen epitopes, and given the exclusive focus on CDR3β. The results should be of interest to the communities working on immunology and biological sequence data analysis.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a deep learning framework for predicting T cell receptor (TCR) binding to antigens (peptide-MHC) using a combination of data augmentation techniques to address class imbalance in experimental datasets, and introduces both peptide-specific and pan-specific models for TCR-MHC-I binding prediction. The authors leverage a large, curated dataset of experimentally validated TCR-MHC-I pairs and apply a data augmentation strategy based on generative modeling to generate new TCR sequences. The approach is evaluated on benchmark datasets, and the resulting models demonstrate improved accuracy and robustness.

      Strengths:

      The most significant contribution of the manuscript lies in its data augmentation approach to mitigate class imbalance, particularly for rare but immunologically relevant epitope classes. The authors employ a generative strategy based on two deep learning architectures:

      (1) a Restricted Boltzmann Machine (RBM) and

      (2) a BERT-based language model, both of which are used to generate new CDR3β sequences of TCRs that serve as synthetic training data to balance the classes of TCR-pMHC binding pairs.

      The distinction between peptide-specific (HLA allele-specific) and pan-specific (generalized across HLA alleles) models is well-motivated and addresses a key challenge in immunogenomics: balancing specificity and generalizability. The peptide-specific models show strong performance on known HLA alleles, which is expected, but the pan-specific model's ability to generalize across diverse HLA types, especially those not represented in training, is critical.

      Weaknesses:

      The paper would benefit from a more rigorous analysis of the biological validity of the augmented data. Specifically, how do the synthetic CDR3β sequences compare to real CDR3β sequences in terms of sequence similarity and motif conservation? The authors should provide a quantitative assessment of real vs. augmented sequences, for example via t-SNE or UMAP projections, or by measuring the overlap in known motif positions before and after augmentation. Without such validation, the risk of introducing "hallucinated" sequences that distort model learning remains a concern. Moreover, it would strengthen the argument if the authors demonstrated that performance gains are not merely due to overfitting on synthetic data, but reflect genuine generalization to unseen real data. Ultimately, this can only be established through elaborate wet-lab validation experiments, which may be outside the scope of this study.

      While generative modeling for sequence data is increasingly common, the choice of RBM, which is a relatively older architecture, could benefit from stronger justification, especially given the emergence of more powerful and scalable alternatives (e.g., ProGen, ESM, or diffusion-based models). While BERT was used, it will be valuable in the future to explore other architectures for data augmentation.

      The manuscript would be more compelling if the authors performed a deeper analysis of the pan-specific model's behavior across HLA supertypes and allele groups. Are the learned representations truly "pan" or merely a weighted average of the most common alleles? The authors should assess whether the pan-specific model learns shared binding motifs (anchor residue preferences) and whether these features are interpretable through attention maps. A failure to identify such patterns would raise concerns about the model's interpretability and biological relevance.

      The exclusive focus on CDR3β for TCR modeling is biologically problematic. TCRs are heterodimers composed of α and β chains, and the CDR1, CDR2, and CDR3 regions of both chains contribute to antigen recognition. The CDR3β loop is often more diverse and critical, but CDR3α and the CDR1/2 loops also play significant roles in binding affinity and specificity. By generating only CDR3β sequences and not modeling the full TCR αβ heterodimer, the authors risk introducing a systematic bias toward β-chain-dominated recognition, which will not reflect the full complexity of TCR-peptide-MHC interactions.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a thoughtful and well-motivated strategy to address a major challenge in TCR-epitope binding prediction: data imbalance, particularly the scarcity of positive (binding) TCR-peptide pairs. The authors introduce a two-step pipeline combining data balancing, via undersampling and generative augmentation, and a supervised CNN-based classifier. Notably, the use of Restricted Boltzmann Machines (RBMs) and BERT-style transformer models to generate synthetic CDR3β sequences is shown to improve model performance. The proposed method is applied to both peptide-specific and pan-specific settings, yielding notable performance improvements, especially for in-distribution peptides. Generative augmentation also leads to measurable gains for out-of-distribution epitopes, particularly those with high sequence similarity to the training set.

      Strengths:

      (1) The authors tackle the well-known but under-addressed issue of class imbalance in TCR-epitope binding data, where negatives vastly outnumber positive (binding) pairs. This imbalance undermines classifier reliability and generalization.

      (2) The model is tested on both in-distribution (seen epitopes) and out-of-distribution (unseen epitopes) scenarios. Including a synthetic lattice protein benchmark allows the authors to dissect generalization behavior in a controlled environment.

      (3) The paper shows a measurable benefit of generative augmentation. For example, AUC improvements of up to +0.11 are observed for peptides closely related to those seen during training, demonstrating the method's practical impact.

      (4) A direct comparison between RBM- and Transformer-based sequence generators adds value, offering the community guidance on trade-offs between different generative architectures in TCR modeling applications.

      Weaknesses:

      (1) Generalization degrades with epitope dissimilarity

      The performance drops substantially as the test epitope becomes more dissimilar to the training set. This is expected, but it highlights an essential limitation of the generative models: they help only when the test epitope is similar to one already seen. Table 1 shows that the performance gain from generative augmentation decreases as the test epitope becomes more dissimilar to the training epitopes. For epitopes with a Levenshtein distance of 1 from the training set, the average AUC improvement is approximately +0.11. This gain drops to around +0.06 for epitopes at distance 2. It becomes minimal for those at distance 4, indicating an explicit limitation in the model's ability to generalize to more distant epitopes. The authors should quantify more explicitly how far the model can generalize effectively. What is the performance degradation threshold as a function of Levenshtein distance?
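      The distance-based stratification discussed in this point can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the "distance to the training set" of a test epitope is the minimum Levenshtein distance to any training epitope, and the peptide strings and function names are hypothetical placeholders.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_to_training_set(epitope: str, training: List[str]) -> int:
    """Assumed definition: distance to the nearest training epitope."""
    return min(levenshtein(epitope, t) for t in training)


def bin_by_distance(test: List[str], training: List[str]) -> Dict[int, List[str]]:
    """Group test epitopes by their distance to the training set."""
    bins: Dict[int, List[str]] = {}
    for epitope in test:
        bins.setdefault(distance_to_training_set(epitope, training), []).append(epitope)
    return bins


# Hypothetical placeholder peptides, not real epitopes.
training_epitopes = ["AAAAAAAAA", "CCCCCCCCC"]
test_epitopes = ["AAAAAAAAC", "AAAAAAACC", "AACCAACCA"]

bins = bin_by_distance(test_epitopes, training_epitopes)
print(bins)  # {1: ['AAAAAAAAC'], 2: ['AAAAAAACC'], 4: ['AACCAACCA']}
```

      Reporting AUC with and without augmentation per bin would give the degradation curve as a function of distance that this comment asks for.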

      (2) What is the minimal number of positive samples needed for data augmentation to help?

      The approach has an intrinsic catch-22: generative models require data to learn the underlying distribution and cannot be applied to epitopes with insufficient data. As a result, the method is unlikely to be effective for completely new epitopes. Could the authors quantify the minimum number of real binders needed for effective generative augmentation? This would be particularly relevant for zero-shot or few-shot prediction scenarios, where only 0-10 positive samples are available. Such experiments would help clarify the practical limits of the proposed strategy.

      (3) Lack of end-to-end evaluation on unseen epitopes as inputs

      The authors frame peptide-specific models as classification over a few known epitopes, a closed-set formulation. While this is useful for evaluating generation effects, it's not representative of the more practical open-set task of predicting binding to truly novel epitopes. A stronger test would include models that take peptides as input (e.g., pan-specific, peptide-conditioned classifiers), including unseen epitopes at test time. Could the authors attempt an evaluation on benchmarks like IMMREP25 or other datasets where test epitopes are excluded from training?

      (4) Focus on β-chain limits generalizability

      The current pipeline is trained exclusively on CDR3β sequences. However, the field is increasingly moving toward single-cell sequencing, which provides paired α/β TCR chain data. Understanding how the proposed approach performs when both chains are available would be valuable. Could the authors evaluate the performance gains on paired α/β information, even in a small subset of single-cell data?

      (5) Synthetic lattice proteins (LPs) have limited biological fidelity

      While the LP-based benchmark presented in Figure 5 is a clever and controlled tool for probing model generalization, it remains conceptually and biophysically distant from real TCR-peptide interactions. Its utility as a toy model is valid, but its limitations should be more explicitly acknowledged:

      a) Over-simplified binding landscape: The LP system is designed for tractability, with a simplified sequence-structure mapping and fixed lattice constraints. As shown in Figure 5c, the LP binding landscape is linearly separable, in stark contrast to the complex and often degenerate nature of real TCR-epitope interactions, where multiple structurally distinct TCRs can bind the same peptide and vice versa.

      b) Absence of immunological context: The LP model abstracts away key biological factors such as MHC restriction, α/β chain pairing, peptide presentation, and structural constraints of the TCR-pMHC complex. These are essential for understanding binding specificity in actual immune repertoires.

      c) Overestimation of generalization: While performance drops on more distant LP structures, even these are structurally and statistically more similar to the training data than truly novel biological epitopes. Thus, the LP benchmark likely underestimates the true difficulty of out-of-distribution generalization in real-world TCR prediction tasks.

      d) Simplified biophysics: The LP simulations rely on coarse-grained energy models and empirical potentials that do not capture conformational dynamics, side-chain flexibility, or realistic binding energetics of TCR-peptide interfaces.

      In summary, while the LP benchmark helps isolate specific generalization behaviors and is useful for sanity-checking model performance under controlled perturbations, its biological relevance is limited. The authors should explicitly frame these assumptions and limitations to prevent overinterpreting results from this synthetic system.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a method to address class imbalance in T cell receptor (TCR)-epitope binding datasets by generating synthetic positive binding examples using generative models, specifically BERT-based architectures and Restricted Boltzmann Machines (RBMs). They hypothesize that improving class balance can enhance model performance in predicting TCR-peptide binding.

      Strengths:

      (1) Interesting biological as well as technical topic.

      (2) Solid technical foundations.

      Weaknesses:

      (1) Fundamental Biological Oversight:

      While the computational strategy of augmenting positive samples via generative models is technically interesting, the manuscript falls short in addressing key biological considerations. Specifically, the authors simulate and evaluate only CDR3β-peptide binding interactions. However, antigen recognition by T cells involves both the α- and β-chains of the TCR. The omission of CDR3α undermines the biological realism and limits the generalizability of the findings.

      (2) Validation of Simulated Data:

      The central claim of the manuscript is that simulated positive examples improve predictive performance. However, there is no rigorous validation of the biological plausibility or realism of the generated TCR sequences. Without independent evaluation (e.g., testing whether synthetic TCR-peptide pairs are truly binding), it remains unclear whether the performance gains are biologically meaningful or merely reflect artifacts of the generation process.

      (3) Risk of Bias and Overfitting:

      Training and evaluating models with generated data introduces a risk of circularity and bias. The observed improvements may not reflect better generalization to real-world TCR-epitope interactions but could instead arise from overfitting to synthetic patterns. Additional testing on independent, biologically validated datasets would help clarify this point.

    5. Author response:

      We would like to thank the editors and reviewers for the time spent on our work, their fair assessments, and their constructive criticism. We plan to address their concerns in the future revision as follows, detailed by topic.

      (1) Limitations of focusing on CDR3β only

      In its current state, our work tested the proposed pipeline of data augmentation for binding prediction on benchmark datasets limited to peptide+CDR3β sequence pairs only. As pointed out by all the reviewers, the TCR-peptide interaction is more complex and also involves other regions of the receptor (such as the CDR3α chain) as well as the MHC presenting the peptide. To investigate how the inclusion of additional information impacts results, we plan to apply our pipeline in a setting where the generative protocol is extended to generate paired α and β chains. The supervised classifier will then receive a concatenation of α+β chains as inputs. We will compare the performance of this classifier with the one using β chains only, and add this analysis to the revised manuscript.

      (2) Validation of generated sequences and interpretation of the features learned by the generative model

      The reliability of the generative model in augmenting the training set with biologically sensible sequences is a crucial assumption of our approach, and we agree with the reviewers raising this as a main concern. Before stating our strategy to improve the soundness of the method, let us first point out a few aspects already considered in the present manuscript:

      • The test set of the classifier is always composed of real sequences: in this way, an increase in performance due to data augmentation cannot be due to overfitting to synthetic, possibly unrealistic, sequences.

      • The generative protocol is initialized from real sequences, and used to generate sequences not too far from them. In this respect, it could be taken as a way to “regularize” the simplest strategy of data augmentation, random oversampling (taking multiple copies of sequences at random to rebalance the data). This procedure avoids generating “wildly hallucinated” sequences with unreliable models. We will better quantify this statement (see below).

      • The training protocol is tailored to push the generative model towards learning binding features between peptide and CDR3β sequences (and not merely fitting their local statistics separately). For example, in the pan-specific setting, during training of the generative model on peptide+CDR3β sequences, the masked language modeling task is modified to force the model to recover the missing amino acid using only the other sequence context.

      We will better stress these points in the revised manuscript. To further validate the generative protocol in the future revision, we will carry out additional sanity checks on the generated data to confirm that the synthetic sequences remain biologically plausible and comparable to real ones.

      (3) Assessment of the performance of the pan-specific protocol for out-of-distribution data:

      To better clarify how the degradation in performance of a classifier tested on out-of-distribution data is impacted by the dissimilarity between test and training data distribution, we will improve the synthetic analysis currently reported in Table 1, adding confidence intervals for accuracy, quantifying thresholds on the distance for the method to work, and providing t-SNE embeddings of in- and out-of-distribution data.

      (4) Quantification of the threshold for the number of examples per class in order to train the generative model and obtain a performance increase

      In the paper, we adopted an operative common-sense threshold of at least 100 sequences per class in order to apply our data augmentation pipeline. We will quantify this effect by testing this threshold in the revised manuscript, in order to (i) emphasize the limits of this two-step generative protocol in the low-data regime and (ii) assess whether the generative model falls back to a random oversampling strategy (due to strong overfitting) when few data are available for training.

      (5) Motivation for the use of RBMs:

      While RBMs have known limitations, their use in our pipeline (together with the more modern TCR-BERT, which we also test) is mainly motivated by the fact that they provide measurable increases in performance with data augmentation despite their simple 2-layer architecture. We stress that simpler generative (profile) models are unable to show this increase, see Appendix 3. In this respect, the RBM provides a minimal generative model allowing us to augment data successfully, and a lower bound to the increase of performance with respect to more complex architectures trained on more data. We will report this point of view in the text.

      (6) Clarification on the role of lattice proteins as an oversimplified toy model for protein interaction

      We agree with the points raised by Reviewer #2 on the limitations of lattice proteins as a model for protein interaction. Indeed, we used it merely as a toy model for phenomenology, a strategy whose validity has been fairly acknowledged by the reviewer. We will report in the main text all the drastic simplifications and reasons why the reader should take the comparison to real data with great care.
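      For reference, the random-oversampling baseline that the response above contrasts with generative augmentation (taking multiple copies of real positive sequences at random to rebalance the data) can be sketched as below. This is an illustrative sketch only: the function name, the fixed seed, and the CDR3-like strings are hypothetical placeholders and are not taken from the manuscript.

```python
import random
from typing import List, Tuple


def random_oversample(positives: List[str],
                      negatives: List[str],
                      seed: int = 0) -> List[Tuple[str, int]]:
    """Duplicate randomly chosen positives until both classes have equal size.

    Returns a shuffled list of (sequence, label) pairs, label 1 = binder.
    """
    if not positives:
        raise ValueError("at least one real positive sequence is required")
    rng = random.Random(seed)
    n_extra = max(0, len(negatives) - len(positives))
    # Sampling with replacement: no new sequences are created, only copies.
    duplicates = [rng.choice(positives) for _ in range(n_extra)]
    labelled = [(s, 1) for s in positives + duplicates]
    labelled += [(s, 0) for s in negatives]
    rng.shuffle(labelled)
    return labelled


# Hypothetical placeholder sequences, not real receptor data.
positives = ["CASSAAAF", "CASSCCCF"]
negatives = ["CASSDDDF", "CASSEEEF", "CASSGGGF",
             "CASSHHHF", "CASSKKKF", "CASSLLLF"]

balanced = random_oversample(positives, negatives)
n_pos = sum(label for _, label in balanced)
print(n_pos, len(balanced) - n_pos)
```

      Because every added positive is an exact copy of a real sequence, this baseline cannot introduce unrealistic sequences, but it also adds no new sequence diversity. That is the sense in which a generative model that falls back to copying its training data would bring no benefit over it.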

    1. eLife Assessment

      This valuable study combines microscopy and CRISPR screening in two different cell lines to identify factors involved in global chromatin organization, using centromere clustering as a proxy. Follow-up cell cycle synchronisation studies confirm roles in centromere clustering in mitosis. However, incomplete characterisation of the cell lines used limits the interpretation of the findings. The study will interest researchers studying genome organisation in mitosis.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guin and colleagues establish a microscopy-based CRISPR screen to find new factors involved in global chromatin organization. As a proxy of global chromatin organization, they use centromere clustering in two different cell lines. They find 52 genes whose CRISPR depletion leads to centromere clustering defects in both cell lines. Using cell cycle synchronisation, they demonstrate that centromere redistribution upon depletion of these hits requires cell cycle progression through mitosis.

      Strengths:

      This manuscript explores the mechanisms of global chromatin organization, which is a scale of chromatin organization that remains poorly understood. The imaging-based CRISPR screen is very elegant, and the use of appropriate positive and negative controls reinforces the solidity of the findings.

      Weaknesses:

      Although the data are generally solid and well interpreted, a control showing that protein depletion works properly in cell-cycle arrested cells is lacking, both when using siRNAs and degron-based depletion.

    3. Reviewer #2 (Public review):

      The authors begin by highlighting the importance of genome organisation in cellular compartmentalisation and identity. They focus their study on centromeres - key chromosomal features required for segregation - and aim to identify proteins responsible for their spatial distribution in interphase nuclei. However, none of the experimental data addresses broader aspects of genome architecture, such as individual chromosome territories or A/B compartments. As such, the title of the article may be misleading and would benefit from being more specific, for example: "Identification of factors influencing centromere positioning in interphase."

      Strengths:

      One of the strengths of the paper is the comprehensive CRISPR-based screening and the comparative analysis between two distinct cell lines.

      Including further investigation into factors that behave differently across these cell lines - particularly in relation to expression levels or the unique "inverted architecture" of RPE cells - would have added valuable depth.

      Weaknesses:

      The filtering strategy used in the screen imposes significant constraints, as it selects only for non-essential or functionally redundant genes. This is a critical point, as key regulators of chromatin organisation - such as components of the condensin and cohesin complexes - are typically essential for viability. Similarly, depletion of known effectors of centromere behaviour (e.g., work by the Fachinetti lab) often leads to aneuploidy, micronuclei formation, and cell cycle arrest in G1. The implications of this selection criterion should be clearly discussed, as it fundamentally shapes the interpretation of the study's findings.

      A major limitation of the study is the lack of connection between centromere clustering and its biological significance. It remains unclear whether this clustering is a meaningful proxy for higher-order genome organisation. Additionally, the study does not explore potential links to cell identity or transcriptional landscapes. Readers may struggle to grasp the broader relevance of the findings: if gene knockouts that alter centromere positioning do not affect cell viability or cell cycle progression, does this imply that centromere clustering - and by extension, interphase genome organisation - is not biologically significant?

      Another point requiring clarification is the conclusion that the four identified genes represent independent pathways regulating centromere clustering. In reality, all of these proteins localise to centromeres. For example, SPC24 and NUF2 are components of the NDC80 complex; Ki-67, a chromosome periphery protein, has been mapped to centromeres; and CAP-H2 is a subunit of the condensin II complex, which promotes CENP-A deposition during G1. Given their shared localisation, it would be informative to assess aneuploidy indices following depletion of each factor. Chromosome-specific probes could help determine whether centromere dysfunction leads to general mis-segregation or reflects distinct molecular mechanisms. Additionally, exploring whether Ki-67 mutants that affect its surfactant-like properties influence centromere clustering could provide more mechanistic insight.

      Finally, the additive effects observed in double knockdowns do not necessarily confirm pathway independence. It is possible that mild mis-segregation effects are amplified when two proteins within the same pathway are depleted. This possibility should be considered in the interpretation of the data.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Guin et al. use a CRISPR KO screen of ~1000 candidates in two human cell lines, along with high-throughput image analysis, to demonstrate that orderly progression through mitosis shapes centromere organization. They identify ~50 genes that perturb centromere clustering when depleted in both RPE1 and HCT116 cells and validate many of these hits using RNAi. They then use auxin-mediated acute depletion of four factors (NCAPH2, KI67, SPC24, and NUF2) to demonstrate that their effects on centromere clustering require passage through mitosis. They further suggest that the lack of these factors during mitosis leads to the disorganization of centromeres on the mitotic spindle, and these effects persist in the subsequent interphase. Overall, the manuscript is clear, well-written, the experiments performed are appropriate, and the data are interpreted accurately. In my opinion, the main strength of this manuscript is the discovery of several hits associated with altered centromere organization. These hits will serve as a solid foundation for future work investigating genome organization in human cells. On the other hand, how the changes in centromere organization relate to other aspects of interphase genome architecture (A/B compartments, chromosome territories, etc) remains unclear and represents the main shortcoming of this manuscript.

      Comments:

      (1) Given the authors' suggestion that disorderly mitotic progression underlies the changes in centromere clustering in the subsequent interphase, I think it would be beneficial to showcase examples of disorderly mitosis in the AID samples and perhaps even quantify the misalignment on the metaphase plate.

      (2) I don't quite agree with the description that centromeres cluster into chromocenters (p4 para 2, p17 para 1, and other instances in the manuscript). To the best of my knowledge, chromocenters primarily consist of clustered pericentromeric heterochromatin, while the centromeres are studded on the chromocenter surface. This has been beautifully demonstrated in mouse cells (Guenatri et al., JCB, 2004), but it is true in other systems like flies and plants as well.

  2. Sep 2025
    1. eLife Assessment

      This important study reports that two distinct waves of ovarian follicles contribute to oocyte production in mice. The paper provides large amounts of data that will benefit future studies, although the methods and analysis are considered incomplete at present. Justification for the criteria of wave 1 follicles would benefit from further explanation and discussion. This work will be of interest to ovarian biologists and physicians working on female infertility.

    2. Reviewer #1 (Public review):

      Multiple waves of follicles have been proven to exist in multiple species, and different waves of follicles contribute differently to oogenesis and fertility. This work characterizes the wave 1 follicles in mouse comprehensively and compares different waves of follicles regarding their cellular and molecular features. Elegant mouse genetics methods are applied to provide lineage tracing of the wave 1 folliculogenesis, together with sophisticated microscopic imaging and analyses. Single-cell RNA-seq is further applied to profile the molecular features of cells in mouse ovaries from week 2 until week 6. While extensive details about the wave 1 follicles, especially the atresia process, are provided, the authors also identified another group of follicles located in the medullary-cortical boundary, which could also be labeled by the FoxL2-mediated lineage tracing method. The "boundary" or "wave 1.5" follicles are proposed by the authors to be the earliest wave 2 follicles, which contribute to the early fertility of pubertal mice, instead of the wave 1 follicles, which undergo atresia with very few oocytes generated. The wave 1 follicle atresia, which degrades most oocytes, on the other hand, expands the number of theca cells and generates the interstitial gland cells in the medulla, where the wave 1 follicles are located. These gland cells likely contribute to the generation of androgen and estrogen, which shape oogenesis and animal development. By comparing scRNA-seq data from cells collected from week 2 until week 6 ovaries, the authors profiled the changes in numbers of different cells and identified key genes that differ between wave 1 and wave 2 follicles, which could potentially be another driver of different waves of folliculogenesis.

      In summary, the authors provide a large number of high-quality new results that illustrate the molecular and cellular features of different waves of mouse follicles, which could be further reused by other researchers in related fields. The findings related to the boundary follicles could potentially lead to many new insights into oogenesis.

      This paper is overall well-written with solid and intriguing conclusions that are well supported. The reviewer only has some minor comments for the authors' consideration that could potentially help with the readability of the paper.

      (1) The authors identify the wave 1.5 follicles at the medullary-cortex boundary, which begin to develop shortly after 2 weeks. Since the authors already collected scRNA-seq data from week 2 until week 6, could any special gene expression patterns be identified that make wave 1.5 follicle cells different from wave 1 and wave 2?

      (2) Are Figures 1C and 1E Z projections from multiple IF slices? If so, please provide representative IF slice(s) from Figures 1C and 1E and clearly label wave 1 and wave 2 follicles to better illustrate how the wave 1 follicles are clarified and quantified.

      (3) For Figure 3D, please also provide an image showing the whole ovary section, like in Figures 3A and 3C, to better illustrate the localization and abundance of different cells.

      (4) In Figure 4H, expression of HSD3B1 and PLIN1 seems to be detected in almost all medulla cells. Does this mean all medulla cells gain gland cell features? Or is only a subset of the medulla cells actively expressing these 2 proteins? Please provide image(s) with higher magnification to show more clearly how the expression of these 2 proteins differs among different cells.

      (5) Figure 5: The authors discussed cell number changes for different types of cells from week 2 to week 6. A table, or some plots, visualizing numbers of different cell types, instead of just providing original clusters in Dataset S6, at different time points, would make the changes easier to observe.

      (6) Figure S7: It would be more helpful to directly show the number of wave 1 follicles.

      (7) Did the fluorescence cryosection staining (Line 587 - 595) use the same buffers as in the whole-mount staining (Line 575 - 586)? Please clarify.

      (8) In Line 618, what tissue samples were collected? Please point out clearly.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores an important question concerning the developmental trajectory of wave 1 ovarian follicles, leveraging valuable tools such as lineage tracing and single-cell RNA sequencing. These approaches position the authors well to dissect early follicle dynamics. The study would benefit from more in-depth analysis, including quantification using the lineage-traced ovaries, and comparison of wave 1 and 2 follicular cells per stage within the single cell dataset.

      Strengths:

      This study aims to address an important question regarding the developmental trajectories of wave 1 ovarian follicles and how they differ from wave 2 follicles that contribute to long-term fertility. This is an important topic, as many studies on ovarian follicle development rely on samples collected at perinatal timepoints in the mouse, which primarily represent wave 1 follicles, to infer later fertility. The research group has the tools and expertise necessary to tackle these questions.

      Weaknesses:

      Wave 1 follicles are quantified based on the criteria of oocytes larger than 20 µm located within the medullary region, using whole-mount staining. However, the boundary between the medulla and cortex appears somewhat arbitrary. Quantification using FOXL2-lineage-traced ovaries provides a more reliable method for identifying wave 1 follicles. As the developmental trajectory of wave 1 follicles has been well described in Zhang et al. 2013, it would be valuable to provide a more detailed quantification of both labeled and unlabeled follicles by specific follicle stages. In fact, in Zhang et al. 2013, the authors demonstrated that lineage-labeled primordial follicles can be found at the cortex-medulla boundary, suggesting that the observation of labeled "border follicles" is not unexpected. Quantification by follicle stage would provide greater insight into the timing and development of these follicles.

      Similarly, the analysis of wave 1 follicle loss should be performed on lineage-traced ovaries using cell death markers to demonstrate the loss of oocytes and granulosa cells, while confirming the preservation of theca and interstitial cells. In particular, granulosa cell loss should be assessed directly with cell death markers in lineage-traced ovaries, rather than from the loss of tamoxifen-labeled cells, as labeling efficiency varies between follicles (Figure 2G).

      Single-cell RNA sequencing presents a valuable dataset capturing the development of first-wave follicles. The use of a 40 µm cell strainer during cell collection for the 10x platform may explain the exclusion of larger oocytes. However, it is still surprising that no oocytes were captured at all. The central question, how wave 1 follicular cells differ from wave 2 cells, should be investigated in more depth, with results validated on FOXL2-lineage-traced ovaries (e.g., Wnt4 staining in wave 1 antral follicles versus wave 2 using lineage-traced ovaries). This analysis should span all stages of follicle development. It also appears to be a missed opportunity that the single-cell sequencing analysis was not performed on lineage-traced ovaries, which would have enabled more definitive identification of wave 1-derived cells.

      Finally, this study does not directly assess fertility outcomes and should therefore refrain from drawing conclusions about the fertility potential of wave 1 follicles.

    4. Author response:

      The eLife assessment states that our manuscript is important only as a source of data for others to use in the future. Our methods and analysis of wave 1 follicles were said to be "incomplete" because one of two reviewers claimed we did not prove that 80% of wave 1 oocytes turn over by 5 wk.

      We believe that this assessment is simply wrong because critical supporting data already present in the existing manuscript was not understood by one reviewer. Wave 1 follicular oocyte turnover was said to be unproven and to remain uncertain because evidence of death was based only on a lack of Ddx4 staining. New experiments documenting expression of cell death markers were said to be needed to show the oocytes died. However, our work was not based on the analysis of sectioned material, but used whole mount 3D reconstruction microscopy of cleared ovary preparations. Oocyte death was determined by the absence of an oocyte in fully reconstructed follicles and its replacement with an empty cavity, not just the absence of antibody staining. We included images and complete 3D reconstruction movies documenting these methods. The paper also documents that the holes frequently still contained zona pellucida remnants indicating the former presence of an oocyte. Moreover, we observed many intermediates of oocyte death - shrunken and deformed oocytes - and deformations of follicle structures due to the presence of the empty cavities. Controls showed that Ddx4 staining in the context of 3D imaging always revealed an obvious giant labeled oocyte in 100% of wave 1 follicles prior to death, and in wave 1.5 and wave 2 follicles at all stages. Thus, our methodology is already fully reliable. The reviewer is correct that the entire program of wave 1 development including their programmed turnover would be interesting to explore further. We already provided a large amount of new gene expression information, and documented the first examples of wave 1-specific gene expression. Further studies are not needed for the major conclusions of the paper and can wait for a follow-up study.

      Secondly, the existence of wave 1.5 is not "speculative," as stated by the reviewing editor. We extensively validated and quantified the existence of wave 1.5 primordial follicles following Foxl2-cre activation at E16.5, and analysis at 2 wks in multiple experiments. Additionally, we showed wave 1.5 follicles were present at the medulla/cortex border at 2 wks even after activation of Foxl2-cre at E14.5. Our paper also connected for the first time wave 1.5 follicles to a population of non-growing, "poised" primordial follicles described at this identical location near the medulla/cortex boundary by Meinsohn et al. in 2021. These follicles had not started to develop yet, and their ultimate fate was not known. We followed the development of these follicles and determined several differences in wave 1.5 follicle gene expression compared to wave 1. As noted in the assessment, our findings on wave 1.5 are now already being extended to other systems such as primate ovaries (adopting our name "wave 1.5" from our bioRxiv manuscript). The simultaneous claims that our discovery of wave 1.5 is speculative, and that other people are finding wave 1.5 follicles in the species they are studying, are logically incompatible.

      Response to reviewer 2:

      Line 239-245: Please note that Zhang et al. 2013 also show that lineage-labeled primordial follicles can be found at the cortex-medulla boundary (see their Figure 1B).

      The single image in the Zheng et al. 2014 paper may or may not show mosaic primordial follicles, but it would not be surprising since the experiment was identical to experiments in the paper. However, that single picture is only meaningful in the context of our subsequent work reported in the current manuscript. There was no mention of these follicles in the text of Zheng et al. 2014, no documentation or quantitation of their numbers, and no discussion or understanding of their significance. The incorrect conclusion of the paper was that wave 1 follicles, meaning rapidly developing follicles in the medulla, give rise to most early offspring. This conclusion reversed the previously accepted (and essentially correct) view that wave 1 follicles did not contribute significantly to fertility.

      "Finally, this study does not directly assess fertility outcomes and should therefore refrain from drawing conclusions about the fertility potential of wave 1 follicles." 

      We showed by lineage marking that only about 25 of about 200 wave 1 follicles survive even to wk 5. This clearly does prove our conclusion that the great majority of wave 1 follicles do not contribute to fertility.

    1. eLife Assessment

      This important study reports that higher genetically predicted BMI is associated with a modestly increased risk of head and neck cancer. The convincing evidence is supported by rigorous Mendelian Randomization approaches, using multiple genetic instruments and models that reduce sensitivity to pleiotropy. However, results from pleiotropy-robust analyses were less consistent, which limits the strength of causal inference. The work will be of interest to researchers studying cancer risk factors and genetic epidemiology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have conducted the largest Mendelian Randomization (MR) analysis to date of the association between genetically predicted measures of adiposity and risk of head and neck cancer (HNC) overall and by subsites within HNC. MR uses genetic predictors of an exposure, such as gene variants associated with high BMI or tobacco use, rather than data from individual physical exams or questionnaires, and if it can be done in its idealized state, there should be no problems with confounding. Traditional epidemiologic studies have reported a variety of associations between BMI (and a few other measures of adiposity) and risk of HNC that typically differ by the smoking status of the subjects. Those findings are controversial given the complex relationship between tobacco and both BMI and HNC risk. Tobacco smokers are often thinner than non-smokers, so this could create an artificial ('confounded') association that may not be fully adjusted away in risk models. The findings of a BMI-HNC association are often attributed to residual confounding and this seems ripe for an MR approach if suitable genetic instrumental variables can be created. Here the authors built a variety of genetic instrumental variables for BMI and other measures of adiposity as well as two instrumental variables for smoking habits and then tested their hypotheses in a large case-control set of HNC cases and controls with genetic data.

      The authors found that the genetic model for BMI was associated with HNC risk in simple models, but this association disappeared when using models that better accounted for pleiotropy, the condition when genetic variants are associated with more than one trait such as both BMI and tobacco use. When they used both adiposity and tobacco use genetic instruments in a single model, there was a strong association with genetically predicted tobacco use (as is expected) but there was no remaining association with genetic predictors of adiposity. They conclude that high BMI/adiposity is not a risk factor for HNC.

      Strengths:

      The primary strength was the expansive use of a variety of different genetic instruments for BMI/adiposity/body size along with employing a variety of MR model types, several of which are known to be less sensitive to pleiotropy. They also used the largest case-control sample size to date.

      Weaknesses:

      The lack of pleiotropy is an unconfirmable assumption of MR and the addition of those models is therefore quite important as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result and in that case, they are more limited in their ability to test their hypothesis as these models do not show a robust BMI instrumental variable association.

      Comments on the revised manuscript:

      After the first round of review, the authors have improved the manuscript by (1) adding the requested power calculations and adding text to help the reader integrate that additional information; (2) adding the main effects for the tobacco instruments; (3) updating the comparison of their results to the prior literature; (4) and some other edits to the text. They have declined to include the smoking stratified estimates and provide a rationale for this decision that references the potential for collider bias. While true that yet another bias might be introduced, that gets added to the list and the careful reader would know that. Many important questions in cancer etiology can only be addressed via observational approaches and each observational approach has the potential for a long list of biases. The best inference comes from integrating the totality of the data and realizing that most conclusions are subject to updating as we conduct more work and learn more.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Weaknesses:

      The lack of pleiotropy is an unconfirmable assumption of MR, and the addition of those models is therefore quite important, as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result, and in that case, they can't test their hypotheses as these models do not show a BMI instrumental variable association. The other weakness, which might be remedied, is that the power of the tests here is not described. When a hypothesis is tested with an under-powered model, the apparent lack of association could be due to inadequate sample size rather than a true null. Typically, when a statistically significant association is reported, power concerns are discounted as long as the study is not so small as to create spurious findings. That is the case with their primary BMI instrumental variable model - they find an association so we can presume it was adequately powered. But the primary models they share are not the pleiotropy-robust methods MR-Egger, weighted median, and weighted mode. The tests for these models are null, and that could mean a couple of things: (1) the original primary significant association between the BMI genetic instrument was due to pleiotropy, and they therefore don't have a robust model to explore the effects of the tobacco genetic instrument. (2) The power for the sensitivity analysis models (the pleiotropy-robust methods) is inadequate, and the authors share no discussion about the relative power of the different MR approaches. If they do have adequate power, then again, there is no need to explore the tobacco instrument.

      Reviewing Editor Comments:

      We suggest that the authors add power estimates to assess whether the sample size is sufficient, given the strength and variability of the genetic instruments. It would also be helpful to present effect estimates for the tobacco instruments alone, to clarify their independent contribution and improve the interpretation of the joint models. In addition, the role of pleiotropy should be addressed more clearly, including which model is considered primary. Stratified analyses by smoking status are encouraged, as prior studies indicate that BMI-HNC associations may differ between smokers and non-smokers. Finally, the comparison with previous studies should be revised, as most reported null findings without accounting for tobacco instruments. If this study finds an association, it should not be framed as a replication.

      We would like to highlight that post-hoc power calculations are often considered redundant since the statistical power estimated for an observed association is directly related to its p-value[1]. In other words, the uncertainty of the association is already reflected in its 95% confidence interval. However, we understand power calculations may still be of interest to the reader, so we have incorporated them in the revised manuscript. We have edited the text as follows (lines 151-155): “Consequently, we used the total R² values to examine the statistical power in our study[42]. However, we acknowledge that the value of post-hoc power calculations is limited, since the statistical power estimated for an observed association is already reflected in the 95% confidence interval presented alongside the point estimate[43].” We have also added supplementary figures 1 and 2.
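
      The point that post-hoc power is determined by the p-value can be illustrated with a minimal sketch. It assumes a two-sided z-test and treats the observed effect as if it were the true effect; the function name `observed_power` is hypothetical and is not taken from the manuscript.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Post-hoc ("observed") power of a two-sided z-test, computed from the p-value alone.

    Assumption: the true effect is taken to equal the observed effect.
    """
    z_obs = _STD_NORMAL.inv_cdf(1 - p_value / 2)  # |z| implied by the p-value
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)   # critical value of the test
    # Probability of crossing the critical value in either tail
    return _STD_NORMAL.cdf(z_obs - z_crit) + _STD_NORMAL.cdf(-z_obs - z_crit)


for p in (0.50, 0.05, 0.001):
    print(f"p = {p:<5} -> observed power = {observed_power(p):.2f}")
```

      Because the function takes only the p-value (and the significance level) as input, it adds no information beyond the p-value itself; a result that sits exactly at p = 0.05 always has an observed power of about one half.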

      We can see that when using the latest HEADSpAcE data we were able to detect BMI-HNC ORs as small as 1.16 with 80% power, while the GAME-ON dataset only permitted the detection of ORs as small as 1.26 using the same BMI instruments (Figure B). We have explained these figures in the results section as follows (lines 257-263): “Using the BMI genetic instruments (total R²= 4.8%) and an α of 0.05, we had 80% statistical power to detect an OR as small as 1.16 for HNC risk (Supplementary Figure 1). For WHR (total R²= 3.1%) and WC (total R²= 4.4%), we could detect odds ratios (ORs) as small as 1.20 and 1.17, respectively. This is an improvement in terms of statistical power compared to the GAME-ON analysis published by Gormley et al.[28], for which there was 80% power to detect an OR as small as 1.26 using the same BMI genetic instruments (Supplementary Figure 2).”
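
      For readers who want to see how figures of this kind can arise, the following is a minimal sketch of one commonly used asymptotic approximation for MR power with a binary outcome. It is an assumption that this approximation resembles the calculation behind the supplementary figures; the function names are hypothetical. The sample size (N=31,523, including 12,264 cases) and the total R² values are those quoted in this response.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def mr_power(odds_ratio, r2, n_total, n_cases, alpha=0.05):
    """Approximate power to detect a causal OR per 1-SD higher exposure.

    Assumed approximation: power = Phi(|log OR| * sqrt(R2 * N * p * (1 - p)) - z_{1 - alpha/2}),
    where p is the proportion of cases and R2 the variance explained by the instruments.
    """
    p = n_cases / n_total
    ncp = abs(log(odds_ratio)) * sqrt(r2 * n_total * p * (1 - p))
    return _STD_NORMAL.cdf(ncp - _STD_NORMAL.inv_cdf(1 - alpha / 2))


def min_detectable_or(r2, n_total, n_cases, power=0.80, alpha=0.05):
    """Smallest OR (> 1) detectable with the requested power under the same approximation."""
    p = n_cases / n_total
    z_sum = _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)
    return exp(z_sum / sqrt(r2 * n_total * p * (1 - p)))


# Sample size and R2 values as quoted in the response above
N_TOTAL, N_CASES = 31_523, 12_264
for label, r2 in (("BMI", 0.048), ("WHR", 0.031), ("WC", 0.044)):
    print(f"{label}: minimum detectable OR at 80% power = "
          f"{min_detectable_or(r2, N_TOTAL, N_CASES):.2f}")
```

      Under these assumptions the sketch gives values close to the ORs quoted above for BMI, WHR and WC, which suggests that the reported figures follow directly from the sample size, the case fraction and the instrument strength.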

      The reason we use inverse variance weighted (IVW) Mendelian randomization (MR) to obtain our main results rather than the pleiotropy-robust methods mentioned by the reviewer/editors (i.e., MR-Egger, weighted median and weighted mode) is that the former has greater statistical power than the latter[2]. Hence, instead of focussing on the statistical significance of the pleiotropy-robust analyses, we consider it is of more value to compare the consistency of the effect sizes and direction of the effect estimates across methods. Any evidence of such consistency increases our confidence in our main findings, since each method relies on different assumptions. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even though they are not equally powered. It is true that our results for the genetically predicted effects of body mass index (BMI) on the risk of head and neck cancer (HNC) differ across methods. This is precisely what led us to question the validity of our main finding (suggesting a positive effect of BMI on HNC risk). We have now clarified this in the methods section of the revised manuscript as advised. Lines 165-171:

      “Because the IVW method assumes all genetic variants are valid instruments[44], which is unlikely the case, three pleiotropy-robust two-sample MR methods (i.e., MR-Egger[45], weighted median[46] and weighted mode[47]) were used in sensitivity analyses. When the magnitude and direction of effect estimates are consistent across methods that rely on different assumptions, the main findings are more convincing. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even if they are not equally powered.”
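
      The contrast between IVW and a pleiotropy-robust method can be made concrete with a small sketch. The summary statistics below are invented for illustration, the function names are hypothetical, and only MR-Egger is shown (weighted median and weighted mode are omitted for brevity). Variants are assumed to be oriented so that all variant-exposure effects are positive.

```python
def ivw_estimate(bx, by, se_by):
    """IVW: weighted regression of variant-outcome on variant-exposure effects, no intercept."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_by))
    den = sum(x * x / s ** 2 for x, s in zip(bx, se_by))
    return num / den


def egger_estimate(bx, by, se_by):
    """MR-Egger: the same weighted regression with an intercept; returns (intercept, slope)."""
    w = [1 / s ** 2 for s in se_by]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, bx)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, by)) / sw
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, bx, by))
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, bx))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical summary statistics for five variants (illustration only)
bx = [0.02, 0.03, 0.05, 0.08, 0.10]          # variant-exposure effects
se_by = [0.010, 0.012, 0.008, 0.015, 0.010]  # standard errors of variant-outcome effects
TRUE_EFFECT = 0.5

by_valid = [TRUE_EFFECT * x for x in bx]         # all variants are valid instruments
by_pleio = [TRUE_EFFECT * x + 0.02 for x in bx]  # constant directional pleiotropy added

print("IVW, valid instruments:  ", round(ivw_estimate(bx, by_valid, se_by), 3))
print("IVW, with pleiotropy:    ", round(ivw_estimate(bx, by_pleio, se_by), 3))
print("Egger (intercept, slope):", egger_estimate(bx, by_pleio, se_by))
```

      In this noise-free toy example, directional pleiotropy pushes the IVW estimate away from the true effect, whereas MR-Egger absorbs it into the intercept. With real data the Egger slope is estimated far less precisely, which is the power trade-off described above.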

      We understand that the reviewer/editors are concerned that we do not have a robust model to explore the role of tobacco consumption in the link between BMI and HNC. However, we have a different perspective on the matter. If indeed, the main IVW finding for BMI and HNC is due to pleiotropy (since some of the pleiotropy-robust methods suggest conflicting results), then the IVW multivariable MR method is a way to explore the potential source of this bias[3]. We were particularly interested in exploring the role of smoking in the observed association because smoking and adiposity are known to influence each other [4-9] and share a genetic basis[10, 11].
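
      How multivariable IVW can separate a direct effect from one that runs through a correlated trait is sketched below. The numbers are invented, the function names are hypothetical, and the outcome effects are constructed so that only the smoking trait has a true effect; this is an illustration of the technique, not a reproduction of the analysis in the manuscript.

```python
def ivw_estimate(bx, by, se_by):
    """Univariable IVW: weighted regression of by on bx through the origin."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_by))
    den = sum(x * x / s ** 2 for x, s in zip(bx, se_by))
    return num / den


def mvmr_ivw(bx1, bx2, by, se_by):
    """Multivariable IVW: weighted regression of by on bx1 and bx2 through the origin.

    Solves the 2x2 weighted normal equations directly; returns (direct effect 1, direct effect 2).
    """
    w = [1 / s ** 2 for s in se_by]
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    s2y = sum(wi * b * y for wi, b, y in zip(w, bx2, by))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical variant effects on adiposity and smoking (illustration only)
bx_bmi = [0.02, 0.03, 0.05, 0.08, 0.10]
bx_smk = [0.01, 0.02, 0.02, 0.05, 0.04]
se_by = [0.01] * 5
by = [0.7 * s for s in bx_smk]  # the outcome depends on smoking only

print("Univariable IVW for BMI:", round(ivw_estimate(bx_bmi, by, se_by), 3))
print("Multivariable IVW (BMI, smoking):", mvmr_ivw(bx_bmi, bx_smk, by, se_by))
```

      Here the univariable BMI estimate is clearly non-zero because the BMI variants also affect smoking, while the multivariable model assigns the whole effect to smoking and leaves a direct BMI effect of approximately zero.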

      We agree that it would be useful to present the univariable MR effect estimates for smoking behaviour and HNC risk alongside those obtained using multivariable MR. We have now included the univariable MR estimates for both smoking behaviour variables as a note under Supplementary Table 11 and in the manuscript (lines 316-318): “In univariable IVW MR, both CSI and SI were linked to an increased risk of HNC (CSI OR=4.47 per 1-SD higher CSI, 95%CI 3.31–6.03, p<0.001; SI OR=2.07 per 1-SD higher SI, 95%CI 1.60–2.68, p<0.001) (Additional File 2: note in Supplementary Table 11).”
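
      As a quick consistency check, the standard error, z-statistic and p-value implied by an odds ratio and its 95% confidence interval can be recovered as sketched below. This assumes a symmetric normal-approximation interval on the log-odds scale; the function name is hypothetical.

```python
from math import erfc, log, sqrt
from statistics import NormalDist


def p_from_or_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Recover (SE of log OR, z, two-sided p) from an OR and its confidence interval.

    Assumption: the interval is log(OR) +/- z_level * SE on the log scale.
    """
    z_level = NormalDist().inv_cdf(0.5 + level / 2)
    se = (log(ci_high) - log(ci_low)) / (2 * z_level)
    z = log(odds_ratio) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value, numerically stable for large |z|
    return se, z, p


# Univariable IVW estimates quoted in the response above
for label, est in (("CSI", (4.47, 3.31, 6.03)), ("SI", (2.07, 1.60, 2.68))):
    se, z, p = p_from_or_ci(*est)
    print(f"{label}: SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.1e}")
```

      Both intervals imply p-values far below 0.001, in line with the reported values, and show how the confidence interval already carries the information a post-hoc power calculation would add.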

      We understand the appeal of conducting stratified MR analyses by smoking status. However, we anticipate such analyses would hinder the interpretation of our findings as they can induce collider bias which could spuriously lead to different effect estimates across strata[12, 13].
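
      The collider-bias concern can be illustrated with a small simulation. All quantities are hypothetical: a genetic instrument for adiposity and an unmeasured confounder both raise the liability to smoke, and the two are generated independently. The sketch shows what happens to their correlation once the sample is restricted to smokers.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=20_000, seed=1):
    """Return (instrument-confounder correlation overall, the same correlation among smokers)."""
    rng = random.Random(seed)
    g_all, u_all, g_smk, u_smk = [], [], [], []
    for _ in range(n):
        g = rng.gauss(0, 1)  # hypothetical adiposity instrument
        u = rng.gauss(0, 1)  # unmeasured confounder of smoking and the outcome
        liability = g + u + rng.gauss(0, 1)  # smoking depends on both (collider)
        g_all.append(g)
        u_all.append(u)
        if liability > 0:  # stratum of smokers
            g_smk.append(g)
            u_smk.append(u)
    return pearson(g_all, u_all), pearson(g_smk, u_smk)


overall, among_smokers = simulate()
print(f"corr(instrument, confounder) overall:       {overall:+.3f}")
print(f"corr(instrument, confounder) among smokers: {among_smokers:+.3f}")
```

      In the full sample the instrument and the confounder are essentially uncorrelated, but within the smoking stratum a negative correlation appears. That induced association violates the assumption that the instrument is independent of confounders, which is why stratum-specific MR estimates can differ spuriously.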

      We thank the reviewer/editors for their comment regarding the way we frame our findings. We have now edited the discussion section to highlight that our study results differ from those obtained in studies that do not account for smoking behaviour. Lines 398-401: “With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      Reviewer #1 (Recommendations for the authors):

      The authors do share a table of the percent variance explained of the different genetic instruments, which vary widely, and that table is very welcome because we can get some sense of their utility. The problem is that they don't translate that into a power estimate for the case-control study size that they use. They say that it is the biggest to date, which is good, but without some formal power estimate, it is not particularly reassuring. A framework for MR study power estimates was reported in PMID: 19174578, but that was using very simple MR constructs in use in 2009, and it isn't clear to me if that framework can be used here. That power paper suggests that weak genetic instruments need very large sample sizes, far larger than what is used in the current manuscript. I am unable to estimate the true strength of the instruments used here, and so I am unsure of whether power is an issue or not.

      We have now included power calculations in our manuscript to address the reviewer’s concerns. Nevertheless, as mentioned above, post-hoc power calculations are of limited value, as statistical power is already reflected in the uncertainty around the point estimates (the 95% confidence intervals). Hence, it is important to avoid drawing conclusions regarding the likelihood of true effects or false negatives based on these calculations.

      Although the hypothesis here is that smoking accounts for the apparent BMI association previously reported for HNC, it would have been preferable to see the estimates for their 2 genetic instruments for tobacco alone. The current results only show the BMI instruments alone and then with the tobacco instruments. I would like to see what the risk estimates are for the tobacco instrument alone, so that I can judge for myself what happens in the joint models. As presented, one can only do that for the BMI instruments.

      We thank the reviewer for this comment. The univariable IVW MR estimate of smoking initiation was OR=2.07 (95%CI 1.60 to 2.68, p<0.001), while the one for comprehensive smoking index was OR=4.47 (95%CI 3.31 to 6.03, p<0.001). We have included this information in the manuscript as requested (please see response to reviewing editor above).

      On line 319, they write that "We did not find evidence against bias due to correlated pleiotropy..." I find this difficult to parse, but I think it means that they should believe that correlated pleiotropy remains a problem. So again, they seem to see their primary model as compromised, and so do I. This limitation is again stated by the authors on lines 351-352.

      We apologise if the wording of the sentence was not easy to understand. When using the CAUSE method, we did not find evidence to reject the null hypothesis that the sharing (correlated pleiotropy) model fits the data at least as well as the causal model. In other words, our CAUSE finding and the inconsistencies observed across our other sensitivity analyses led us to believe that our main IVW MR estimate for BMI-HNC was likely biased by correlated pleiotropy. We believe it is important to explore the source of this bias, which is why we used multivariable MR to investigate the direct effect of BMI on HNC risk while accounting for smoking behaviour.

      In the following paragraphs (lines 358-369), the authors state that their findings are consistent with prior reports, but that doesn't seem to be the case if we take their primary BMI instrument as representing the outcome of this manuscript. Here, they find an association between the BMI instrument and HNC risk, but in each of the other papers they present the primary finding was null without the extensive model changes or the aim of accounting for tobacco with another instrument. I don't see that as replication.

      This is a good point. We have now edited the discussion of our manuscript to avoid giving the impression that our findings replicate those from studies that do not account for smoking behaviour in their analyses. We have edited lines 384-401 as follows:

      “Previous MR studies suggest adiposity does not influence HNC risk[27-29]. Gormley et al.[28] did not find a genetically predicted effect of adiposity on combined oral and oropharyngeal cancer when investigating either BMI (OR=0.89 per 1-SD, 95% CI 0.72–1.09, p=0.26), WHR (OR=0.98 per 1-SD, 95% CI 0.74–1.29, p=0.88) or waist circumference (OR=0.73 per 1-SD, 95% CI 0.52–1.02, p=0.07) as risk factors. Similarly, a large two-sample MR study by Vithayathil et al.[29] including 367,561 UK Biobank participants (of which 1,983 were HNC cases) found no link between BMI and HNC risk (OR=0.98 per 1-SD higher BMI, 95% CI 0.93–1.02, p=0.35). Larsson et al.[27] meta-analysed Vithayathil et al.’s[29] findings with results obtained using FinnGen data to increase the sample size even further (N=586,353, including 2,109 cases), but still did not find a genetically predicted effect of BMI on HNC risk (OR=0.96 per 1-SD higher BMI, 95% CI 0.77–1.19, p=0.69). With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      We also deleted part of a sentence in the discussion section, so lines 416-418 now look as follows: “An important strength of our study was that the HEADSpAcE consortium GWAS used had a large sample size which conferred more statistical power to detect effects of adiposity on HNC risk compared to previous MR analyses[27-29].”

      On lines 384-386 they note a strength is that this is the largest study to date, but I would reiterate that larger and more powerful does not equate to adequately powered.

      This is true. We have included power calculations in the manuscript as requested.

      It's well known that different HNC subsites have different etiologies, as they mention on lines 391-392, and it is implicit in their use of data on HPV positive and negative oropharyngeal cancer. They say that they did not find evidence for heterogeneity in this study, but that would only be true for the null BMI instrument. The effect sizes for their smoking instruments are strikingly different between the subsites.

      We agree and are sorry for the confusion we may have caused by the way we worded our findings. We have edited the text to clarify that the lack of subsite heterogeneity only applied to our results for BMI/WHR/WC-HNC risk. Lines 418-424 now read as follows:

      “Furthermore, the availability of data on more HNC subsites, including oropharyngeal cancers by HPV status, allowed us to investigate the relationship between adiposity and HNC risk in more detail than previous MR studies which limited their subsite analyses to oral cavity and overall oropharyngeal cancers[28, 68]. This is relevant because distinct HNC subsites are known to have different aetiologies[69], although we did not find evidence of heterogeneity across subsites in our analyses investigating the genetically predicted effects of BMI, WHR and WC on HNC risk.”

      Finally, the literature on mutational patterns gives us strong reason to believe that HNC caused by tobacco are biologically distinct from tumors not caused by tobacco. The authors report in the introduction that traditional observational studies of BMI and HNC have reported different findings in smokers versus never smokers, so I would assume there is a possibility that the BMI instrument could have different associations with tumors of the tobacco-induced phenotype and tumors with a non-tobacco induced phenotype. I would assume that authors have access to the data on self-reported tobacco use behavior, even if they can't separate these tumors by molecular types. Stratifying their analysis by tobacco users or not might reveal different results with the BMI instrument.

      We appreciate the reviewer’s comment. We agree that it would have been interesting to present stratified analyses by smoking status alongside our main findings. However, we decided against this because of the risk of inducing collider bias in our MR analyses, i.e., stratifying on smoking status may induce spurious associations between the adiposity instruments and confounding factors. Multivariable MR is considered a better way of investigating the direct effects of an exposure (adiposity) on an outcome (HNC) accounting for a third variable (smoking)[14], which is why we opted for this method instead.

      References:

      (1) Heinsberg LW, Weeks DE: Post hoc power is not informative. Genet Epidemiol 2022, 46(7):390-394.

      (2) Burgess S, Butterworth A, Thompson SG: Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 2013, 37(7):658-665.

      (3) Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C et al: Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 2019, 4:186.

      (4) Morris RW, Taylor AE, Fluharty ME, Bjorngaard JH, Asvold BO, Elvestad Gabrielsen M, Campbell A, Marioni R, Kumari M, Korhonen T et al: Heavier smoking may lead to a relative increase in waist circumference: evidence for a causal relationship from a Mendelian randomisation meta-analysis. The CARTA consortium. BMJ Open 2015, 5(8):e008808.

      (5) Taylor AE, Morris RW, Fluharty ME, Bjorngaard JH, Asvold BO, Gabrielsen ME, Campbell A, Marioni R, Kumari M, Hallfors J et al: Stratification by smoking status reveals an association of CHRNA5-A3-B4 genotype with body mass index in never smokers. PLoS Genet 2014, 10(12):e1004799.

      (6) Taylor AE, Richmond RC, Palviainen T, Loukola A, Wootton RE, Kaprio J, Relton CL, Davey Smith G, Munafo MR: The effect of body mass index on smoking behaviour and nicotine metabolism: a Mendelian randomization study. Hum Mol Genet 2019, 28(8):1322-1330.

      (7) Asvold BO, Bjorngaard JH, Carslake D, Gabrielsen ME, Skorpen F, Smith GD, Romundstad PR: Causal associations of tobacco smoking with cardiovascular risk factors: a Mendelian randomization analysis of the HUNT Study in Norway. Int J Epidemiol 2014, 43(5):1458-1470.

      (8) Carreras-Torres R, Johansson M, Haycock PC, Relton CL, Davey Smith G, Brennan P, Martin RM: Role of obesity in smoking behaviour: Mendelian randomisation study in UK Biobank. BMJ 2018, 361:k1767.

      (9) Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, Hattersley AT, Hill A, Hingorani AD, Holst C et al: Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol 2011, 40(6):1617-1628.

      (10) Thorgeirsson TE, Gudbjartsson DF, Sulem P, Besenbacher S, Styrkarsdottir U, Thorleifsson G, Walters GB, Consortium TAG, Oxford GSKC, consortium E et al: A common biological basis of obesity and nicotine addiction. Transl Psychiatry 2013, 3(10):e308.

      (11) Wills AG, Hopfer C: Phenotypic and genetic relationship between BMI and cigarette smoking in a sample of UK adults. Addict Behav 2019, 89:98-103.

      (12) Coscia C, Gill D, Benitez R, Perez T, Malats N, Burgess S: Avoiding collider bias in Mendelian randomization when performing stratified analyses. Eur J Epidemiol 2022, 37(7):671-682.

      (13) Hamilton FW, Hughes DA, Lu T, Kutalik Z, Gkatzionis A, Tilling K, Hartwig FP, Davey Smith G: Non-linear Mendelian randomization: evaluation of effect modification in the residual and doubly-ranked methods with simulated and empirical examples. Eur J Epidemiol 2025.

      (14) Sanderson E, Davey Smith G, Windmeijer F, Bowden J: An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol 2019, 48(3):713-727.

    1. eLife Assessment

      This important work examines how microexons contribute to brain activity, structure, and behavior. The authors find that loss of microexon sequences generally has subtle impacts on these metrics in larval zebrafish, with few exceptions. The evidence is solid, using modern high-throughput phenotyping methodology in zebrafish. Overall, this work will be of interest to neuroscientists and generate further studies of interest to the field.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      Strengths:

      - The authors provide a qualitative analysis of microexon inclusion during early zebrafish development.

      - The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

    3. Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish, although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      Weaknesses:

      Although the manuscript includes evidence for many mutants that microexon deletion has minimal effect on full-length transcript levels, some of the microexon loss does alter transcript levels. Since the mutations usually yielded no phenotype, these effects on full-length transcripts are unlikely to be a major confound. For microexon mutants displaying phenotypes, future work will have to tease apart whether secondary effects on the transcripts are contributing to the phenotype.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      A few questions remain:

      (1) What is the behavioral consequence for loss of srrm4 and/or loss-of-function mutations in other genes encoding microexon splicing machinery in zebrafish?

      It has been established that srrm4 mutants exhibit no overt morphological phenotypes and are not visually impaired (Ciampi et al., 2022). We are coordinating our publication with Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860), which shows that srrm4 mutants also have minimal behavioral phenotypes. In contrast, srrm3 mutants have severe vision loss, early mortality, and numerous neural and behavioral phenotypes (Ciampi et al., 2022; Lopez-Blanch et al., 2024). We now point out the phenotypes of srrm3/srrm4 mutants in the manuscript.

      We chose not to generate and characterize the behavior and brain activity of srrm3/srrm4 mutants for two reasons: 1) we were aware of two other labs in the zebrafish community that had generated srrm3 and/or srrm4 mutants (Ciampi et al., 2022 and Gupta et al., 2024, https://doi.org/10.1101/2024.11.29.626094; Lopez-Blanch et al., 2024, https://doi.org/10.1101/2024.10.23.619860), and 2) we were far more interested in determining the importance of individual microexons to protein function, rather than loss of the entire splicing program. Microexon inclusion can be controlled by different splicing regulators, such as srrm3 (Ciampi et al., 2022) and possibly other unknown factors. Genetic compensation in srrm4 mutants could also result in microexons still being included through actions of other splicing regulators, complicating the analysis of these regulators. We mention srrm4 in the manuscript to point out that some selected microexons are adjacent to regulatory elements expected of this pathway. We did not, however, choose microexons to mutate based on whether they were regulated by Srrm4, making the characterization of srrm3/srrm4 mutants disconnected from our overarching project goal.

      We have edited the Introduction as follows to clarify our goal: “Studies of splicing regulators such as srrm4 impact the entire splicing program, making it impossible to determine the importance of individual microexons to protein function. Further, microexons could still be differentially included in a regulatory mutant via compensation by other splicing factors ...”

      (2) What is the consequence of loss-of-function in microexon splicing genes on splicing of the genes studied (especially those for which phenotypes were observed).

      We are unclear whether “microexon splicing genes” refers to the splicing regulators srrm3/srrm4, which we chose not to study in this work (see response to point #1 above), or the genes that contain microexons. The severe visual phenotypes of srrm3 mutants confound the study of microexon splicing in this line because altered splicing levels could be due to downstream changes in this significantly different developmental context. A detailed discussion of splicing consequences on removal of microexons from microexon-containing genes is in the response to point #4 below.

      (3) For the microexons whose loss is associated with substantial behavioral, morphological, or activity changes, are the same changes observed in loss-of-function mutants for these genes?

      In the first version of the manuscript, we had included two explicit comparisons of microexon loss with a standard loss-of-function allele, one with a phenotype and one without, in Figure S1 (now Figures S3 and S4) of this manuscript. Beyond the two pairs we had included, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) described mild behavioral phenotypes for a microexon removal for kif1b, and we showed developmental abnormalities for the kif1b loss-of-function allele (now Figure S3). We have now added a predicted protein-truncating allele for ppp6r3. This new line has phenotypes that are similar but slightly stronger in brain activity and structure than the mutant that lacks only the microexon. The prior Figure S1 (now Figures S3 and S4) was only briefly mentioned in the first version of the manuscript, and we now clarify this point in the Results: “Protein-truncating mutations in eleven additional genes that contain microexons revealed developmental and neural phenotypes in zebrafish (Figure S3, Figure S4), indicating that the genes themselves are involved in biologically relevant pathways. Three of these genes– tenm4, sptan1, and ppp6r3 – are also in our microexon line collection.”

      Additionally, we can draw expected conclusions from the literature, as some genes with our microexon mutations have been studied as typical mutants in zebrafish or mice. We have modified our manuscript to include a discussion of both loss-of-function zebrafish and mouse mutants. See the response to point #4 below.

      (4) Do "microexon mutations" presented here result in the precise loss of those microexons from the mRNA sequence? E.g. are there other impacts on mRNA sequence or abundance?

      We acknowledge that unexpected changes to the mRNA of the tested mutants could occur following microexon removal. In particular, all regulatory elements should be removed from the region surrounding the microexon, as any remaining elements could drive the inclusion of unexpected exons that result in premature stop codons.

      First, we have clarified our generated mutant alleles by adding a figure (Figure S1) that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      Second, we have experimentally determined whether the mRNA was modified as expected for a subset of mutants with phenotypes. In all eight tested lines (Figure S2), the microexon was precisely eliminated without causing any other effects on the sequence of the transcript in the neighboring region. We did, however, observe an effect on transcript abundance for one homozygous mutant (vav2). It is possible that complex forms of genetic regulation are occurring that are not induced by unexpected isoforms or premature stop codons. Interestingly, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) eliminated a different microexon in vav2 and also observed a subtle well center preference. If their allele from an entirely different intronic region also results in transcript downregulation, it would support the hypothesis of genetic compensation through atypical pathways. If not, it is likely this phenotype is due specifically to removal of the microexon protein sequence. Not all mutants with phenotypes could be assessed with qRT-PCR because some were no longer present in the lab. All lines were generated in a similar way, however, removing both the microexon and neighboring regulatory elements while avoiding the neighboring exons. Accordingly, we now also explicitly point out those where the clean loss of the microexon was confirmed (eif4g3b, ppp6r3, sptan1, vti1a, meaf6, nrxn1a, tenm3) and those with possibly interesting phenotypes that were not confirmed (ptprd-1, ptprd-2, rapgef2, dctn4, dop1a, mapk8ip3).

      Third, we have further emphasized in the manuscript that these observed phenotypes are extremely mild compared to those observed in over one hundred protein-truncating mutations we have assessed in previous (Thyme et al., 2019; Capps et al., 2024) and unpublished ongoing work. We showed data for one mutant, tcf7l2, which we consider to have moderately strong neural phenotypes, and we have extended this comparison in the revision (new Figure 3G). Additionally, loss-of-function alleles for some microexon-containing genes have strong developmental phenotypes, as we showed in Figure S1 (now Figures S3 and S4) of this manuscript in addition to our published work (Thyme et al., 2019; Capps et al., 2024). It is known from the literature that the loss-of-function mutants for mapk8ip3 are stronger than we observed here (Tuttle et al., 2019), suggesting that only the microexon is removed in our line. The microexons in Ptprd are also well-studied in mice, and we expect that only the microexon was removed in our lines. Both Dctn4 and Rapgef2 are completely lethal prior to weaning in mice (the International Mouse Phenotyping Consortium).

      (5) Microexons with a "canonical layout" (containing TGC / UC repeats) were selected based on the likelihood that they are regulated by srrm4. Are there other parallel pathways important for regulating the inclusion of microexons? Is it possible to speculate on whether they might be more important in zebrafish or in the case of early brain development?

      The microexons were not selected based on the likelihood that they were regulated by Srrm4. We have clarified the manuscript regarding this point. There are parallel pathways that can control the inclusion of microexons, such as Srrm3 (Ciampi et al., 2022). It is well known that loss of srrm3 has a stronger impact on zebrafish development than srrm4 (Ciampi et al., 2022). The goal of our work was not to investigate these splicing regulators but instead to determine the individual importance of these highly conserved protein changes.

      Strengths:

      (1) The authors provide a qualitative analysis of splicing plasticity for microexons during early zebrafish development.

      (2) The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

      We thank the reviewer for their support. The pErk brain activity mapping method is highly sensitive, significantly minimizing the likelihood that the field has simply not looked hard enough for a neural phenotype in these microexon mutants. In our published work (Thyme et al., 2019), we show that brain activity can be drastically impacted without manifesting in differences in those behaviors assessed in a typical larval screen (e.g., tcf4, cnnm2, and more).

      Weaknesses:

      (1) It is difficult to interpret the largely negative findings reported in this paper without knowing how the loss of srrm4 affects brain activity, morphology, and behavior in zebrafish.

      See response to point 1.

      (2) The authors do not present experiments directly testing the effects of their mutations on RNA splicing/abundance.

      See response to point 4.

      (3) A comparison between loss-of-function phenotypes and loss-of-microexon splicing phenotypes could help interpret the findings from positive hits.

      See response to points 3 and 4.

      Reviewer #2 (Public review):

      Summary:

      The manuscript from Calhoun et al. uses a well-established screening protocol to investigate the functions of microexons in zebrafish neurodevelopment. Microexons have gained prominence recently due to their enriched expression in neural tissues and misregulation in autism spectrum disease. However, screening of microexon functionality has thus far been limited in scope. The authors address this lack of knowledge by establishing zebrafish microexon CRISPR deletion lines for 45 microexons chosen in genes likely to play a role in CNS development. Using their high throughput protocol to test larval behaviour, brain activity, and brain structure, a modest group of 9 deletion lines was revealed to have neurodevelopmental functions, including 2 previously known to be functionally important.

      Strengths:

      (1) This work advances the state of knowledge in the microexon field and represents a starting point for future detailed investigations of the function of 7 microexons.

      (2) The phenotypic analysis using high-throughput approaches is sound and provides invaluable data.

      We thank the reviewer for their support.

      Weaknesses:

      (1) There is not enough information on the exact nature of the deletion for each microexon.

      To clarify the nature of our mutant alleles, we have added a figure (Figure S1) that details the location of the microexon in relation to its predicted neighboring exons, deletion boundaries, guide RNAs, and putative regulatory elements.

      (2) Only one deletion is phenotypically analysed, leaving space for the phenotype observed to be due to sequence modifications independent of the microexon itself.

      We have determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see point #4 responses to Reviewer 1 for details). Our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860). We have also already compared the microexon removal to a loss-of-function mutant for two lines (Figures S3 and S4), and we have made this comparison more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point #3 response to Reviewer 1).

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a protein-truncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexons should have equivalent impacts on the mRNA.

      We now address the concern of subtle genetic background effects in the Methods: “Even with using sibling controls and collecting multiple biological replicates from individual parents, the possibility remains that linked genetic variation may have contributed to the mild phenotypes we observed, as only a single line was generated.”

      Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish-- although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      We thank the reviewer for their support.

      Weaknesses:

      The work does not make it clear enough what deleting the microexon means, i.e. is it a clean removal of the microexon only, or are large pieces of the intron being removed as well-- and if so how much? Similarly, for the microexon deletions that do yield phenotypes, it will be important to demonstrate that the full-length transcript levels are unaffected by the deletion. For example, deleting the microexon might have unexpected effects on splicing or expression levels of the rest of the transcript that are the actual cause of some of these phenotypes.

      To clarify the nature of our mutant alleles, we have added a figure (Figure S1) that details the location of the microexon in relation to its predicted neighboring exons, deletion boundaries, guide RNAs, and putative regulatory elements. We have determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see point #4 responses to Reviewer 1 for details).

      Reviewer #1 (Recommendations for the authors):

      (1) For most ME mutations, 4 guide sequences are provided. More description / a diagram could be helpful to interpret how ME mutations were generated.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We have also added the following point to the text: “Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1).”

      (2) Figure 1 indicates that there are 45 microexons (MEs) but the text initially indicates that there are 44 that exist in a canonical layout (the text later indicates there are 45). This could be made more clear.

      The 45 refers to the mutants that were generated, not the microexons with putative Srrm4 regulatory elements. We did not choose microexons to mutate based on whether they were regulated by Srrm4. We have clarified these points in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.” and “Using CRISPR/Cas9, we generated lines that removed 45 conserved microexons (Table S2) and assayed larval brain activity, brain structure, and behavior (Figure 1A). Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1). For microexons with upstream regulatory elements that are likely important for splicing, these elements were also removed (Figure S1).”

      (3) The description of the "canonical layout" as containing TGC / UC repeats could be rewritten as either "containing a UGC motif and UC repeats" or "containing a TGC motif and TC repeats."

      This error has been corrected.

      (4) Why was tcf7l2 selected as a control for MAP mapping?

      The mutant for tcf7l2 is an example of a moderately strong phenotype from a recent study we completed (Capps et al., 2025). This mutant was selected because it has both increased and decreased activity and structure and is ideal for setting the range of the graph. We now include a comparison to additional mutants from this study of autism genes (Capps et al., 2025) to further demonstrate how mild the phenotypes are in the microexon removal mutants (new Figure 3G). We also include the activity and structure maps of tcf7l2 mutants in Supplementary Figures 9 and 11.

      (5) What does it mean that of the remaining microexons, most are similar to canonical layout?

      Typically, they would have one of the two regulatory elements instead of both or the location of the possible elements would be slightly farther away than expected. We have clarified this point in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.”

      (6) Figure 2A is very difficult to see - most are either up or down - suggest splitting into 2 figures - one = heat map, second can summarize values that were both up and down.

      We prefer to retain this information for accuracy. The bubble location is offset to effectively share the box between the orange (decreased) and purple (increased) measures. For example, and as noted in the methods and now expanded upon, a measure can change between 4 and 6 dpf or a measure such as bout velocity could be increased while the distance traveled is decreased (both are magnitude measures). The offset of the bubbles is consistently 0.2 data units in x and y from the center of the box.
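
      The offset described above can be illustrated with a minimal sketch. It assumes a grid in which each box is centered on integer coordinates; the function name `bubble_positions` and the choice of which measure goes to which corner are hypothetical and are not taken from the manuscript's plotting code. Only the 0.2 data-unit offset comes from the response.

```python
OFFSET = 0.2  # data units in x and y, as stated in the response


def bubble_positions(col, row, offset=OFFSET):
    """Return the two bubble centers that share the box centered at (col, row).

    Assumption for illustration only: the "decreased" (orange) bubble is
    shifted toward the lower left and the "increased" (purple) bubble toward
    the upper right. The actual directions used in Figure 2A are not stated.
    """
    decreased = (col - offset, row - offset)
    increased = (col + offset, row + offset)
    return decreased, increased


# Example: a measure that is both increased and decreased in the box at (3, 5)
decreased_xy, increased_xy = bubble_positions(3, 5)
print("decreased bubble:", decreased_xy)
print("increased bubble:", increased_xy)
```

      With a fixed offset, both bubbles stay inside the same box, so a measure that changes in both directions remains readable in a single cell.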

      (7) The authors apply rigorous approaches to testing the importance of microexons. I especially appreciate the inclusion of separate biological replicates in the main figures!

      We thank the reviewer for their positive feedback.

      (8) Page 5 line 5 - suggest "compared to homozygous mutants".

      The change has been made.

      (9) For Eif4g3b dark flash phenotype, it's not clear what "p-values are not calculated for response plots" means. A p-value is provided in the plot for ppp6r3 response freq.

      The eif4g3b plot is the actual response trace, measured through pixel changes, whereas the ppp6r3 plot shows the frequency of response. While informative, the response plot is time-based data with a wide dynamic range, making the average signal across the entire time window meaningless. We include the p-values for a related measure, the latency for the first 10 dark flashes in block 1 (day6dpfdf1a_responselatency), in the legend.

      (10) The ptprd phenotype in 2D is not described in the text.

      The change has been made.

      (11) Page 7 line 7: "mild" is repeated.

      This error has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Specific points for needed improvement:

      (1) The title should be adjusted to more accurately describe the results. The term 'minimal' is under-representing the findings. 9/45 (20%) of targets in their screen have some phenotype, indicating that a significant number have indeed an important function. Moreover, the phenotypic analysis is limited, leaving space for missed abnormalities (as discussed by the authors). I would therefore suggest a more neutral title such as 'Systematic genetic deletion of microexons uncovers their roles in zebrafish brain development and larval behaviour'.

      While some microexon mutants do have repeatable phenotypes, these phenotypes are far milder than phenotypes observed in other mutant sets. We now include a comparison to additional mutants from this study of autism genes (Capps et al., 2025) to further demonstrate how mild the phenotypes are in the microexon removal mutants (new Figure 3G). The title states that these microexons have a minimal impact on larval zebrafish brain morphology and function, leaving room for the possibility of adult phenotypes. Thus, we prefer to retain this title.

      (2) Do the 45 chosen microexons correspond to the 44 with a canonical layout with TGC and UC repeats? If so, it needs to be explicitly stated in the text that exons were chosen for mutation based on the potential for SRRM4 regulation. If not, then the rationale for the choice of the 45 mutants from the 95 highly conserved events needs to be explained further.

      The 45 refers to the mutants that were generated, not the microexons with putative Srrm4 regulatory elements. We did not choose microexons to mutate based on whether they were regulated by Srrm4. We have clarified these points in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.” and “Using CRISPR/Cas9, we generated lines that removed 45 conserved microexons (Table S2) and assayed larval brain activity, brain structure, and behavior (Figure 1A). Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1). For microexons with upstream regulatory elements that are likely important for splicing, these elements were also removed (Figure S1).”

      There was no clear rationale for those that were selected. We attempted to generate all 95 and some mutants were not successfully generated in our initial attempt. As we found minimal phenotypes, we elected to not continue to make the remaining ones on the list.

      (3) More detail regarding the design of guides for CRISPR is required in the text in the methods section. From Table S2, 4 guides were used per microexon. Were these designed to flank the microexon? How far into the intronic sequence were the guides designed? Were the splicing regulatory sequences (polypyrimidine tract, branchpoint) also removed? The flanking sequences of each of the 45 deletion lines need to be provided.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We removed the microexon and the surrounding area that contains the putative regulatory elements.

      (4) Following on from the previous point, to ascertain that the phenotype observed is truly due to lack of microexon (rather than other event linked to removed intronic sequences) - for the 7 exons newly identified as functionally important, at least one added deletion line has to be shown, presenting the same phenotype. If making 7 more lines can't be achieved in a reasonable time (we are aware this is a big ask), a MO experiment blocking microexon splicing needs to be provided (may not be ideal for analysis at 6 dpf). For the existing mutants and the new ones (or morphants), sequencing of the mRNAs for the 7 genes in mutants and siblings also needs to be added to check any possible change in other variants.

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a protein-truncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexons should have equivalent impacts on the mRNA. We acknowledge that we inadequately described the generation of these alleles, and we now provide Figure S1 to show the microexon’s relationship to possible regulatory elements that could impact splicing in unexpected ways if they remain.

      We now acknowledge the concern of subtle genetic background effects in the Methods: “Even with using sibling controls and collecting multiple biological replicates from individual parents, the possibility remains that linked genetic variation may have contributed to the mild phenotypes we observed, as only a single line was generated.”

      Given the caveats of MOs and transient microinjection for the study of 6 dpf phenotypes, we disagree that this suggested experiment would provide value. The phenotypic assays we use are highly sensitive, and we would not even trust CRISPANTs to yield reliable data. We have added an additional loss-of-function allele for ppp6r3 from the Sanger knockout project, which has a similar but stronger size change to the ppp6r3 microexon-removal line. In addition, our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860).

      To support that we generated clean removals of these microexons, we experimentally determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see the point #4 public response to Reviewer 1). We also have already compared the microexon removal to a loss-of-function mutant for two lines (previously Figure S1, now Figures S3 and S4), and we have made that outcome more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point #3 public response to Reviewer 1).

      (5) Figure 3: An image of control tcf7l2 mutant brain activity as a reference should be included.

      We now include the activity and structure maps of tcf7l2 mutants in Supplementary Figures 9 and 11.

      (6) Figure 3a/b. The gene names on the y-axis of the pERK and structure comparisons should be reordered to be alphabetical so that phenotypes can be compared by the reader for the same microexon across the two assays.

      These data are clustered so that any similarities between maps can be recognized. We prefer to retain the clustering to compare lines to each other.

      (7) Figure S6 legend. Including graph titles like "day3msdf_dpix_numberofbouts_60" is not comprehensible to the reader so should be replaced with more descriptive text. As should jargon such as "combo plot" and "habituation_day5dpfhab1post_responsefrequency_1_a1f1000d5p" etc.

      The legend has been edited to describe the experiments. Subsections of the prior names are maintained in parentheses to enable the reader to connect the plots in this figure to the specific image and underlying data in Zenodo.

      (8) Page 2 line 21 "to enable proper".

      The change has been made.

      (9) Page 7 line 7. Repeatable phenotypes were mild mild.

      This error has been corrected.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1B is confusingly laid out.

      We are unclear how to modify Figure 1B, as it is a bar plot. We have modified several figures to improve clarity.

      (2) Figure 1E-there are some pictures of zebrafish but to what end? They aren't labelled. The dark "no expression" looks really similar to the dark green, "high expression".

      The zebrafish images represent the ages assessed for microexon inclusion. We have added labels to clarify this point.

      (3) The main text says "microexons were removed by Crispr" but there is no detail in the main text about this at all-- and barely any in the methods. What does it mean to be removed? Cleanly? Or including part of the introns on either side? Etc. How selected, raised, etc? I can glean some of this from the Table S2 if I do a lot of extra work, but at least some notes about this would be important.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We removed the microexon and the surrounding area that contains the putative regulatory elements.

      (4) Figure 2 - There are no Ns, at least for the plots on the right. The reader shouldn't have to dig deep in Table S2 to find that. It is also unclear why heterozygous fish are not included in these analyses, since there are sibling data for all. Removed for readability of the plots might be warranted, but this should be made explicitly clear.

      The Ns for these plots have been added to the legend. The legend was also modified as follows: “Comparisons to the heterozygous larvae are removed for clarity and available in the Supplementary Materials, as they often have even milder phenotypes than homozygous.”

      (5) Needed data: for those with phenotypes, some evidence should be presented that the full-length transcripts that encode proteins without the microexons are still expressed at the same level and without splicing errors/NMD. Otherwise, some of these phenotypes that were found could be due to knockdown or LOF (or I suppose even overexpression) of the targeted gene.

      We have added a new Supplementary Figure S2 confirming clean removal of the microexons with RT-PCR for a subset of mutants with phenotypes. This figure also includes qRT-PCR for the same subset. We now discuss these findings: Results: “For eight mutant lines, we confirmed that the microexon was eliminated from the transcripts as expected (Figure S2). Although our genomic deletion did not yield unexpected isoforms, qRT-PCR on these eight lines revealed significant downregulation for the homozygous vav2 mutant (Figure S2), indicating possibly complex genetic regulation.”

    1. eLife Assessment

      This fundamental study explores a novel cellular mechanism underlying the degeneration of locus coeruleus neurons during chronic restraint stress. The evidence supporting the overexcitation of LC neurons after chronic stress is compelling. The topic is timely, the proposed mechanistic pathway is innovative, and the findings have translational relevance, particularly regarding therapeutic strategies targeting α2A-AR internalization in neurodegenerative diseases.

    2. Reviewer #1 (Public review):

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions.

      First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence.

      Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. The use of a single working memory test following an anxiety test is inadequate to verify memory dysfunction behaviors. Additional cognitive assays, such as the Morris Water Maze or Novel Object Recognition, are recommended but not performed.

      Third, concerns regarding the lack of rigor in differential MAO-A expression in fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to supplementary data without providing further evidence (e.g., an enzymatic assay or quantitative reanalysis of Western blot, or re-staining of IF for MAO-A) to support their interpretation.

      Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and reorganizing existing data, rather than generating new evidence, limits the overall impact and interpretability of the study.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the mechanism by which chronic stress induces degeneration of locus coeruleus (LC) neurons. The authors demonstrate that chronic stress leads to the internalization of α2A-adrenergic receptors (α2A-ARs) on LC neurons, causing increased cytosolic noradrenaline (NA) accumulation and subsequent production of the neurotoxic metabolite DOPEGAL via monoamine oxidase A (MAO-A). The study suggests a mechanistic link between stress-induced α2A-AR internalization, disrupted autoinhibition, elevated NA metabolism, activation of asparagine endopeptidase (AEP), and Tau pathology relevant to Alzheimer's disease (AD). The conclusions of this paper are mostly well supported by the data, but some aspects of image acquisition require further examination.

      Strengths:

      This study clearly demonstrates the effects of chronic stimulation on the excitability of LC neurons using electrophysiological techniques. It also elucidates the role of α2-adrenergic receptor (α2-AR) internalization and the associated upstream and downstream signaling pathways of GIRK-1, using a range of pharmacological agents, highlighting the innovative nature of the work. Additionally, the study identifies the involvement of the MAO-A-DOPEGAL-AEP pathway in this process. The topic is timely, the proposed mechanistic pathway is compelling, and the findings have translational relevance, particularly in relation to therapeutic strategies targeting α2A-AR internalization in neurodegenerative diseases.

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Comments on revisions:

      The authors have addressed all of the reviewers' comments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a technically impressive dataset showing that repeated excitation or restraint stress internalises somatodendritic α2A adrenergic autoreceptors (α2A ARs) in locus coeruleus (LC) neurons. Loss of these receptors weakens GIRK-dependent autoinhibition, raises neuronal excitability, and is accompanied by higher MAO A, DOPEGAL, AEP, and tau N368 levels. The work combines rigorous whole-cell electrophysiology with barbadin-based trafficking assays, qPCR, Western blotting, and immunohistochemistry. The final schematic is appealing and, in principle, could explain early LC hyperactivity followed by degeneration in ageing and Alzheimer's disease.

      Strengths:

      - Multi-level approach: the study integrates electrophysiology, pharmacology, mRNA quantification, and protein-level analysis.

      - Use of barbadin to block β-arrestin/AP-2-dependent internalisation is both technically precise and mechanistically informative.

      - Well-executed electrophysiology.

      - Translational relevance.

      - Converges to a model that peers discussed (scientists can only discuss models, not data!).

      Weaknesses:

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions. 

      (1) First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence. 

      Although Reviewer #1 commented that “The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence”, we believe that this comment may be unfair.

      It may be unfair for Reviewer #1 to neglect our responses to the original reviewer comments regarding the direct measurement of cytosolic NA levels. None of the recommended methods to directly measure cytosolic NA levels is feasible, as described in the original authors’ response (see the original authors’ response to the comment raised by Reviewer #1 as Recommendations for the authors (2)). To measure extracellular NA with GRAB-NE photometry, α2A-ARs must be expressed in the cell membrane. GRAB-NE photometry is not applicable unless α2A-ARs are expressed, whereas increases in cytosolic NA levels are caused by internalization of α2A-ARs in our study.

      In our study, we endeavored to detect the change in MAO-A protein by Western blot, instead of examining MAO-A enzymatic activity. Because the relative quantification of active AEP and Tau N368 proteins by Western blot analysis should accurately reflect the change in MAO-A enzymatic activity, an enzymatic assay may not necessarily be required, although we acknowledge that an enzymatic assay would better demonstrate MAO-A activity, as discussed in the previously revised manuscript (R1, page 10, lines 314-315).

      We used the phrase “beyond the scope of the current study” for “the mechanism how Ca<sup>2+</sup> activates MAO-A” as described in the original authors’ responses (see the original authors’ response to the comment raised by Reviewer #1 as Weakness (3)). We do not think that this mechanism must be investigated in the present study because the Ca<sup>2+</sup>-dependent nature of MAO-A activity is already known (Cao et al., 2007).

      On the other hand, because it is not possible to measure cytosolic NA levels with currently available methods, the quantification of the connection between α2A-AR internalization and increased cytosolic NA levels must be considered outside the scope of the study. However, our study demonstrated the qualitative relationship between α2A-AR internalization and active AEP/Tau N368 reflecting increased cytosolic NA levels, leaving “a small gap in the mechanistic chain of evidence.” Therefore, it may be unreasonable to criticize our study as “leaving a significant gap in the mechanistic chain of evidence” with the phrase “beyond the scope of the current study.”

      (2) Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. The use of a single working memory test following an anxiety test is inadequate to verify memory dysfunction behaviors. Additional cognitive assays, such as the Morris Water Maze or Novel Object Recognition, are recommended but not performed.

      As described in the original authors’ response (see the original authors’ response to the comment raised by Reviewer #1 as Weakness (4)), we had already performed another behavioral test, the elevated plus maze (EPM) test. By combining the two tests, it may be possible to evaluate the results of the Y-maze test more accurately by differentiating memory impairment from anxiety. However, the results of these behavioral tests showed that chronic RS mice displayed both anxiety-like and memory impairment-like behaviors. Accordingly, we have softened the implication of anxiety and memory impairment (page 13, lines 396-399) and revised the abstract (page 2, line 59) in the revised manuscript (R2).

      (3) Third, concerns regarding the lack of rigor in differential MAO-A expression in fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to supplementary data without providing further evidence (e.g., an enzymatic assay or quantitative reanalysis of Western blot, or re-staining of IF for MAO-A) to support their interpretation.

      Because the quantification of MAO-A expression can be performed with greater accuracy by Western blot than by immunohistochemistry, we have moved the immunohistochemical results (shown in Figure 5) to the supplemental data (Figure S8), following the suggestion made by Reviewer #3. As the relative quantification of active AEP and Tau N368 proteins by Western blot analysis may accurately reflect changes in MAO-A enzymatic activity, which is consistent with the result of the Western blot analysis of MAO-A, an enzymatic assay or re-staining of immunofluorescence for MAO-A may not necessarily be required. We do not think that a new Western blot experiment is necessary to re-evaluate MAO-A just because the less reliable quantification of immunohistochemical staining is lacking.

      (4) Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      The reviewer #3 is misunderstanding Figure S7. In Figure S7, there are two types of α2A-AR-expressing neurons: one is the TH-positive LC neuron and the other is the TH-negative neuron in the mesencephalic trigeminal nucleus (MTN). This clearly indicates that TH staining is specific. Furthermore, α2A-AR staining was much more extensive in MTN neurons than in LC neurons. Thus, the α2A-AR signal is not similar to the TH signal and there are no labeling errors, which is also evident in the merged image (Figure S7C).

      (5) Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and reorganizing of existing data, rather than generating new evidence, limits the overall impact and interpretability of the study.

      Overall, Reviewer #1 was not satisfied with our revision regardless of the authors’ responses. As detailed above in our responses to comments (1)-(4), we believe that the original authors’ responses and the responses above effectively address the criticisms of Reviewer #1.

      Reviewer #2 (Public review): 

      Comments on revisions: 

      The authors have addressed all of the reviewers' comments.

      We appreciate the constructive and helpful comments made by Reviewer #2.

      Reviewer #3 (Public review): 

      Weaknesses:  

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain. Below, I outline the key points that should be addressed to make the model convincing.

      Please see the responses to the recommendations for the authors made by Reviewer #3.

      Reviewer #3 (Recommendations for the authors):

      (1) Causality across the pathway  

      Each step (α2A internalisation, GIRK rundown, Ca<sup>2+</sup> rise, MAO-A/AEP upregulation) is demonstrated separately, but no experiment links them in a single preparation. Consider in vivo Ca<sup>2+</sup> or GRAB NE photometry during restraint stress while probing α2A levels with i.p. clonidine injection or optogenetic overexcitation coupled to biochemical readouts. Such integrated evidence would help move the manuscript from a correlational to a more mechanistic study.

      Authors response: It is not possible to measure free cytosolic NA levels with GRAB NE photometry when α2A AR is internalized as described above (see the response to the comment made by reviewer #1 as the recommendation for the authors).

      The core idea behind my comment, as well as that of Reviewer 1, was to encourage integrating your individual findings into a more cohesive in vivo experiment. Using GRAB-NE to measure extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately, cytosolic NA levels. Connecting these experiments would significantly strengthen the manuscript and enhance its overall impact. 

      It may be true that the measurement of extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately cytosolic NA levels. However, Reviewer #3 is still misunderstanding the applicability of the GRAB-NE method to detect NE in our study. As described in the original authors’ response, there appears to be no fluorescent probe to label cytosolic NA at present. In particular, the GRAB-NE method recommended by Reviewers #1 and #3 can detect NA only when α2A-AR is expressed in the cell membrane. Therefore, when increases in cytosolic NA levels are caused by internalization of α2A-ARs, NA measurement with GRAB-NE photometry is not applicable.

      (2) Pharmacology and NE concentration  

      The use of 100 µM noradrenaline saturates α and β adrenergic receptors alike. Please provide ramp measurements of GIRK current in dose-response at 1-10 µM NE (blocked by atipamezole) to confirm that the rundown really reflects α2A activity rather than mixed receptor effects. 

      Authors' response: It is true that 100 µM noradrenaline activates both α and β adrenergic receptors alike. However, it was clearly shown that enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole and that the Ca<sup>2+</sup>-dependent rundown of NA-induced GIRK-I was prevented by 10 µM atipamezole. Considering the Ki values of atipamezole for α2A AR (=1~3 nM) (Vacher et al., 2010, J Med Chem) and β AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), these results reflect α2A AR activity but not β AR activity (Figure S5). Furthermore, because it is already well established that NA-induced GIRK-I is mediated by α2A AR activity in LC neurons (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience), it is not necessary to re-examine the effect of 1-10 µM NA on GIRK-I.

      While the milestone papers by Williams remain highly influential, they should be re-evaluated in light of more recent findings, given that they date back over 40 years. Advances in our understanding now allow for a more nuanced interpretation of some of their results. For example, see McKinney et al. (eLife, 2023). This study demonstrates that presynaptic β-adrenergic receptors, particularly β2, can enhance neuronal excitability via autocrine mechanisms. This suggests that your post-activation experiments using atipamezole may not fully exclude a contribution of β-adrenergic signaling. Such a role might become apparent when conducting more detailed titration experiments.

      Reviewer #3 may be misunderstanding the report by McKinney et al. (eLife, 2023). This paper did not demonstrate that presynaptic β-adrenergic receptors, particularly β2, can enhance neuronal excitability via autocrine mechanisms. It is impossible for LC neurons to increase their excitability by activating β-adrenergic receptors, as we have clearly shown that enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole. Considering the difference in Ki values of atipamezole for α2-AR (= 2~4 nM) (Vacher et al., 2010, J Med Chem) and β-AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), such complete antagonism (of 100 µM NA-induced GIRK-I) by 10 µM atipamezole reflects α2A-AR activity but not β-AR activity (Figure S5). Furthermore, it is already well established that NA-induced GIRK-I is mediated by α2-AR activity in LC neurons (Arima et al., 1998, J Physiol). McKinney et al. (eLife, 2023) merely found the absence of lateral inhibition of adjacent LC neurons by NA released in an autocrine manner through their respective spike activity. This has nothing to do with autoinhibition.

      (4) Age mismatch and disease claims 

      All electrophysiology and biochemical data come from juvenile (< P30) mice, yet the conclusions stress Alzheimer-related degeneration. Key endpoints need to be replicated in adult or aged mice, or the manuscript should soften its neurodegenerative scope. 

      Authors response: As described in the section of Conclusion, we never stress Alzheimer-related degeneration, but might give such an impression. To avoid such a misunderstanding, we have added a description “However, the present mechanism must be proven to be valid in adult or old mice, to validate its involvement in the pathogenesis of AD.” (R1, page 14, lines 448-450).

      It would be great to see this experiment performed in aged mice; you are the one who has everything in place to do it right now!

      In separate future studies, we would like to prove that the present mechanism is valid in aged mice, to validate its involvement in the pathogenesis of AD. This is partly because patch-clamp studies in aged mice are extremely difficult and time-consuming.

      Authors response: In the abstract, you suggest that internalization of α2A-adrenergic receptors could represent a therapeutic target for Alzheimer's disease. "...Thus, it is likely that internalization of α2A-AR increased cytosolic NA, as reflected in AEP increases, by facilitating reuptake of autocrine-released NA. The suppression of α2A-AR internalization may have a translational potential for AD treatment."

      α2A-AR internalization was involved in the degeneration of LC neurons. Because we confirmed that spike-frequency adaptation reflecting α2A-AR-mediated autoinhibition can be induced in adult mice as prominently as in juvenile mice (Figure S10), it is not inappropriate to suggest that the suppression of α2A-AR internalization may have translational potential for anxiety/AD treatment (see Discussion; R2, page 14, lines 445-449).

      (6) Quantitative histology  

      Figure 5 presents attractive images, but no numerical analysis is provided. Please provide ROI-based fluorescence quantification (with n values) or move the images to the supplement and rely on the Western blots. 

      Author response: We have moved the immunohistochemical results in Fig. 5 to the supplement, as we believe the quantification of immunohistochemical staining is not necessarily correct.   

      What do you mean by "...immunohistochemical staining is not necessarily correct"?

      It is evident that in terms of quantification, Western blot analysis is a more accurate method than immunohistochemical staining. In this sense, it is the contention of our study that the ROI-based fluorescence quantification of immunohistochemical staining is not necessarily an accurate or correct procedure, compared to the quantification by Western blot analysis.

    1. eLife Assessment

      The analysis of neural morphology across Heliconiini butterfly species revealed brain area-specific changes associated with new foraging behaviours. While the volume of the centre for learning and memory, the mushroom bodies, was known to vary widely across species, new, valuable results show conservation of the volume of a center for navigation, the central complex. The presented evidence is convincing for both volumetric conservation in the central complex and fine neuroanatomical differences associated with pollen feeding, delivered by experimental approaches that are applicable to other insect species. This work will be of interest to evolutionary biologists, entomologists, and neuroscientists.

    2. Reviewer #1 (Public review):

      The authors previously reported that Heliconius, one genus of the Heliconiini butterflies, evolved to be efficient foragers that feed on pollen of specific plants and have massively expanded mushroom bodies. Using the same image dataset, the authors segmented the central complex and associated brain regions and found that the volume of the central complex relative to the rest of the brain is largely conserved across the Heliconiini butterflies. By performing immunostaining to label a specific subset of neurons, the authors found several potential sites of evolutionary divergence in the central complex neural circuits, including the number of GABAergic ellipsoid body ring neurons and the innervation patterns of Allatostatin A expressing neurons in the noduli. These neuroanatomical data will be helpful to guide future studies to understand the evolution of the neural circuits for vector-based navigation.

      Strengths:

      The authors used a sufficiently large dataset from 307 individuals of 41 species of Heliconiini butterflies to solidify the quantitative conclusions and present new microscopy data for fine neuroanatomical comparison of the central complex.

      Weaknesses:

      (1) Although the figures display a concise summary of anatomical findings, it would be difficult for non-experts to learn from this manuscript how to identify the same neuronal processes in the raw confocal stacks. It would be helpful to have instructive movies showing a step-by-step guide for identification of neurons of interest, segmentations, and 3D visualizations (rotation) for several examples, including ER neurons (to supplement the text in lines 347-353) and Allatostatin A neurons.

      (2) Related to (1), it was difficult for me to assess whether the data in Figure 7 support the authors' conclusion that ER neuron number increased in Heliconius melpomene. By my understanding, the resolution of this dataset isn't high enough to trace individual axons, and therefore the authors do not rule out that a portion of "ER ring neurons" in Heliconius may not innervate the ER, as stated in Line 635 "Importantly, we also found that some ER neurons bypass the ellipsoid body and give rise to dense branches within distinct layers in the fan-shaped body (ER-FB)". If they don't innervate the ellipsoid body, why are they named "ER neurons"?

      (3) Discussions around the lines 577-584 require the assumption that each ellipsoid body (EB) ring neuron typically arborises in a single microglomerulus to form a largely one-to-one connection with TuBu neurons within the bulb (BU), and therefore, the number of BU microglomeruli should provide an estimation of the number of ER neurons. Explain this key assumption or provide an alternative explanation.

      (4) The details of antibody information are missing in the Key resource table. Instead of citing papers, list the catalogue numbers and identifiers for commercially available antibodies, and describe the antigen and whether they are monoclonal or polyclonal. Are antigens conserved across species?

      (5) I did not understand why the authors assume that foraging to feed on pollen is a more difficult cognitive task than foraging to feed on nectar. Would it be possible that they are equally demanding tasks, but pollen feeding allows Heliconius to pass more proteins and nucleic acids to their offspring and therefore they can develop larger mushroom bodies?

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Farnsworth et al. ask whether the previously established expansion of mushroom bodies in the pollen foraging Heliconius genus of Heliconiini butterflies co-evolved with adaptations in the central complex. Heliconius trap line foraging strategies to acquire pollen as a novel resource require advanced spatial memory mediated by larger mushroom bodies, but the authors show that related navigation circuits in the central complex are highly conserved across the Heliconiini tribe, with a few interesting exceptions. Using general immunohistochemical stains and 3D reconstruction, the authors compared volumes of central complex regions, and unlike the mushroom bodies, there was no evidence of expansion associated with pollen feeding. However, a second dataset of neuromodulator and neuropeptide antibody labeling reveals more subtle differences between pollen and non-pollen foragers and highlights sub-circuits that may mediate species-specific differences in behavior. Specifically, the authors found an expansion of GABAergic ER neurons projecting to the fan-shaped body in Heliconius, which may enhance their ability to path-integrate. They also found differences in Allatostatin A immunoreactivity, particularly increased expression in the noduli associated with pollen feeding. These differences warrant closer examination in future studies to determine their functional implication on navigation and foraging behaviors.

      Strengths:

      The authors leveraged a large morphological data set from the Heliconiini to achieve excellent phylogenetic coverage across the tribe with 41 species represented. Their high-quality histology resolves anatomical details to the level of specific, identifiable tracts and cell body clusters. They revealed differences at a circuit level, which would not be obvious from a volumetric comparison. The discussion of these adaptations in the context of central complex models is useful for generating new hypotheses for future studies on the function of ER-FB neurons and the role of Allatostatin A modulation in navigation.

      The conclusions drawn in this paper are measured and supported by rigorous statistics and evidence from micrographs.

      Weaknesses:

      The majority of results in this study do not reveal adaptations in the central complex associated with pollen foraging. However, reporting conserved traits is useful and illustrates where developmental or functional constraints may be acting. The implied hypothesis in the introduction is that expansion of mushroom bodies in Heliconius co-evolved with central complex adaptations, so it may be helpful to set up the alternate hypotheses in the beginning.

      In the main text, the authors describe differences in GABAergic neurons "across several species" but only one Heliconius and one outgroup species seem to be represented in the figures. ER numbers in Figure 7H are only compared for these two species. If this data is available for other species, it would strengthen the paper to add them to the analysis, since this was one of the most intriguing findings in the study. I would want to know if the increased ER number is a trend in Heliconius or specific to H. melpomene.

    4. Author response:

      We thank the two reviewers for their constructive criticisms which we will address in the coming weeks, and we are confident doing so will benefit the manuscript.

      We will aim to address all comments, but there are two main areas in particular that we highlight here:

      (1)  Both reviewers make important suggestions to improve the readers’ understanding of the anatomical complexities and raw files we provide. We will generate annotated confocal stacks and simplify the nomenclature to better guide the reader through the more complex details of the anatomy of the central complex, and the neuron types we characterized more closely.

      (2)  Both reviewers also pointed to several parts of our interpretations and discussion that should be clarified. We will do so by improving the language we use at certain sections to offer more precision, and by offering alternative explanations where possible.

    1. eLife Assessment

      This study offers a valuable theoretical framework for quantifying molecular transport across interfaces between coexisting liquid phases, emphasizing interfacial resistance as a central factor governing transport kinetics. The mathematical derivations are solid. To enhance the paper's relevance and broaden its appeal, it would be helpful to clarify how the key equations connect to existing literature and to elucidate the physical mechanisms underlying scenarios that give rise to substantial interfacial resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors theoretically address the topic of interface resistance between a phase-separated condensate and the surrounding dilute phase. In a nutshell, "interface resistance" occurs if material in the dilute phase can only slowly pass through the interface region to enter the dense phase. There is some evidence from FRAP experiments that such a resistance may exist, and if it does, it could be biologically relevant insofar as the movement of material between dense and dilute phases can be rate-limiting for biological processes, including coarsening. The current study theoretically addresses interface resistance at two levels of description: first, the authors present a simple way of formulating interface resistance for a sharp interface model. Second, they derive a formula for interface resistance for a finite-width interface and present two scenarios where the interface resistance might be substantial.

      Strengths:

      The topic is of broad relevance to the important field of intracellular phase separation, and the work is overall credible.

      Weaknesses:

      There are a few problems with the study as presented - mainly that the key formula for the latter section has already been derived and presented in Reference 6 (notably also in this journal), and that the physical basis for the proposed scenarios leading to a large interface resistance is not clearly supported.

      (1) As noted, Equation 32 of the current study is entirely equivalent to Equation 8 of Reference 6, with a very similar derivation presented in Appendix 1 of that paper. In fact, Equation 8 in Reference 6 takes one more step by combining Equations 32 and 35 to provide a general expression for the interface resistance in an integral form. These prior results should be properly cited in the current work - the existing citations to Reference 6 do not make this overlap apparent.

      (2) The authors of the current study go on to examine cases where this shared equation (here Equation 32) might imply a large interface resistance. The examples are mathematically correct, but physically unsupported. In order to produce a substantial interface resistance, the current authors have to suppose that in the interface region between the dense and dilute phases, either there is a local minimum of the diffusion coefficient or a local minimum of the density. I am not aware of any realistic model that would produce either of these minima. Indeed, the authors do not present sufficient examples or physical arguments that would support the existence of such minima.

      In my view, these two issues limit the general interest of the latter portion of the current manuscript. While point 1 can be remedied by proper citation, point 2 is not so simple to address. The two ways the authors present to produce a substantial interface resistance seem to me to be mathematical exercises without a physical basis. The manuscript will improve if the authors can provide examples or compelling arguments for a minimum of either diffusion coefficient or density between the dense and dilute phases that would address point 2.
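
      The mechanism debated in point 2 can be illustrated with a small numerical sketch. It is not Equation 32 of the manuscript or Equation 8 of Reference 6; it only assumes the generic steady-state picture in which the flux is proportional to D(x)·phi(x) times the chemical-potential gradient, so that the integral of dx / (D(x)·phi(x)) across the interface acts like a series resistance. All profile shapes, parameter values, and function names below are hypothetical choices made for illustration.

```python
import math

def d_smooth(x, d_in=0.1, d_out=10.0, w=0.01):
    """Diffusivity (um^2/s) rising monotonically from dense (x < 0) to dilute (x > 0)."""
    return d_in + (d_out - d_in) * 0.5 * (1.0 + math.tanh(x / w))

def d_trough(x, amplitude=1000.0, w=0.01):
    """Same profile with a hypothetical local minimum centred on the interface."""
    return d_smooth(x, w=w) / (1.0 + amplitude * math.exp(-(x / w) ** 2))

def phi_eq(x, phi_in=0.5, phi_out=0.01, w=0.01):
    """Equilibrium volume fraction: dense for x < 0, dilute for x > 0 (assumed tanh shape)."""
    return phi_out + (phi_in - phi_out) * 0.5 * (1.0 - math.tanh(x / w))

def resistance(diffusivity, x_min=-0.1, x_max=0.1, n=20001):
    """Trapezoidal estimate of the integral of dx / (D(x) * phi(x)), in s/um."""
    h = (x_max - x_min) / (n - 1)
    total = 0.0
    for i in range(n):
        x = x_min + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight / (diffusivity(x) * phi_eq(x))
    return total * h

r_smooth = resistance(d_smooth)   # monotonic D(x): only bulk-like contributions
r_trough = resistance(d_trough)   # D(x) with a minimum at the interface
print(f"smooth profile: {r_smooth:.3g} s/um, with trough: {r_trough:.3g} s/um")
```

      Under these assumptions, a monotonic diffusivity profile adds little beyond the bulk contributions, whereas a pronounced local minimum of D (or, equivalently, of the density) dominates the integral. This is why the physical plausibility of such a minimum is the crux of the reviewer's objection.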

    3. Reviewer #2 (Public review):

      Summary:

      This work provides a general theoretical framework for understanding molecular transport across liquid-liquid phase boundaries, focusing on interfacial resistance arising from deviations from local equilibrium. By bridging sharp and continuous interface descriptions, the authors demonstrate how distinct microscopic mechanisms can yield similar effective kinetics and propose practical experimental validation strategies.

      Strengths:

      (1) Conceptually rich and physically insightful interface resistance formulation in sharp and continuous limits.

      (2) Strong integration of non-equilibrium thermodynamics with biologically motivated transport scenarios.

      (3) Thorough numerical and analytical support, with thoughtful connection to current and emerging experimental techniques.

      (4) Relevance to various systems, including biomolecular condensates and engineered aqueous two-phase systems.

      Weaknesses:

      (1) The work remains theoretical, mainly, with limited direct comparison to quantitative experimental data.

      (2) The biological implications are only briefly explored; further discussion of specific systems where interface resistance might play a functional role would enhance the impact.

      (3) Some model assumptions (e.g., symmetric labeling or idealized diffusivity profiles) could be further contextualized regarding biological variability.

    4. Reviewer #3 (Public review):

      The manuscript investigated the kinetics of molecule transport across interfaces in phase-separated mixtures. Through the development of a theoretical approach for a binary mixture in a sharp interface limit, the authors found that interface resistance leads to a slowdown in interfacial movement. Subsequently, they extended this approach to multiple molecular species (incorporating both labeled and unlabeled molecules) and continuous transport models. Finally, they proposed experimental settings in vitro and commented on the necessary optical resolution to detect signatures of interfacial kinetics associated with resistance.

      The investigation of transport kinetics across biomolecular condensate interfaces holds significant relevance for understanding cellular function and dysfunction mechanisms; thus, the topic is important and timely. However, the current manuscript presentation requires improvement. Firstly, the inclusion of numerous equations in the main text substantially compromises readability, and relocation of a part of the formulae and derivations to the Appendix would be more appropriate. Secondly, the manuscript would benefit from more comprehensive comparisons with existing theoretical studies on molecular transport kinetics. The text should also be written to be more approachable for a general readership. Modifications and sufficient responses to the specific points outlined below are recommended.

      (1) The authors introduced a theoretical framework to study the kinetics of molecules across an interface between two coexisting liquid phases and found that interface resistance leads to a slowdown in interfacial movement in a binary mixture and a decelerated molecule exchange between labeled and unlabeled molecules across the phase boundary. However, these findings appear rather expected. The work would be strengthened by a more thorough discussion of the kinetics of molecule transport across interfaces (such as the physical origin of the interface resistance and its specific impact on transport kinetics).

      (2) The formulae in the manuscript should be checked and corrected. Notably, Equation 10 contains "\phi_2\ln\phi_2" while Eq. 11b shows "n^{-1}\ln\phi_2", suggesting a missing factor of "n^{-1}". Similarly, Equation 18 is obtained from Equation 11: the logarithmic term in Eq. 11a is "n^{-1}\ln\phi_1-\ln(1-\phi)", but the pre-exponential factor in Equation 18a is just "\phi_1/(1-\phi*)"; where is "n^{-1}"? Additionally, there is a unit inconsistency in Equation 36, where the unit of \rho (s/m) does not match that of the right-hand side expression (s/m^2).

      (3) The authors stated that the numerical solutions are obtained using a custom finite difference scheme implemented in MATLAB in the Appendix. The description of numerical methods is insufficiently detailed and needs to be expanded, including specific equations or models used to obtain specific figures, the introduction of initial and boundary conditions, the choices of parameters and their reasons in terms of the biology.

      (4) The authors claimed that their framework naturally extends to multiple molecular species, but only showed the situation of labeled and unlabeled molecules across a phase boundary. How about three or more molecular species? Does this framework still work? This should be added to strengthen the manuscript and confirm the framework's general applicability.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors theoretically address the topic of interface resistance between a phase-separated condensate and the surrounding dilute phase. In a nutshell, "interface resistance" occurs if material in the dilute phase can only slowly pass through the interface region to enter the dense phase. There is some evidence from FRAP experiments that such a resistance may exist, and if it does, it could be biologically relevant insofar as the movement of material between dense and dilute phases can be rate-limiting for biological processes, including coarsening. The current study theoretically addresses interface resistance at two levels of description: first, the authors present a simple way of formulating interface resistance for a sharp interface model. Second, they derive a formula for interface resistance for a finite-width interface and present two scenarios where the interface resistance might be substantial. 

      Strengths: 

      The topic is of broad relevance to the important field of intracellular phase separation, and the work is overall credible. 

      Weaknesses: 

      There are a few problems with the study as presented - mainly that the key formula for the latter section has already been derived and presented in Reference 6 (notably also in this journal), and that the physical basis for the proposed scenarios leading to a large interface resistance is not clearly supported. 

      (1) As noted, Equation 32 of the current study is entirely equivalent to Equation 8 of Reference 6, with a very similar derivation presented in Appendix 1 of that paper. In fact, Equation 8 in Reference 6 takes one more step by combining Equations 32 and 35 to provide a general expression for the interface resistance in an integral form. These prior results should be properly cited in the current work - the existing citations to Reference 6 do not make this overlap apparent. 

      We agree and will make the overlap explicit, acknowledging priority and clarifying what is new here. The initial version of the preprint of Zhang et al. (2022) (https://www.biorxiv.org/content/10.1101/2022.03.16.484641v1) lacked the derivation (it referenced a Supplementary Note not yet available); it was added during the eLife submission. We worked from the preprint and missed this update, which we will now correct.

      (2) The authors of the current study go on to examine cases where this shared equation (here Equation 32) might imply a large interface resistance. The examples are mathematically correct, but physically unsupported. In order to produce a substantial interface resistance, the current authors have to suppose that in the interface region between the dense and dilute phases, either there is a local minimum of the diffusion coefficient or a local minimum of the density. I am not aware of any realistic model that would produce either of these minima. Indeed, the authors do not present sufficient examples or physical arguments that would support the existence of such minima. 

      We respectfully disagree with the reviewer on the physical plausibility of these scenarios: there is concrete experimental and theoretical evidence for both scenarios we discussed.

      Experimental: Strom et al. (2017) (our reference 11) describes a substantially reduced protein diffusion coefficient at an in vivo phase boundary, while Hahn et al. (2011a) and Hahn et al. (2011b) (our references 27 and 28) describe transient accumulation of molecules at a phase boundary, which they attribute to the Donnan potential, but conceivably a lowered mobility could play a role.

      Theoretical: Recent work (e.g., Majee et al. (2024)) shows that charged layers could form at phase boundaries, which could either repel or attract incoming molecules, depending on their charge, thus altering the local volume fraction, resulting in a trough or peak. Arguably, the model put forth by Zhang et al. (2024) could be mapped to a potential wall, where particles are reflected, unless in a certain state. We will add sentences to the corresponding results section, as well as the discussion to make this plausibility more apparent.

      In my view, these two issues limit the general interest of the latter portion of the current manuscript. While point 1 can be remedied by proper citation, point 2 is not so simple to address. The two ways the authors present to produce a substantial interface resistance seem to me to be mathematical exercises without a physical basis. The manuscript will improve if the authors can provide examples or compelling arguments for a minimum of either diffusion coefficient or density between the dense and dilute phases that would address point 2. 

      We believe we will be able to address both issues.

      Reviewer #2 (Public review): 

      Summary: 

      This work provides a general theoretical framework for understanding molecular transport across liquid-liquid phase boundaries, focusing on interfacial resistance arising from deviations from local equilibrium. By bridging sharp and continuous interface descriptions, the authors demonstrate how distinct microscopic mechanisms can yield similar effective kinetics and propose practical experimental validation strategies. 

      Strengths: 

      (1) Conceptually rich and physically insightful interface resistance formulation in sharp and continuous limits. 

      (2) Strong integration of non-equilibrium thermodynamics with biologically motivated transport scenarios. 

      (3) Thorough numerical and analytical support, with thoughtful connection to current and emerging experimental techniques. 

      (4) Relevance to various systems, including biomolecular condensates and engineered aqueous two-phase systems. 

      Weaknesses: 

      (1) The work remains theoretical, mainly, with limited direct comparison to quantitative experimental data. 

      We agree with the reviewer; an experimental manuscript is in progress.

      (2) The biological implications are only briefly explored; further discussion of specific systems where interface resistance might play a functional role would enhance the impact.

      We thank the reviewer for this comment. We will add several such scenarios to the discussion, including the possibility of using interface resistance as a way of ordering biochemical reactions in time, as well as its potential to exclude molecules from condensates for long time periods, which, while not effective in the long-time limit, could help on cellular timescales of minutes to hours to respond to transient events.

      (3) Some model assumptions (e.g., symmetric labeling or idealized diffusivity profiles) could be further contextualized regarding biological variability. 

      The treatment of labelled and unlabelled molecules as physically identical is well supported by our experiments. Droplets under typical experimental conditions, i.e. when bleaching is not too strong, do not markedly change size or volume fraction of molecules, which would be expected if the physical properties like molecular volume or interaction strength were significantly changed. However, we do agree that in more extreme bleaching regimes the bleach step itself will change the droplet properties, but this can be avoided by tuning the FRAP laser power and dwell times accordingly.

      Our diffusivity profiles are chosen in the simplest possible way to handle typical experimental constraints (large D outside, lower D inside, potentially lowered D at the boundary) and allow for a mean-field treatment. To the best of our knowledge, the precise make-up and concentration profiles of phase boundaries in biomolecular condensates are not currently known, due to limitations in optical resolution.

      Reviewer #3 (Public review): 

      The manuscript investigated the kinetics of molecule transport across interfaces in phase-separated mixtures. Through the development of a theoretical approach for a binary mixture in a sharp interface limit, the authors found that interface resistance leads to a slowdown in interfacial movement. Subsequently, they extended this approach to multiple molecular species (incorporating both labeled and unlabeled molecules) and continuous transport models. Finally, they proposed experimental settings in vitro and commented on the necessary optical resolution to detect signatures of interfacial kinetics associated with resistance. 

      The investigation of transport kinetics across biomolecular condensate interfaces holds significant relevance for understanding cellular function and dysfunction mechanisms; thus, the topic is important and timely. However, the current manuscript presentation requires improvement. Firstly, the inclusion of numerous equations in the main text substantially compromises readability, and relocation of a part of the formulae and derivations to the Appendix would be more appropriate. Secondly, the manuscript would benefit from more comprehensive comparisons with existing theoretical studies on molecular transport kinetics. The text should also be written to be more approachable for a general readership. Modifications and sufficient responses to the specific points outlined below are recommended. 

      (1) The authors introduced a theoretical framework to study the kinetics of molecules across an interface between two coexisting liquid phases and found that interface resistance leads to a slowdown in interfacial movement in a binary mixture and a decelerated molecule exchange between labeled and unlabeled molecules across the phase boundary. However, these findings appear rather expected. The work would be strengthened by a more thorough discussion of the kinetics of molecule transport across interfaces (such as the physical origin of the interface resistance and its specific impact on transport kinetics). 

      We thank the reviewer for this comment and will discuss possible mechanisms and how they map to our mean-field model in more detail, both in the corresponding results section and in the discussion, as also outlined in our response to Reviewer #1.

      (2) The formulae in the manuscript should be checked and corrected. Notably, Equation 10 contains "\phi_2\ln\phi_2" while Eq. 11b shows "n^{-1}\ln\phi_2", suggesting a missing factor of "n^{-1}". Similarly, Equation 18 is obtained from Equation 11: the logarithmic term in Eq. 11a is "n^{-1}\ln\phi_1-\ln(1-\phi)", but the pre-exponential factor in Equation 18a is just "\phi_1/(1-\phi*)"; where is "n^{-1}"? Additionally, there is a unit inconsistency in Equation 36, where the unit of \rho (s/m) does not match that of the right-hand side expression (s/m^2).

      We thank the reviewer. We identified that the error originates in the inline definition of the exchange chemical potential, already before Equation 11. We inadvertently dropped a prefactor of n, which then shows up in the following equation as an exponent to (1-\phi^*). Very importantly, this means that the main result, Eq. 25, still holds, and in the revised manuscript we will correct the ensuing typographical mistakes.

      (3) The authors stated that the numerical solutions are obtained using a custom finite difference scheme implemented in MATLAB in the Appendix. The description of numerical methods is insufficiently detailed and needs to be expanded, including specific equations or models used to obtain specific figures, the introduction of initial and boundary conditions, the choices of parameters and their reasons in terms of the biology.

      We will substantially expand the Appendix for the numerical solutions and add an explanatory file to the repository to make clear how the code can be run, as well as its dependencies.

      (4) The authors claimed that their framework naturally extends to multiple molecular species, but only showed the situation of labeled and unlabeled molecules across a phase boundary. How about three or more molecular species? Does this framework still work? This should be added to strengthen the manuscript and confirm the framework's general applicability. 

      We have shown in Bo et al. (2021) that the labelling approach can be carried over to multi-component systems. Each species may, for example, encounter its own interface resistance. We will discuss this in more detail in the revised manuscript.

    1. eLife Assessment

      In this manuscript, the authors investigate the migration of human cortical interneurons under hypoxic conditions using forebrain assembloids and developing human brain tissue, and probe the underlying mechanisms. The study provides the first direct evidence that hypoxia delays interneuron migration and identifies adrenomedullin (ADM) as a potential therapeutic intervention. The findings are important, and the conclusions are convincingly supported by experimental evidence.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions, causing reduced cortical interneuron migration. They use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration upon 24 hours of hypoxia. Bulk and scRNA-seq show adrenomedullin (ADM) up-regulation, as well as its receptor RAMP2, confirmed at the protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype, and blocking RAMP2 also interferes with the rescue. The authors are also applauded for using 4 different cell lines and using human fetal cortex slices as an independent method to explore the DLXi1/2GFP-labelled iPSC-derived interneuron migration in this substrate with and without ADM addition (after confirming that also in this system ADM is up-regulated). Finally, the authors demonstrate PKA-CREB signalling mediating the effect of ADM addition, which also leads to up-regulation of GABA receptors. Taken together, this is a very carefully done study on an important subject - how hypoxia affects cortical interneuron migration. In my view, the study is of great interest.

      Strengths:

      The strengths of the study are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that other genes regulated upon hypoxia are not confirmed, so readers will not know down to which fold change or statistical cut-off the data are reliable.

    3. Reviewer #2 (Public review):

      Summary

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia: cortical forebrain assembloids and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits and rescue using exogenous ADM. The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate, as well as establishing exciting models for future studies. The authors use sufficient iPSC lines, including both XX and XY, so the analysis is robust. In addition, the RNAseq data with re-oxygenation is a nice control to see what genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript. I have a few comments and suggestions for the authors.

      Strengths and Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the hypoxic response.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration)? This might already be known, but was not discussed.

      (6) In the Discussion section, it might be worth detailing for readers the functional consequences that delayed/reduced migration of inhibitory neurons into the cortex might have for neural circuit development.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has a significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform - particularly the combination of assembloids and live imaging - will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Weaknesses:

      The main weakness of the study lies in the extent to which forebrain assembloids recapitulate in vivo conditions, as the migration of interneurons from hSO to hCO does not fully reflect the native environment or migratory context of these cells. Nevertheless, this limitation is tempered by the fact that the work provides the first direct observation of human interneuron migration under hypoxia, representing a major advance for the field. In addition, while the transcriptomic analyses are valuable and highlight promising candidates, more in-depth exploration will be needed to fully elucidate the molecular mechanisms governing neuronal migration and maturation under hypoxic conditions.

    1. eLife Assessment

      This work uses enhanced sampling molecular dynamics methods to generate potentially useful information about a conformational change (the DFG flip) that plays a key role in regulating kinase function and inhibitor binding. The focus of the work is on the mechanism of conformational change and how mutations affect the transition. The evidence supporting the conclusions is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors used weighted ensemble enhanced sampling molecular dynamics (MD) to test the hypothesis that a double mutant of Abl favors the DFG-in state relative to the WT and therefore causes the drug resistance to imatinib.

      Strengths:

      The authors employed state-of-the-art weighted ensemble MD simulations with three novel progress coordinates to explore the conformational changes of the DFG motif of Abl kinase. The hypothesis regarding the double mutant's drug resistance is novel.

      Weaknesses:

      The study contains many uncertain aspects. A major revision is needed to strengthen the support for the conclusions.

      (1) Specifically, the authors need to define the DFG conformation using criteria accepted in the field, for example, see https://klifs.net/index.php.

      (2) Convergence needs to be demonstrated for estimating the population difference between different conformational states.

      (3) The DFG flip needs to be sampled several times to establish free energy difference.

      (4) The free energy plots do not appear to show an intermediate state as claimed.

      (5) The trajectory length of 7 ns in both Figure 2 and Figure 4 needs to be verified, as it is extremely short for a DFG flip that has a high free energy barrier.

      (6) The free energy scale (100 kT) appears to be one order of magnitude too large.

      (7) Setting the DFG-Asp to the protonated state is not justified, because in the DFG-in state, the DFG-Asp is clearly deprotonated.

      (8) Finally, the authors should discuss their work in the context of the enormous progress made in theoretical studies and mechanistic understanding of the conformational landscape of protein kinases in the last two decades, particularly with regard to the DFG flip.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well-written manuscript on the mechanism of the DFG flip in kinases. This conformational change is important for the toggling of kinases between active (DFG-in) and inactive (DFG-out) states. The relative probabilities of these two states are also an important determinant of the affinity of inhibitors for a kinase. However, it is an extremely slow/rare conformational change, making it difficult to capture in simulations. The authors show that weighted ensemble simulations can capture the DFG flip and then delve into the mechanism of this conformational change and the effects of mutations.

      Strengths:

      The DFG flip is very hard to capture in simulations. Showing that this can be done with relatively little simulation by using enhanced sampling is a valuable contribution. The manuscript gives a nice description of the background for non-experts.

      Weaknesses:

      I was disappointed by the anecdotal approach to presenting the results. Molecular processes are stochastic and the authors have expertise in describing such processes. However, they chose to put most statistical analysis in the SI. The main text instead describes the order of events in single "representative" trajectories. The main text makes it sound like these were mostly selected because they were continuous trajectories from the weighted ensemble simulations. I would much rather hear a description of the highest probability pathway(s) with some quantification of how probable they are. That would give the reader a clear sense of how representative the events described are.

      I appreciated the discussion of the strengths/weaknesses of weighted ensemble simulations. Am I correct that this method doesn't do anything to explicitly enhance sampling along orthogonal degrees of freedom? Maybe a point worth mentioning if so.

      I don't understand Figure 3C. Could the authors instead show structures corresponding to each of the states in 3B, and maybe also a representative structure for pathways 1 and 2?

      Why introduce S1 and DFG-inter? And why suppose that DFG-inter is what corresponds to the excited state seen by NMR?

      It would be nice to have error bars on the populations reported in Figure 3.

      I'm confused by the attempt to relate the relative probabilities of states to the 32 kcal/mol barrier previously reported between the states. The barrier height should be related to the probability of a transition. The DFG-out state could be equiprobable with the DFG-in state and still have a 32 kcal/mol barrier separating them.

      How do the relative probabilities of the DFG-in/out states compare to experiments, like NMR?

      Do the staggered and concerted DFG flip pathways mentioned correspond to pathways 1 and 2 in Figure 3B, or is that a concept from previous literature?
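
      The reviewer's distinction between state populations and barrier heights can be made concrete with a short sketch. It assumes a simple two-state model at 300 K, Boltzmann-weighted populations, and a transition-state-theory (Eyring) rate with a transmission coefficient of 1; none of this is taken from the manuscript under review, and the function names are hypothetical.

```python
import math

R_KCAL = 0.0019872041        # gas constant in kcal/(mol K)
KB_OVER_H = 2.083661912e10   # Boltzmann constant over Planck constant, in 1/(s K)

def populations(delta_g, temperature=300.0):
    """Equilibrium populations (p_in, p_out) for delta_g = G_out - G_in in kcal/mol."""
    w = math.exp(-delta_g / (R_KCAL * temperature))
    return 1.0 / (1.0 + w), w / (1.0 + w)

def eyring_rate(barrier, temperature=300.0):
    """Eyring rate in 1/s for a free energy barrier in kcal/mol (transmission coefficient 1)."""
    return KB_OVER_H * temperature * math.exp(-barrier / (R_KCAL * temperature))

# Two states of equal free energy are equally populated at equilibrium ...
p_in, p_out = populations(0.0)

# ... regardless of the barrier between them, which only sets how often they interconvert.
rate_32 = eyring_rate(32.0)
rate_10 = eyring_rate(10.0)

print(f"p_in = {p_in:.2f}, p_out = {p_out:.2f}")
print(f"rate over a 32 kcal/mol barrier: {rate_32:.2e} per second")
print(f"rate over a 10 kcal/mol barrier: {rate_10:.2e} per second")
```

      In this picture the free energy difference between DFG-in and DFG-out fixes their relative populations, while the barrier fixes the interconversion rate, so the two quantities cannot be inferred from one another.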

    1. eLife Assessment

      This valuable work advances our understanding of the relation between multimodal MRI, cognition, and mental health. Convincing use of statistical learning techniques in UK Biobank data shows that 48% of the variance between an 11-task derived g-factor and imaging data can be explained. Overall, this paper contributes to the study of brain-behaviour relations and will be of interest for both its methods and its findings on how much variance in g can be explained.

      [Editorial note: a previous version was reviewed by Biological Psychiatry]

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to examine how the covariation between cognition (represented by a g-factor based on 12 features of 11 cognitive tasks) and mental health (represented by 133 diverse features) is reflected in MR-based neural markers of cognition, as measured through multimodal neuroimaging (structural, rsfMRI, and diffusion MR). To integrate multiple neuroimaging phenotypes across MRI modalities, they used a so-called stacking approach, which employs two levels of machine learning. First, they built a predictive model from each neuroimaging phenotype to predict a target variable. Next, in the stacking level, they used predicted values (i.e., cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable. To quantify the contribution of the neural indicators of cognition explaining the relationship between cognition and mental health, they conducted commonality analyses. Results showed that when they stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, they captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, they enhanced the explanation to 48%. Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship.
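
      The two-level stacking idea summarised above can be sketched on synthetic data. This is a deliberately simplified illustration, not the authors' pipeline: it assumes three hypothetical "phenotypes" with one feature each, ordinary least squares at both levels, and separate data splits for fitting the first level, fitting the second level, and testing. All names and parameter values are invented for the example.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for i in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(i, k), key=lambda m: abs(A[m][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for m in range(i + 1, k):
            f = A[m][i] / A[i][i]
            for j in range(i, k):
                A[m][j] -= f * A[i][j]
            c[m] -= f * c[i]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

def predict(b, X):
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in X]

def r2(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def simulate(n, rng):
    """Target = sum of three latent signals; each phenotype is a noisy view of one signal."""
    X, y = [], []
    for _ in range(n):
        s = [rng.gauss(0, 1) for _ in range(3)]
        X.append([si + rng.gauss(0, 0.5) for si in s])
        y.append(sum(s) + rng.gauss(0, 0.5))
    return X, y

rng = random.Random(0)
X_a, y_a = simulate(500, rng)   # fits the level-1 (per-phenotype) models
X_b, y_b = simulate(500, rng)   # fits the level-2 (stacking) model
X_t, y_t = simulate(500, rng)   # held-out test set

# Level 1: one model per phenotype, each predicting the target on its own.
level1 = [fit_ols([[r[k]] for r in X_a], y_a) for k in range(3)]

def level1_predictions(X):
    cols = [predict(level1[k], [[r[k]] for r in X]) for k in range(3)]
    return [list(row) for row in zip(*cols)]

# Level 2: the level-1 predicted values become the features of the stacked model.
level2 = fit_ols(level1_predictions(X_b), y_b)

single_r2 = [r2(y_t, predict(level1[k], [[r[k]] for r in X_t])) for k in range(3)]
stacked_r2 = r2(y_t, predict(level2, level1_predictions(X_t)))
print("single-phenotype R^2:", [round(v, 2) for v in single_r2])
print("stacked R^2:", round(stacked_r2, 2))
```

      Because each synthetic phenotype carries a different part of the signal, the stacked model explains more held-out variance than any single phenotype, which mirrors the qualitative pattern reported in the summary (stacking across modalities capturing more than stacking within one).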

      Strengths:

      (1) A big study population (UK Biobank with 14000 subjects).

      (2) The description of the methods (including Figure 1) is helpful in understanding the approach.

      (3) This revised manuscript is much improved compared to the previous version.

      Weaknesses:

      (1) Although the background and reason for the study are better described in this version of the manuscript, the relevance of the question is, in my opinion, still questionable. The authors aimed to determine whether neural markers of cognition explain the covariance between cognition and mental health and which of the 72 MRI-based features contribute to explaining most of the covariance. I would like to invite the authors to make a stronger case for the relevance, keeping the clinical and scientific relevance in mind (what would you explain to the clinician, what would you explain to the people with lived experience, and how can this knowledge contribute to innovation in mental health care?).

      (2) The discussion on the interpretation of the positive and negative PLRS loadings is not very convincing, and the findings are partly counterintuitive. For example (1) how to explain that distress has a positive loading and anxiety/trauma has a negative loading?; (2) how to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma? From both a clinical and a neuroscientific perspective, this is hard to interpret.

      (3) The analysis plan has not been preregistered (e.g. at OSF).

      Note: the computational aspects of the methods fall beyond my expertise.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this manuscript was to examine whether neural indicators explain the relationship between cognition and mental health. The authors achieved this aim by showing that the combination of MRI markers better predicted the cognition-mental health covariation.

      Strengths:

      The evidence supporting the conclusions is compelling. There is a large sample (UK biobank data) and a clear description of advanced analyses.

      Weaknesses:

      In the previous version of the paper, it was not completely clear what it means to look at the overlap between cognition and mental health. The authors have addressed this in the current version.

    4. Author response:

      Notes to Editors

      We previously received comments from three reviewers at Biological Psychiatry, which we have addressed in detail below. The following is a summary of the reviewers’ comments along with our responses.

      Reviewers 1 and 2 sought clearer justification for studying the cognition-mental health overlap (covariation) and its neuroimaging correlates. In the revised manuscripts, we expanded the Introduction and Discussion to explicitly outline the theoretical implications of investigating this overlap with machine learning. We also added nuance to the interpretation of the observed associations.

      Reviewer 1 raised concerns about the accessibility of the machine learning methodology for readers without expertise in this field. We revised the Methods section to provide a clearer, step-by-step explanation of our machine learning approach, particularly the two-level machine learning through stacking. We also enhanced the description of the overall machine learning design, including model training, validation, and testing.

      In response to Reviewer 2’s request for deeper interpretation of our findings and stronger theoretical grounding, we have expanded our discussion by incorporating a thorough interpretation of how mental health indices relate to cognition, material that was previously included only in supplementary materials due to word limit constraints. We have further strengthened the theoretical justification for our study design, with particular emphasis on the importance of examining shared variance between cognition and mental health through the derivation of neural markers of cognition. Additionally, to enhance the biological interpretation of our results, we included new analyses of feature importance across neuroimaging modalities, providing clearer insights into which neural features contribute most to the observed relationships.

      Notably, Reviewer 3 acknowledged the strength of our study, including multimodal design, robust analytical approach, and clear visualization and interpretation of results. Their comments were exclusively methodological, underscoring the manuscript’s quality.

      Reviewer 1:

      The authors try to bridge mental health characteristics, global cognition and various MRI-derived (structural, diffusion and resting state fMRI) measures using the large dataset of UK Biobank. Each MRI modality alone explained max 25% of the cognition-mental health covariance, and when combined together 48% of the variance could be explained. As a peer-reviewer not familiar with the used methods (machine learning, although familiar with imaging), the manuscript is hard to read and I wonder what the message for the field might be. In the end of the discussion the authors state '... we provide potential targets for behavioural and physiological interventions that may affect cognition', the real relevance (and impact) of the findings is unclear to me.

      Thank you for your thorough review and practical recommendations. We appreciate your constructive comments and suggestions and hope our revisions adequately address your concerns.

      Major questions

      (1) The methods are hard to follow for people not in this specific subfield, and therefore, I expect that for readers it is hard to understand how valid and how useful the approach is.

      Thank you for your comment. To enhance accessibility for readers without a machine learning background, we revised the Methods section to clarify our analyses while retaining important technical details needed to understand our approach. Recognizing that some concepts may require prior knowledge, we provide detailed explanations of each analysis step, including the machine learning pipeline in the Supplementary Methods.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
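      The fold structure and the bootstrap step quoted above can be sketched in a few lines of standard-library Python. This is only an illustration, not the authors' pipeline: the function names (`k_fold_indices`, `nested_cv_splits`, `pearson_r`, `bootstrap_ci`) are hypothetical, the folds are split by simple shuffling, and the confidence interval uses the percentile method, which the response does not specify.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and deal them into k folds of near-equal size."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n, outer_k=5, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits holds (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test set never informs hyperparameter tuning.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds) if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, rng)
        inner_splits = [
            ([idx for j, fold in enumerate(inner_folds) if j != m for idx in fold], inner_val)
            for m, inner_val in enumerate(inner_folds)
        ]
        yield outer_train, outer_test, inner_splits


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(x, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson r (assumed method; pairs resampled with replacement)."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

      In use, a model would be tuned on each `inner_splits` list, refit on `outer_train`, and scored on `outer_test`; the pooled outer-fold predictions would then be passed to `bootstrap_ci`, and an interval excluding zero would correspond to the significance criterion described in the quoted Methods.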

      (2) If only 40% of the cognition-mental health covariation can be explained by the MRI variables, how to explain the other 60% of the variance? And related to this %: why do the author think that 'this provides us confidence in using MRI to derive quantitative neuromarkers of cognition'?

      Thank you for this insightful observation. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health. The remaining 52% of unexplained variance may arise from several sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank.

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the Research Domain Criteria (RDoC) framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. We have now incorporated these considerations into the Discussion section.

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Regarding our confidence in using MRI to derive neural markers for cognition, we base this on the predictive performance of MRI-based models. As we note in the Discussion (Line 554: “Consistent with previous studies, we show that MRI data predict individual differences in cognition with a medium-size performance (r ≈ 0.4) [15–17, 28, 61, 67, 68].”), the medium effect size we observed (r ≈ 0.4) agrees with existing literature on brain-cognition relationships, confirming that machine learning leads to replicable results. This effect size represents a moderate yet meaningful association in neuroimaging studies of aging, consistent with reports linking brain to behaviour in adults (Krämer et al., 2024; Tetereva et al., 2022). For example, a recent meta-analysis by Vieira and colleagues (2022) reported a similar effect size (r = 0.42, 95% CI [0.35;0.50]). Our study includes over 15000 participants, comparable to or more than typical meta-analyses, allowing us to characterise our work as a “mega-analysis”. And on top of this predictive performance, we found our neural markers for cognition to capture half of the cognition-mental health covariation, boosting our confidence in our approach.

      Krämer C, Stumme J, da Costa Campos L, Dellani P, Rubbert C, Caspers J, et al. Prediction of cognitive performance differences in older age from multimodal neuroimaging data. GeroScience. 2024;46:283–308.

      Tetereva A, Li J, Deng JD, Stringaris A, Pat N. Capturing brain-cognition relationship: Integrating task-based fMRI across tasks markedly boosts prediction and test-retest reliability. NeuroImage. 2022;263:119588.

      (3) Imagine that we can increase the explained variance using multimodal MRI measures, why is it useful? What does it learn us? What might be the implications?

      We assume that by variance, Reviewer 1 referred to the cognition-mental health covariation mentioned in point 2) above.

      If we can increase the explained cognition-mental health covariation using multimodal MRI measures, it would mean that we have developed a reasonable neuromarker that is close to RDoC’s neurobiological unit of analysis for cognition. RDoC treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. This means RDoC aims to discover neural markers of cognition that explain the covariation between cognition and mental health. We approach the development of such neural markers using multimodal neuroimaging. We have now explained the motivation of our study in the first paragraph of the Introduction.

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      More specific issues:

      Introduction

      (4) In the intro the sentence 'in some cases, altered cognitive functioning is directly related to psychiatric symptom severity' is in contrast to the next sentence '... are often stable and persist upon alleviation of psychiatric symptoms'.

      Thank you for pointing this out. The first sentence refers to cases where cognitive deficits fluctuate with symptom severity, while the second emphasizes that core cognitive impairments often remain stable even during symptom remission. To avoid this confusion, we have removed these sentences.

      (5) In the intro the text on the methods (various MRI modalities) is not needed for the Biol Psych readers audience.

      We appreciate your comment. While some members of our target audience may have backgrounds in neuroimaging, machine learning, or psychiatry, we recognize that not all readers will be familiar with all three areas. To ensure accessibility for those who are not familiar with neuroimaging, we included a brief overview of the MRI modalities and quantification methods used in our study to provide context for the specific neuroimaging phenotypes. Additionally, we provided background information on the machine learning techniques employed, so that readers without a strong background in machine learning can still follow our methodology.

      (6) Regarding age of the study sample: I understand that at recruitment the subjects' age ranges from 40 to 69 years. At MRI scanning the age ranges between about 46 to 82. How is that possible? And related to the age of the population: how did the authors deal with age in the analyses, since age is affecting both cognition as the brain measures?

      Thank you for noticing this. In the Methods section, we first outline the characteristics of the UK Biobank cohort, including the age at first recruitment (40-69 years). Table 1 then shows the characteristics of participant subsamples included in each analysis. Since our study used data from Instance 2 (the second in-person visit), participants were approximately 5-13 years older at scanning, resulting in the age range of 46 to 82 years. We clarified the Table 1 caption as follows:

      Line 113: “Table 1. Demographics for each subsample analysed: number, age, and sex of participants who completed all cognitive tests, mental health questionnaires, and MRI scanning”

      We acknowledge that age may influence cognitive and neuroimaging measures. In our analyses, we intentionally preserved age-related variance in brain-cognition relationships across mid and late adulthood, as regressing out age completely would artificially remove biologically meaningful associations. At the same time, we rigorously addressed the effects of age and sex through additional commonality analyses quantifying age and sex contributions to the relationship between cognition and mental health.

      As noted by Reviewer 1 and illustrated in Figure 8, age and sex shared substantial overlapping variance with both mental health and neuroimaging phenotypes in explaining cognitive outcomes. For example, in Figure 8i, age and sex together accounted for 43% of the variance in the cognition-mental health relationship:

      (2.76 + 1.03) / (2.76 + 1.03 + 3.52 + 1.45) ≈ 0.43

      Furthermore, neuromarkers from the all-MRI stacked model explained 72% of this age/sex-related variance:

      2.76 / (2.76 + 1.03) ≈ 0.72

      This indicates that our neuromarkers captured a substantial portion of the cognition-mental health covariation that varied with age and sex, highlighting their relevance in age/sex-sensitive cognitive modeling.
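      The two ratios above can be reproduced directly from the rounded components quoted from Figure 8i. The short sketch below is illustrative only; the variable names are hypothetical labels rather than the authors' terminology. With these rounded inputs the first ratio comes out at about 0.433 and the second at about 0.728, so the quoted 0.72 presumably reflects unrounded components, which is an assumption here.

```python
# Rounded components quoted from Fig. 8i; names are illustrative labels only.
age_sex_and_neuromarker = 2.76   # age/sex-related part also captured by the neuromarker
age_sex_only = 1.03              # age/sex-related part not captured by the neuromarker
not_age_sex_related = 3.52 + 1.45  # remaining cognition-mental health components

age_sex_related = age_sex_and_neuromarker + age_sex_only
total = age_sex_related + not_age_sex_related

# Share of the cognition-mental health relationship that overlaps with age and sex
age_sex_share = age_sex_related / total

# Share of that age/sex-related part explained by the all-MRI stacked neuromarker
neuromarker_share = age_sex_and_neuromarker / age_sex_related

print(f"age/sex share: {age_sex_share:.3f}")
print(f"neuromarker share of age/sex part: {neuromarker_share:.3f}")
```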

      In the Methods, Results, and Discussion, we say:

      Methods

      Line 263: “To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age², age×sex, and age²×sex as an additional set of explanatory variables (Fig. 1).”

      Results

      Line 445: “Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship. Multimodal neural marker of cognition based on three MRI modalities (“All MRI Stacked”) explained 72% of this age and sex-related variance (Fig. 8i–l and Table S21).”

      Discussion

      Line 660: “We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.”

      (7) Regarding the mental health variables: were characteristics with positive value (e.g. happiness and subjective wellbeing) reversely scored (compared to the negative items, such as anxiety, addiction, etc)?

      We appreciate you noting this. These composite scores primarily represent standard clinical measures such as the GAD-7 anxiety scale and N-12 neuroticism scale. We did not reverse the scores to keep their directionality, therefore making interpretability consistent with the original studies the scores were derived from (e.g., Davis et al., 2020; Dutt et al., 2022). Complete descriptive statistics for all mental health indices and detailed derivation procedures are provided in the Supplementary Materials (S2). On Page 6, Supplementary Methods, we say:

      Line 92: “Composite mental health scores included the Generalized Anxiety Disorder (GAD-7), the Posttraumatic Stress Disorder (PTSD) Checklist (PCL-6), the Alcohol Use Disorders Identification Test (AUDIT), the Patient Health Questionnaire (PHQ-9) [12], the Eysenck Neuroticism (N-12), Probable Depression Status (PDS), and the Recent Depressive Symptoms (RDS-4) scores [13, 14]. To calculate the GAD-7, PCL-6, AUDIT, and PHQ-9, we used questions introduced at the online follow-up [12]. To obtain the N-12, PDS, and RDS-4 scores [14], we used data collected during the baseline assessment [13, 14].

      We subcategorized depression and GAD based on frequency, current status (ever had depression or anxiety and current status of depression or anxiety), severity, and clinical diagnosis (depression or anxiety confirmed by a healthcare practitioner). Additionally, we differentiated between different depression statuses, such as recurrent depression, depression triggered by loss, etc. Variables related to self-harm were subdivided based on whether a person has ever self-harmed with the intent to die.

      To make response scales more intuitive, we recoded responses within the well-being domain such that the lower score corresponded to a lesser extent of satisfaction (“Extremely unhappy”) and the higher score indicated a higher level of happiness (“Extremely happy”). For all questions, we assigned the median values to “Prefer not to answer” (-818 for in-person assessment and -3 for online questionnaire) and “Do not know” (-121 for in-person assessment and -1 for online questionnaire) responses. We excluded the “Work/job satisfaction” question from the mental health derivatives list because it included a “Not employed” response option, which could not be reasonably coded.

      To calculate the risk of PTSD, we used questions from the PCL-6 questionnaire. Following Davis and colleagues [12], PCL-6 scores ranged from 6 to 29. A PCL-6 score of 12 or below corresponds to a low risk of meeting the Clinician-Administered PTSD Scale diagnostic criteria. PCL-6 scores between 13 and 16 and between 17 and 25 are indicative of an increased risk and high risk of PTSD, respectively. A score of above 26 is interpreted as a very high risk of PTSD [12, 15]. PTSD status was set to positive if the PCL-6 score exceeded or was equal to 14 and encompassed stressful events instead of catastrophic trauma alone [12].

      To assess alcohol consumption, alcohol dependence, and harm associated with drinking, we calculated the sum of the ten questions from the AUDIT questionnaire [16]. We additionally subdivided the AUDIT score into the alcohol consumption score (questions 1-3, AUDIT-C) and the score reflecting problems caused by alcohol (questions 4-10, AUDIT-P) [17]. In questions 2-10 that followed the first trigger question (“Frequency of drinking alcohol”), we replaced missing values with 0 as they would correspond to a “Never” response to the first question.

      An AUDIT score cut-off of 8 suggests moderate or low-risk alcohol consumption, and scores of 8 to 15 and above 15 indicate severe/harmful and hazardous (alcohol dependence or moderate-severe alcohol use disorder) drinking, respectively [16, 18]. Subsequently, hazardous alcohol use and alcohol dependence status correspond to AUDIT scores of ≥ 8 and ≥ 15, respectively. The “Alcohol dependence ever” status was set to positive if a participant had ever been physically dependent on alcohol. To reduce skewness, we log(x+1)-transformed the AUDIT, AUDIT-C, and AUDIT-P scores [17].”
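      The AUDIT scoring rules quoted above (total and sub-scores, zero-filling of items 2-10, the ≥ 8 and ≥ 15 status cut-offs, and the log(x+1) transform) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `audit_scores` and the dictionary keys are hypothetical, the natural logarithm is assumed because the base is not stated, and rejecting a missing first item is a choice made for this sketch only.

```python
import math


def audit_scores(answers):
    """Score the ten AUDIT items.

    answers: list of 10 item scores; None marks a missing response.
    Missing values in items 2-10 are set to 0, as described in the quoted text.
    """
    if len(answers) != 10:
        raise ValueError("AUDIT has ten items")
    if answers[0] is None:
        # Assumption: the text does not say how a missing first item was handled.
        raise ValueError("item 1 (frequency of drinking) is required")
    filled = [answers[0]] + [0 if a is None else a for a in answers[1:]]
    total = sum(filled)
    audit_c = sum(filled[:3])  # items 1-3: alcohol consumption
    audit_p = sum(filled[3:])  # items 4-10: problems caused by alcohol
    return {
        "audit": total,
        "audit_c": audit_c,
        "audit_p": audit_p,
        "hazardous_use": total >= 8,
        "dependence": total >= 15,
        # log(x + 1) transform to reduce skewness (natural log assumed)
        "log_audit": math.log1p(total),
        "log_audit_c": math.log1p(audit_c),
        "log_audit_p": math.log1p(audit_p),
    }
```

      For example, a participant answering "Never" to the first question and leaving the rest blank would receive a total of 0 and a transformed score of 0, since log(0 + 1) = 0.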

      Davis KAS, Coleman JRI, Adams M, Allen N, Breen G, Cullen B, et al. Mental health in UK Biobank – development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis. BJPsych Open. 2020;6:e18.

      Dutt RK, Hannon K, Easley TO, Griffis JC, Zhang W, Bijsterbosch JD. Mental health in the UK Biobank: A roadmap to self-report measures and neuroimaging correlates. Hum Brain Mapp. 2022;43:816–832.

      (8) In the discussion section (page 23, line 416-421), the authors refer to specific findings that are not described in the results section > I would add these findings to the main manuscript (including the discussion / interpretation).

      We appreciate your careful reading. We agree that our original Results section did not explicitly describe the factor loadings for mental health in the PLSR model, despite discussing their implications later in the paper. We needed to include this part of the discussion in the Supplementary Materials to meet the word limit of the original submission. However, in response to your suggestion, we have now added the results regarding factor loadings to the Results section. We also moved the discussion of the association between mental health features and general cognition from the Supplementary Material to the manuscript’s Discussion.

      Results

      Line 298: “On average, information about mental health predicted the g-factor at R<sup>2</sup><sub>mean</sub> = 0.10 and r<sub>mean</sub> = 0.31 (95% CI [0.291, 0.315]; Fig. 2b and 2c and Supplementary Materials, S9, Table S12). The magnitude and direction of factor loadings for mental health in the PLSR model allowed us to quantify the contribution of individual mental health indices to cognition. Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition.”

      Discussion

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of nonsuicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79–80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionarily novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      (9) In the discussion section (page 24, lines 440-449), the authors explain why the diffusion measures have limited utility, but the arguments put forward also concern structural and rsfMRI measures.

      Thank you for this important observation. Indeed, the argument about voxel-averaged diffusion components (“… these metrics are less specific to the properties of individual white matter axons or bundles, and instead represent a composite of multiple diffusion components averaged within a voxel and across major fibre pathways”) could theoretically apply across other MRI modalities. We have therefore removed this point from the discussion to avoid overgeneralization. However, we maintain our central argument about the biological specificity of conventional tractography-derived diffusion metrics as their particular sensitivity to white matter microstructure (e.g., axonal integrity, myelin content) may make them better suited for detecting neuropathological changes than dynamic cognitive processes. This interpretation aligns with the mixed evidence linking these metrics to cognitive performance, despite their established utility in detecting white matter abnormalities in clinical populations (e.g., Bergamino et al., 2021; Silk et al., 2009). We clarify this distinction in the manuscript.

      Line 572: “The somewhat limited utility of diffusion metrics derived specifically from probabilistic tractography in serving as robust quantitative neuromarkers of cognition and its shared variance with mental health may stem from their greater sensitivity and specificity to neuronal integrity and white matter microstructure rather than to dynamic cognitive processes. Critically, probabilistic tractography may be less effective at capturing relationships between white matter microstructure and behavioural scores cross-sectionally, as this method is more sensitive to pathological changes or dynamic microstructural alterations like those occurring during maturation. While these indices can capture abnormal white matter microstructure in clinical populations such as Alzheimer’s disease, schizophrenia, or attention deficit hyperactivity disorder (ADHD) [117–119], the empirical evidence on their associations with cognitive performance is controversial [114, 120–126].”

      Bergamino M, Walsh RR, Stokes AM. Free-water diffusion tensor imaging improves the accuracy and sensitivity of white matter analysis in Alzheimer’s disease. Sci Rep. 2021;11:6990.

      Silk TJ, Vance A, Rinehart N, Bradshaw JL, Cunnington R. White-matter abnormalities in attention deficit hyperactivity disorder: a diffusion tensor imaging study. Hum Brain Mapp. 2009;30:2757–2765.

      Reviewer 2:

      This is an interesting study combining a lot of data to investigate the link between cognition and mental health. The description of the study is very clear, it's easy to read for someone like me who does not have a lot of expertise in machine learning.

      We thank you for your thorough review and constructive feedback. Your insightful comments have helped us identify conceptual and methodological aspects that required improvement in the manuscript. We have incorporated relevant changes throughout the paper, and below, we address each of your points in detail.

      Comment 1: My main concern with this manuscript is that it is not yet clear to me what it exactly means to look at the overlap between cognition and mental health. This relation is r=0.3 which is not that high, so why is it then necessary to explain this overlap with neuroimaging measures? And, could it be that the relation between cognition and mental health is explained by third variables (environment? opportunities?). In the introduction I miss an explanation of why it is important to study this and what it will tell us, and in the discussion I would like to read some kind of 'answer' to these questions.

      Thank you. It’s important to clarify why we investigated the relationship between cognition and mental health, and what we found using data from the UK Biobank.

      Conceptually, our work is grounded in the Research Domain Criteria (RDoC; Insel et al., 2010) framework. RDoC conceptualizes mental health not through traditional diagnostic categories, but through core functional domains that span the full spectrum from normal to abnormal functioning. These domains include cognition, negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Within this framework, cognition is considered a fundamental domain that contributes to mental health across diagnostic boundaries. Meta-analytic evidence supports a link between cognitive functioning and mental health (Abramovitch, et al., 2021; East-Richard, et al., 2020). In the context of a large, population-based dataset like the UK Biobank, this implies that cognitive performance – as measured by various cognitive tasks – should be meaningfully associated with available mental health indicators.

      However, because cognition is only one of several functional domains implicated in mental health, we do not expect the covariation between cognition and mental health to be very high. Other domains, such as negative and positive valence systems, arousal and regulatory systems, or social processing, may also play significant roles. Theoretically, this places an upper bound on the strength of the cognition-mental health relationship, especially in normative, nonclinical samples.

      Our current findings from the UK Biobank reflect this. Most of the 133 mental health variables showed relatively weak individual correlations with cognition (mean r = 0.01, SD = 0.05, min r = –0.08, max r = 0.17; see Figure 2). However, using a PLS-based machine learning approach, we were able to integrate information across all mental-health variables to predict cognition, yielding an out-of-sample correlation of r = 0.31 [95% CI: 0.29, 0.32].

      We believe this estimate approximates the true strength of the cognition-mental health relationship in normative samples, consistent with both theoretical expectations and prior empirical findings. Theoretically, this aligns with the RDoC view that cognition is one of several contributing domains. Empirically, our results are consistent with findings from our previous mega-analysis in children (Wang et al., 2025). Moreover, in the field of gerontology, an effect size of r = 0.31 is not considered small. According to Brydges (2019), it falls around the 70th percentile of effect sizes reported in gerontological studies and approaches the threshold for a large effect (r = 0.32). Given that most studies report within-sample associations, our out-of-sample results are likely more robust and generalizable (Yarkoni & Westfall, 2017).

      To answer, “why is it then necessary to explain this overlap with neuroimaging measures”, we again draw on the conceptual foundation of the RDoC framework. RDoC emphasizes that each functional domain, such as cognition, should be studied not only at the behavioural level but also across multiple neurobiological units of analysis, including genes, molecules, cells, circuits, physiology, and behaviour.

      MRI-based neural markers represent one such level of analysis. While other biological systems (e.g., genetic, molecular, or physiological) also contribute to the cognition-mental health relationship, neuroimaging provides unique insights into the brain mechanisms underlying this association – insights that cannot be obtained from behavioural data alone.

      In response to the related question, “Could the relationship between cognition and mental health be explained by third variables (e.g., environment, opportunities)?”, we note that developing a neural marker of cognition capable of capturing its relationship with mental health is the central aim of this study. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health.

      The remaining 52% of unexplained variance may stem from several sources. According to the RDoC framework, neuromarkers could be further refined by incorporating additional neuroimaging modalities (e.g., task-based fMRI, PET, ASL, MEG/EEG, fNIRS) and integrating other units of analysis such as genetic, molecular, cellular, and physiological data.

      Once more comprehensive neuromarkers are developed, capturing a greater proportion of the cognition-mental health covariation, they may also open a new research direction: investigating how environmental factors and life opportunities influence these markers. However, exploring those environmental contributions lies beyond the scope of the current study.

      We discuss these considerations and explain the motivation of our study in the revised Introduction and Discussion.

      Introduction

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. The National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      Discussion

      Line 481: “Our analysis confirmed the validity of the g-factor as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Abramovitch, A., Short, T., & Schweiger, A. (2021). The C Factor: Cognitive dysfunction as a transdiagnostic dimension in psychopathology. Clinical Psychology Review, 86, 102007.

      East-Richard, C., R. -Mercier, A., Nadeau, D., & Cellard, C. (2020). Transdiagnostic neurocognitive deficits in psychiatry: A review of meta-analyses. Canadian Psychology / Psychologie Canadienne, 61(3), 190–214.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Yarkoni T, Westfall J. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspect Psychol Sci. 2017;12(6):1100-1122.

      Comment 2: Title - Shouldn't it be "MRI markers" (plural)?

      We used the singular form (“marker”) intentionally, as it refers to the composite neuroimaging marker derived from all three MRI modalities in our stacked model. This multimodal marker represents the combined predictive power of all modalities and captures the highest proportion of the mental health-cognition relationship in our analyses.

      Comment 3: Introduction - I miss an explanation of why it is useful to look at cognition-mental health covariation

      We believe we have sufficiently addressed this comment in our response to Reviewer 2, comment 1 above.

      Comment 4: - "Demonstrating that MRI-based neural indicators of cognition capture the covariation between cognition and mental health will thereby support the utility of such indicators for understanding the etiology of mental health" (page 4, line 56-58) - how/why?

      Previous research has largely focused on developing MRI-based neural indicators that accurately predict cognitive performance (Marek et al., 2022; Vieira et al., 2020). Building on this foundation, our findings further demonstrate that the predictive performance of a neural indicator for cognition is closely tied to its ability to explain the covariation between cognition and mental health. In other words, the robustness of a neural indicator – its capacity to capture individual differences in cognition – is strongly associated with how well it reflects the shared variance between cognition and mental health.

      This insight is particularly important within the context of the RDoC framework, which seeks to understand the etiology of mental health through functional domains (such as cognition) and their underlying neurobiological units of analysis (Insel et al., 2010). According to RDoC, for a neural indicator of cognition to be informative for mental health research, it must not only predict cognitive performance but also capture its relationship with mental health.

      Furthermore, RDoC emphasizes the integration of neurobiological measures to investigate the influence of environmental and developmental factors on mental health. In line with this, our neural indicators of cognition may serve as valuable tools in future research aimed at understanding how environmental exposures and developmental trajectories shape mental health outcomes. We discuss this in more detail in the revised Discussion.

      Line 481: “Our analysis confirmed the validity of the g-factor as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022;603:654–660.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Comment 5: - The explanation about the stacking approach is not yet completely clear to me. I don't understand how the target variable can be the dependent variable in both step one and two. Or are those different variables? It would be helpful to also give an example of the target variable in line 88 on page 5

      Thank you for this excellent question. In our stacking approach, the same target variable, the g-factor, is indeed used across both modeling stages, but with a key distinction in how predictions are generated and integrated.

      In the first-level models, we trained separate Partial Least Squares Regression (PLSR) models for each of the 72 neuroimaging phenotypes, each predicting the g-factor independently. The predicted values from these 72 models were then used as input features for the second-level stacked model, which combined them to generate a final prediction of the g-factor. This two-stage framework enables us to integrate information across multiple imaging modalities while maintaining a consistent prediction target.

      To avoid data leakage, both modeling stages were conducted entirely within the training set for each cross-validation fold. Only after the second-level model was trained was it applied to the outer-fold test participants who were not involved in any part of the model training process.
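
      The two-stage logic described above can be sketched as follows. This is only an illustration under simplifying assumptions, not the code used in the study: the data are simulated, three hypothetical single-feature phenotypes replace the 72 real ones, a univariate least-squares line stands in for each base PLSR model, and a lightly regularised linear combiner stands in for the second-level algorithm. All function and variable names are invented for the example.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Least-squares line; a toy stand-in for one base (first-level) PLSR model."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def predict_line(model, x):
    slope, intercept = model
    return [intercept + slope * a for a in x]

def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def fit_stacker(base_preds, y, ridge=1e-6):
    """Second-level linear model whose inputs are the base predictions plus an intercept."""
    rows = [[1.0] + list(p) for p in zip(*base_preds)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict_stacker(weights, base_preds):
    return [weights[0] + sum(w * p for w, p in zip(weights[1:], row))
            for row in zip(*base_preds)]

# Simulated data: three hypothetical phenotypes and one target (the "g-factor").
rng = random.Random(0)
n_train, n_test = 300, 100
features = [[rng.gauss(0, 1) for _ in range(n_train + n_test)] for _ in range(3)]
g = [0.6 * a + 0.3 * b + 0.1 * c + rng.gauss(0, 0.5) for a, b, c in zip(*features)]
train, test = slice(0, n_train), slice(n_train, None)

# Stage 1: one base model per phenotype, each predicting g, fitted on training data only.
base_models = [fit_line(f[train], g[train]) for f in features]
train_preds = [predict_line(m, f[train]) for m, f in zip(base_models, features)]

# Stage 2: the stacked model predicts the same target g from the base predictions.
stacker = fit_stacker(train_preds, g[train])

# Only now are the held-out participants touched: base models first, then the stacker.
test_preds = [predict_line(m, f[test]) for m, f in zip(base_models, features)]
g_hat = predict_stacker(stacker, test_preds)
mse_stacked = mean((p - t) ** 2 for p, t in zip(g_hat, g[test]))
```

      The point of the sketch is that the target is the same variable at both stages; what changes is the input, which is raw features at the first level and base-model predictions at the second.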

      To improve accessibility, we have revised the Methods section (see Page 10) to clarify this approach, ensuring that the description remains technically accurate while being easier to follow.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R²), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds yielded outer-fold test sets of at least n = 4,000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.
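
      For readers less familiar with the scheme, the nested procedure described above (five outer folds, ten inner folds, selection by minimal mean inner-fold MSE, refit, then a single evaluation on the held-out fold) can be sketched as below. This is an illustration only and not the study's pipeline: the data are simulated, a closed-form one-predictor ridge regression stands in for PLSR, its penalty `alpha` stands in for the tuned hyperparameter, and every name is hypothetical.

```python
import random
from statistics import mean

def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k folds of near-equal size."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def fit_ridge_1d(x, y, alpha):
    """Closed-form ridge regression with one predictor (toy stand-in for PLSR)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + alpha)
    return slope, my - slope * mx

def mse(model, x, y):
    slope, intercept = model
    return mean((intercept + slope * a - b) ** 2 for a, b in zip(x, y))

def nested_cv(x, y, alphas, n_outer=5, n_inner=10, seed=0):
    rng = random.Random(seed)
    outer_folds = k_folds(list(range(len(x))), n_outer, rng)
    outer_scores = []
    for test_idx in outer_folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        # Inner loop: tuning sees the outer-fold training set only.
        inner_folds = k_folds(train_idx, n_inner, rng)

        def inner_mse(alpha):
            errors = []
            for val_idx in inner_folds:
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                model = fit_ridge_1d([x[i] for i in fit_idx],
                                     [y[i] for i in fit_idx], alpha)
                errors.append(mse(model, [x[i] for i in val_idx],
                                  [y[i] for i in val_idx]))
            return mean(errors)

        best_alpha = min(alphas, key=inner_mse)
        # Refit on the full outer-fold training set, then score the held-out fold once.
        model = fit_ridge_1d([x[i] for i in train_idx],
                             [y[i] for i in train_idx], best_alpha)
        outer_scores.append(mse(model, [x[i] for i in test_idx],
                                [y[i] for i in test_idx]))
    return outer_scores, outer_folds

# Simulated example: 200 observations with a modest linear signal.
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(200)]
y = [0.5 * a + data_rng.gauss(0, 1) for a in x]
scores, folds = nested_cv(x, y, alphas=[0.0, 1.0, 10.0, 100.0])
```

      Each observation appears in exactly one outer-fold test set, and the hyperparameter chosen for a given outer fold never depends on that fold's test data.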

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
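
      For illustration, the two-step stacking and the training-set-only standardisation described in the excerpt above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions rather than the pipeline used in the study: ordinary least squares stands in for both the PLSR base models and the second-level algorithms, a single training/test split replaces the five outer folds, the three feature blocks and the outcome are simulated, and all function and variable names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Apply coefficients from fit_ols to new data."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def standardise(train, test):
    """Scale both sets using the mean and SD of the training set only."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sd, (test - mu) / sd


# Simulated data (assumption): three "phenotypes", each a noisy view of one latent trait.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
blocks = [latent[:, None] * rng.normal(size=(1, 5)) + rng.normal(size=(n, 5))
          for _ in range(3)]
y = latent + rng.normal(scale=0.5, size=n)

train, test = np.arange(400), np.arange(400, n)

# Step 1: one base model per phenotype, fitted on training participants only.
base_train, base_test = [], []
for X in blocks:
    Xtr, Xte = standardise(X[train], X[test])
    coef = fit_ols(Xtr, y[train])
    base_train.append(predict_ols(coef, Xtr))
    base_test.append(predict_ols(coef, Xte))

# Step 2: the stacked model takes the base predictions as its input features.
Ztr, Zte = standardise(np.column_stack(base_train), np.column_stack(base_test))
stack_coef = fit_ols(Ztr, y[train])
stacked_pred = predict_ols(stack_coef, Zte)
r_stacked = np.corrcoef(stacked_pred, y[test])[0, 1]
```

      The point of the sketch is the ordering: scaling parameters and model coefficients are estimated on training participants only and are then applied unchanged to the test participants.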

      Comment 6: Methods - It's not clear from the text and Figure 1 which 12 scores from 11 tests are being used to derive the g-factor. Figure 1 shows only 8 bullet points with 10 scores in A and 13 tests under 'Cognitive tests' in B. Moreover, Supplement S1 describes 12 tests and 14 measures (Prospective Memory test is in the text but not in Supplementary Table 1).

      Thank you for identifying this discrepancy. In the original Figure 1b and in the Supplementary Methods (S1), the “Prospective Memory” test was accidentally duplicated, while it was present in the Supplementary Table 1 (Line 53, Supplementary Table 1). We have now corrected both figures for consistency. To clarify: Figure 1a presents the global mental health and cognitive domains studied, while Figure 1b now accurately lists 1) the 12 cognitive scores from 11 tests used to derive the g-factor (with the Trail Making Test contributing two measures – numeric and alphabetic trails) and 2) the three main categories of mental health indices used as machine learning features.

      We also corrected the Supplementary Materials to remove the duplicate test from the first paragraph. In Supplementary Table 1, there were 11 tests listed, and for the Trail Making test, we specified in the “Core measures” column that this test had 2 derivative scores: duration to complete the numeric path (Trail 1) and duration to complete the alphabetic path (Trail 2).

      Supplementary Materials, Line 46: “We used twelve scores from the eleven cognitive tests that represented the following cognitive domains: reaction time and processing speed (Reaction Time test), working memory (Numeric Memory test), verbal and numerical reasoning (Fluid Intelligence test), executive function (Trail Making Test), non-verbal fluid reasoning (Matrix Pattern Completion test), processing speed (Symbol Digit Substitution test), vocabulary (Picture Vocabulary test), planning abilities (Tower Rearranging test), verbal declarative memory (Paired Associate Learning test), prospective memory (Prospective Memory test), and visual memory (Pairs Matching test) [1].”

      Comment 7: - For the mental health measures: If I understand correctly, the questionnaire items were used individually, but also to create composite scores. This seems counterintuitive, because I would assume that if the raw data is used, the composite scores would not add additional information to that. When reading the Supplement, it seems like I'm not correct… It would be helpful to clarify the text on page 7 in the main text.

      You raise an excellent observation regarding the use of both individual questionnaire items and composite scores. This dual approach was methodologically justified by the properties of Partial Least Squares Regression (PLSR), our chosen first-level machine learning algorithm, which benefits from rich feature sets and can handle multicollinearity through dimensionality reduction. PLSR transforms correlated features into latent variables, meaning both individual items and composite scores can contribute unique information to the model. We elaborate on PLSR's mathematical principles in Supplementary Materials (S5).

      To directly address this concern, we conducted comparative analyses (using a single 80/20% training/test split) showing that the PLSR model incorporating all 133 mental health features (both items and composites) outperformed models using either type alone. The full model achieved superior performance (MSE = 0.458, MAE = 0.537, R<sup>2</sup> = 0.112, Pearson r = 0.336, p-value = 6.936e-112) compared to using only composite scores (93 features; MSE = 0.461, MAE = 0.538, R<sup>2</sup> = 0.107, Pearson r = 0.328, p-value = 5.8e-106) or only questionnaire items (40 features; MSE = 0.499, MAE = 0.561, R<sup>2</sup> = 0.033, Pearson r = 0.184, p-value = 2.53e-33). These results confirm that including both data types provides complementary predictive value. We expand on these considerations in the revised Methods section.

      Line 123: “Mental health measures encompassed 133 variables from twelve groups: mental distress, depression, clinical diagnoses related to the nervous system and mental health, mania (including bipolar disorder), neuroticism, anxiety, addictions, alcohol and cannabis use, unusual/psychotic experiences, traumatic events, self-harm behaviours, and happiness and subjective well-being (Fig. 1 and Tables S4 and S5). We included both self-report questionnaire items from all participants and composite diagnostic scores computed following Davis et al. and Dutt et al. [35,36] as features in our first-level (for explanation, see Data analysis section) Partial Least Squares Regression (PLSR) model. This approach leverages PLSR’s ability to handle multicollinearity through dimensionality reduction, enabling simultaneous use of granular symptom-level information and robust composite measures (for mental health scoring details, see Supplementary Materials, S2). We assess the contribution of each mental health index to general cognition by examining the direction and magnitude of its PLSR-derived loadings on the identified latent variables.”

      Comment 8: - Results - The colors in Figure 4 B are a bit hard to differentiate.

      We have updated Figure 4 to enhance colour differentiation by adjusting saturation and brightness levels, improving visual distinction. For further clarity, we split the original figure into two separate figures.

      Comment 9: - Discussion - "Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition," - this seems counterintuitive, that some symptoms relate to better cognition and others relate to worse cognition. Could you elaborate on this finding and what it could mean?

      We appreciate you highlighting this important observation. While some associations between mental health indices and cognition may appear counterintuitive at first glance, these patterns are robust (emerging consistently across both univariate correlations and PLSR loadings) and align with previous literature (e.g., Karpinski et al., 2018; Ogueji et al., 2022). For instance, the positive relationship between cognitive ability and certain mental health indicators like help-seeking behaviour has been documented in other population studies (Karpinski et al., 2018; Ogueji et al., 2022), potentially reflecting greater health literacy and access to care among cognitively advantaged individuals. Conversely, the negative associations with conditions like psychotic experiences mirror established neurocognitive deficits in these domains.

      As was initially detailed in Supplementary Materials (S12) and now expanded in our Discussion, these findings likely reflect complex multidimensional interactions. The positive loadings for mental distress indicators may capture: (1) greater help-seeking behaviour among those with higher cognition and socioeconomic resources, and/or (2) psychological overexcitability and rumination tendencies in high-functioning individuals. These interpretations are particularly relevant to the UK Biobank's assessment methods, where mental distress items focused on medical help-seeking rather than symptom severity per se (e.g., as a measure of mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress).

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of nonsuicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79–80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionarily novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      Karpinski RI, Kinase Kolb AM, Tetreault NA, Borowski TB. High intelligence: A risk factor for psychological and physiological overexcitabilities. Intelligence. 2018;66:8–23.

      Ogueji IA, Okoloba MM. Seeking Professional Help for Mental Illness: A Mixed-Methods Study of Black Family Members in the UK and Nigeria. Psychol Stud. 2022;67:164–177.

      Comment 10: - All neuroimaging factors together explain 48% of the variance in the cognition-mental health relationship. However, this relationship is only r=0.3 - so then the effect of neuroimaging factors seems a lot smaller… What does it mean?

      Thank you for raising this critical point. We have addressed it in our responses to Reviewer 1, comments 2 and 3, and Reviewer 2, comment 1.

      Briefly, cognition is related to mental health at around r = 0.3 and to neuroimaging phenotypes at around r = 0.4. These levels of relationship strength are consistent with what has been shown in the literature (e.g., Wang et al., 2025 and Vieira et al., 2020). We discussed the relationship between cognition and mental health in our response to Reviewer 2, comment 1 above. In short, this relationship reflects just one functional domain – mental health may also be associated with other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Moreover, in the context of gerontology research, this effect size is considered relatively large (Brydges et al., 2019).

      We conducted a commonality analysis to investigate the unique and shared variance of mental health and neuroimaging phenotypes in explaining cognition. As we discussed in our response to Reviewer 1, comment 2, we were able to account for 48% of the covariation between cognition and mental health using the MRI modalities available in the UK Biobank. The remaining 52% of unexplained variance may arise from several sources.
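
      To make the scale of these quantities concrete, the two-set commonality decomposition can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reproduction of our analysis: it assumes the 48% is the common component divided by the total variance in cognition explained by mental health, it uses the approximate correlations quoted above (r = 0.3 and r = 0.4), and the joint R<sup>2</sup> value is hypothetical, chosen only so that the example lands on 48%. The function and variable names are likewise hypothetical.

```python
def commonality_two_sets(r2_a, r2_b, r2_ab):
    """Partition explained variance for two predictor sets, A and B.

    r2_a, r2_b: R^2 of the models using only set A or only set B.
    r2_ab:      R^2 of the model using both sets together.
    """
    return {
        "unique_a": r2_ab - r2_b,       # explained by A beyond B
        "unique_b": r2_ab - r2_a,       # explained by B beyond A
        "common": r2_a + r2_b - r2_ab,  # explained jointly by A and B
    }


r2_mental_health = 0.30 ** 2  # r = 0.3 corresponds to about 9% of variance in cognition
r2_neuroimaging = 0.40 ** 2   # r = 0.4 corresponds to about 16% of variance in cognition
r2_both = 0.2068              # hypothetical joint R^2, for illustration only

parts = commonality_two_sets(r2_mental_health, r2_neuroimaging, r2_both)

# Share of the cognition-mental health relationship that neuroimaging also captures.
share_of_mh_link = parts["common"] / r2_mental_health
```

      Under these assumptions, the common component is about 0.043, that is, roughly 4% of the total variance in cognition but 48% of the 9% that mental health explains. The percentage therefore describes a share of the cognition-mental health relationship rather than a share of all variance in cognition.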

      One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank (Tetereva et al., 2025).

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      We have now incorporated these considerations into the Discussion section.

      Line 481: “Our analysis confirmed the validity of the g-factor as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025.14:RP105537.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Tetereva A, Knodt AR, Melzer TR, et al. Improving Predictability, Reliability and Generalisability of Brain-Wide Associations for Cognitive Abilities via Multimodal Stacking. Preprint. bioRxiv. 2025;2024.05.03.589404.

      Reviewer 3:

      Buianova et al. present a comprehensive analysis examining the predictive value of multimodal neuroimaging data for general cognitive ability, operationalized as a derived g-factor. The study demonstrates that functional MRI holds the strongest predictive power among the modalities, while integrating multiple MRI modalities through stacking further enhances prediction performance. The inclusion of a commonality analysis provides valuable insight into the extent to which shared and unique variance across mental health features and neuroimaging modalities contributes to the observed associations with cognition. The results are clearly presented and supported by high-quality visualizations. Limitations of the sample are stated clearly.

      Thank you once more for your constructive and encouraging feedback. We appreciate your careful reading and valuable methodological insights. Your expertise has helped us clarify key methodological concepts and improve the overall rigour of our study.

      Suggestions for improvement:

      (1) The manuscript would benefit from the inclusion of permutation testing to evaluate the statistical significance of the predictive models. This is particularly important given that some of the reported performance metrics are relatively modest, and permutation testing could help ensure that results are not driven by chance.

      Thank you, this is an excellent point. We agree that evaluating the statistical significance of our predictive models is essential.

      In our original analysis, we assessed model performance by generating a bootstrap distribution of Pearson’s r, resampling the data with replacement 5,000 times (see Figure 3b). In response to your feedback, we have made the following updates:

      (1) Improved Figure 3b to explicitly display the 95% confidence intervals.

      (2) Supplemented the results by reporting the exact confidence interval values.

      (3) Clarified our significance testing procedure in the Methods section.

      We considered model performance statistically significant when the 95% confidence interval did not include zero, indicating that the observed associations are unlikely to have occurred by chance.

      We chose bootstrapping over permutation testing because, while both can assess statistical significance, bootstrapping additionally provides uncertainty estimates in the form of confidence intervals. Given the large sample size in our study, significance testing can be less informative, as even small effects may reach statistical significance. Bootstrapping offers a more nuanced understanding of model uncertainty.

      Line 233: “To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
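
      The bootstrap procedure described above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not our analysis code: it uses the percentile method to form the interval (the excerpt does not specify how the interval was derived from the bootstrap distribution), the observed and predicted values are simulated with a true correlation of about 0.3, and the function and variable names are hypothetical.

```python
import numpy as np


def bootstrap_pearson_ci(y_true, y_pred, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Pearson's r.

    Participants are resampled with replacement, keeping each observed
    value paired with its prediction.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    rs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap sample of participants
        rs[b] = np.corrcoef(y_true[idx], y_pred[idx])[0, 1]
    lower, upper = np.percentile(rs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


# Simulated stand-ins for the aggregated outer-fold test-set values (assumption).
rng = np.random.default_rng(1)
observed = rng.normal(size=1000)
predicted = 0.3 * observed + np.sqrt(1 - 0.3 ** 2) * rng.normal(size=1000)

ci_low, ci_high = bootstrap_pearson_ci(observed, predicted)
significant = ci_low > 0 or ci_high < 0  # the 95% CI excludes zero
```

      Unlike a permutation p-value, the interval returned here also conveys how precisely the correlation is estimated, which is the reason given above for preferring bootstrapping in a large sample.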

      (2) Applying and testing the trained models on an external validation set would increase confidence in generalisability of the model.

      We appreciate this excellent suggestion. While we considered this approach, implementing it would require identifying an appropriate external dataset with comparable neuroimaging and behavioural measures, along with careful matching of acquisition protocols and variable definitions across sites. These challenges extend beyond the scope of the current study, though we fully agree that this represents an important direction for future research.

      Our findings, obtained from one of the largest neuroimaging datasets to date with training and test samples exceeding most previous studies, align closely with existing literature: the predictive accuracy of each neuroimaging phenotype and modality for cognition matches the effect size reported in meta-analyses (r ≈ 0.4; e.g., Vieira et al., 2020). The ability of dwMRI, rsMRI and sMRI to capture the cognition-mental health relationship is, in turn, consistent with our previous work in pediatric populations (Wang et al., 2025; Pat et al., 2022).

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025.14:RP105537.

      Pat N, Wang Y, Anney R, Riglin L, Thapar A, Stringaris A. Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors. Hum Brain Mapp. 2022;43:5520–5542.

      (3) The rationale for selecting a 5-by-10-fold cross-validation scheme is not clearly explained. Clarifying why this structure was preferred over more commonly used alternatives, such as 10-by-10 or 5-by-5 cross-validation, would strengthen the methodological transparency.

      Thank you for this important methodological question. Our choice of a 5-by-10-fold cross-validation scheme was motivated by the need to balance robust hyperparameter tuning with computational efficiency, particularly memory and processing time. Using five outer folds allowed us to rigorously assess model performance across multiple data partitions, yielding outer-fold test sets of at least n = 4 000 while retaining a substantial amount of neuroimaging data for model training. In contrast, employing ten inner folds ensured robust and stable hyperparameter tuning that maximizes the reliability of model selection. Thus, with our large sample, five outer folds provided a sufficiently large out-of-sample test set for reliable model evaluation and efficient computation, while ten inner folds enabled robust hyperparameter tuning. We now provide additional rationale for this design decision on Page 10.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.”
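
      The 5-by-10 scheme described in this excerpt can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the code used in the study: closed-form ridge regression stands in for PLSR and the second-level algorithms, the penalty strength is the only hyperparameter, the data are simulated, and all function and variable names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on centred data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def ridge_predict(model, X):
    w, b = model
    return X @ w + b


def kfold_indices(n, k, rng):
    """Shuffle the positions 0..n-1 and split them into k folds."""
    return np.array_split(rng.permutation(n), k)


def nested_cv(X, y, alphas, n_outer=5, n_inner=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty(n)  # out-of-sample prediction for every participant
    chosen = []          # hyperparameter selected in each outer fold
    for test_idx in kfold_indices(n, n_outer, rng):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        inner_folds = kfold_indices(len(train_idx), n_inner, rng)
        mean_mse = []
        for alpha in alphas:
            fold_mse = []
            for val_pos in inner_folds:
                fit_pos = np.setdiff1d(np.arange(len(train_idx)), val_pos)
                fit_idx, val_idx = train_idx[fit_pos], train_idx[val_pos]
                model = ridge_fit(X[fit_idx], y[fit_idx], alpha)
                err = ridge_predict(model, X[val_idx]) - y[val_idx]
                fold_mse.append(np.mean(err ** 2))
            mean_mse.append(np.mean(fold_mse))
        best = alphas[int(np.argmin(mean_mse))]  # minimal mean inner-fold MSE
        chosen.append(best)
        # Refit on the full outer-fold training set, then predict the held-out fold.
        model = ridge_fit(X[train_idx], y[train_idx], best)
        preds[test_idx] = ridge_predict(model, X[test_idx])
    return preds, chosen


# Simulated data (assumption): 300 participants, 20 features, 3 of them informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = X[:, :3] @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=300)
alphas = [0.1, 1.0, 10.0, 100.0]
preds, chosen = nested_cv(X, y, alphas)
```

      Each participant receives exactly one prediction, made by a model that never saw that participant during tuning or fitting. Increasing the number of outer folds multiplies the number of full tuning runs, which is the computational cost the five-fold choice is meant to limit.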

      (4) A more detailed discussion of which specific brain regions or features within each neuroimaging modality contributed most strongly to the prediction of cognition would enhance neurobiological relevance of the findings.

      Thank you for this thoughtful suggestion. To address this point, we have included feature importance plots for the top-performing neuroimaging phenotypes within each modality (Figure 5 and Figures S2–S4), demonstrating the relative contributions of individual features to the predictive models. While we maintain our primary focus on cross-modality performance comparisons in the main text, as this aligns with our central aim of evaluating multimodal MRI markers at the integrated level, we outline the contribution of neuroimaging features with the highest predictive performance for cognition in the revised Results and Discussion.

      Methods

      Line 255: “To determine which neuroimaging features contribute most to the predictive performance of top-performing phenotypes within each modality, while accounting for the potential latent components derived from neuroimaging, we assessed feature importance using the Haufe transformation [62]. Specifically, we calculated Pearson correlations between the predicted g-factor and scaled and centred neuroimaging features across five outer-fold test sets. We also examined whether the performance of neuroimaging phenotypes in predicting cognition per se is related to their ability to explain the link between cognition and mental health. Here, we computed the correlation between the predictive performance of each neuroimaging phenotype and the proportion of the cognition-mental health relationship it captures. To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age<sup>2</sup>, age×sex, and age<sup>2</sup>×sex as an additional set of explanatory variables (Fig. 1).”
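
      The feature-importance step described above, correlating the predicted g-factor with scaled and centred features, might look roughly like the following. This is a minimal sketch: the function name `haufe_importance` is hypothetical, and how predictions from the five outer-fold test sets are pooled (for example by concatenation) is an assumption not stated in the excerpt.

```python
import numpy as np


def haufe_importance(features, predicted):
    """Pearson r between each feature column and the model's predictions.

    features:  (n_samples, n_features) array from held-out test data
    predicted: (n_samples,) predicted target (here, the predicted g-factor)
    Constant feature columns have zero variance and would yield NaN.
    """
    X = np.asarray(features, dtype=float)
    y_hat = np.asarray(predicted, dtype=float).ravel()
    # Centre and scale with the population standard deviation (ddof=0),
    # so the mean cross-product below equals the Pearson correlation.
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y_hat - y_hat.mean()) / y_hat.std()
    return Xz.T @ yz / len(yz)
```

      A positive value means the feature tends to be higher in participants with a higher predicted score. Unlike raw model weights, these correlations remain interpretable when features are collinear, which is the usual motivation for this kind of transformation.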

      Results

      dwMRI

      Line 331: “Overall, models based on structural connectivity metrics performed better than TBSS and probabilistic tractography (Fig. 3). TBSS, in turn, performed better than probabilistic tractography (Fig. 3 and Table S13). The number of streamlines connecting brain areas parcellated with aparc MSA-I had the best predictive performance among all dwMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.052, r<sub>mean</sub> = 0.227, 95% CI [0.212, 0.235]). To identify features driving predictions, we correlated streamline counts in aparc MSA-I parcellation with the predicted g-factor values from the PLSR model. Positive associations with the predicted g-factor were strongest for left superior parietal-left caudal anterior cingulate, left caudate-right amygdala, and left putamen-left hippocampus connections. The most marked negative correlations involved left putamen-right posterior thalamus and right pars opercularis-right caudal anterior cingulate pathways (Fig. 5 and Supplementary Fig. S2).”

      rsMRI

      Line 353: “Among RSFC metrics for 55 and 21 ICs, tangent parameterization matrices yielded the highest performance in the training set compared to full and partial correlation, as indicated by the cross-validation score. Functional connections between the limbic (IC10) and dorsal attention (IC18) networks, as well as between the ventral attention (IC15) and default mode (IC11) networks, displayed the highest positive association with cognition. In contrast, functional connectivity between the limbic (IC43, the highest activation within network) and default mode (IC11) and limbic (IC45) and frontoparietal (IC40) networks, between the dorsal attention (IC18) and frontoparietal (IC25) networks, and between the ventral attention (IC15) and frontoparietal (IC40) networks, showed the highest negative association with cognition (Fig. 5 and Supplementary Fig. S3 and S4).”

      sMRI

      Line 373: “FreeSurfer subcortical volumetric subsegmentation and ASEG had the highest performance among all sMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.068, r<sub>mean</sub> = 0.244, 95% CI [0.237, 0.259] and R<sup>2</sup><sub>mean</sub> = 0.059, r<sub>mean</sub> = 0.235, 95% CI [0.221, 0.243], respectively). In FreeSurfer subcortical volumetric subsegmentation, volumes of all subcortical structures, except for left and right hippocampal fissures, showed positive associations with cognition. The strongest relations were observed for the volumes of bilateral whole hippocampal head and whole hippocampus (Fig. 5 and Supplementary Fig. S5 for feature importance maps). Grey matter morphological characteristics from ex vivo Brodmann Area Maps showed the lowest predictive performance (R<sup>2</sup><sub>mean</sub> = 0.008, r<sub>mean</sub> = 0.089, 95% CI [0.075, 0.098]; Fig. 3 and Table S15).”

      Discussion

      dwMRI

      Line 562: “Among dwMRI-derived neuroimaging phenotypes, models based on structural connectivity between brain areas parcellated with aparc MSA-I (streamline count), particularly connections with bilateral caudal anterior cingulate (left superior parietal-left caudal anterior cingulate, right pars opercularis-right caudal anterior cingulate), left putamen (left putamen-left hippocampus, left putamen-right posterior thalamus), and amygdala (left caudate-right amygdala), result in a neural indicator that best reflects microstructural resources associated with cognition, as indicated by predictive modeling, and more importantly, shares the highest proportion of the variance with mental health-g, as indicated by commonality analysis.”

      rsMRI

      Line 583: “We extend findings on the superior performance of rsMRI in predicting cognition, which aligns with the literature [15, 28], by showing that it also explains almost a third of the variance in cognition that mental health captures. At the rsMRI neuroimaging phenotype level, this performance is mostly driven by RSFC patterns among 55 ICA-derived networks quantified using tangent space parameterization. At a feature level, these associations are best captured by the strength of functional connections among limbic, dorsal attention and ventral attention, frontoparietal and default mode networks. These functional networks have been consistently linked to cognitive processes in prior research [127–130].”

      sMRI

      Line 608: “Integrating information about brain anatomy by stacking sMRI neuroimaging phenotypes allowed us to explain a third of the link between cognition and mental health. Among all sMRI neuroimaging phenotypes, those that quantified the morphology of subcortical structures, particularly volumes of bilateral hippocampus and hippocampal head, explain the highest portion of the variance in cognition captured by mental health. Our findings show that, at least in older adults, volumetric properties of subcortical structures are not only more predictive of individual variations in cognition but also explain a greater portion of cognitive variance shared with mental health than structural characteristics of more distributed cortical grey and white matter. This aligns with the Scaffolding Theory that proposes stronger compensatory engagement of subcortical structures in cognitive processing in older adults [138–140].”

      (5) The formatting of some figure legends could be improved for clarity - for example, some subheadings were not formatted in bold (e.g., Figure 2 c)

      Thank you for noticing this. We have updated the figures to enhance clarity, keeping subheadings plain while bolding figure numbers and MRI modality names.

    1. eLife Assessment

      This valuable paper investigates how fish avoid thermal disturbances that occur on fast timescales. The authors use a creative experimental approach that quickly creates a vertical thermal interface, which they combine with careful behavioral analyses. The evidence supporting their results is solid, but there is a potential confounding factor between temperature and vertical positioning, and characterization of the thermal interface would greatly assist in interpreting the results.

    2. Reviewer #1 (Public review):

      Summary:

      The experiment is interesting and well executed and describes in high detail fish behaviour in thermally stratified waters. The evidence is strong but the experimental design cannot distinguish between temperature and vertical position of the treatments.

      Strengths:

      High statistical power, solid quantification of behaviour.

      Weaknesses:

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations.

    3. Reviewer #2 (Public review):

      The paper by Naudascher et al. investigates an interesting question: how do fish react to and avoid thermal disturbances from the optimum that occur on fast timescales? Previous work has identified potential strategies of warm avoidance in fish on short timescales while strategies for cold avoidance are far more elusive. The work combines a clever experimental paradigm with careful analysis to show that trout parr avoid cold water by limiting excursions across a warm-cold thermal interface. While direct measurements of the interface are lacking, thermal dynamics simulations suggest that trout parr avoid the warm-cold interface in the absence of gradient information.

      The authors assume that the thermal interface triggers the upward turning behavior, possibly leading to the formation of an associative memory. However, an alternative explanation is that exposure to cold water during initial excursions increases the tendency for upward turns. In other words, exposure to a cold interface changes the behavioral state leading to increases in gravity controlled upward turning. This could be an adaptive strategy since for temperatures > 4C swimming upwards is a good strategy to reach warmer water. That being said, the vertical design offers new insight and is ecologically relevant.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The experiment is interesting and well executed and describes in high detail fish behaviour in thermally stratified waters. The evidence is strong but the experimental design cannot distinguish between temperature and vertical position of the treatments.

      Strengths:

      High statistical power, solid quantification of behaviour.

      Weaknesses:

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult.

      We highly appreciate this evaluation and have addressed the reviewer’s specific comments below.

      The sentence "Further, the metabolic performance (and thus functions including growth, reproduction, and locomotion) of ectotherms takes the form of a bell-shaped curve as a function of temperature6, peaking within a range of optimal temperatures (the 'preferendum') and going to zero at lower and upper temperature limits7." contains several over-simplifications and misconceptions:

      (1) Thermal performance curves are never bell-shaped.

      (2) The optimum for various traits often shows different TPCs.

      (3) The preferendum rarely lines up with the thermal optimum for various trait TPCs.

      (4) Performance for various traits rarely reaches zero at upper or lower limits, instead they can reach zero at less extreme temperatures (e.g. growth) or maintain high function all the way up to and sometimes beyond thermal limits (e.g. aerobic scope, heart rate).

      We highly appreciate this input. We have replaced that sentence with: L69-71: “Because temperature influences the rates of most physiological processes, rapid warming or cooling can affect fish performance traits, including metabolic rates, swimming ability, and thermal tolerance (Jutfelt et al. 2024).”

      The use of adaptation instead of acclimation is confusing. Adaptation should be reserved for evolutionary change. This is an issue in several parts of the manuscript.

      Thanks for this input, we have replaced the word adapt with acclimate in two instances: L79 and L398.

      It is not true that "very few quantitative studies of thermotaxis have been conducted in fish". There exists an extensive literature on thermal preference and avoidance in fish that the manuscript downplays.

      Thanks a lot for this input. We understand that thermal preference is ultimately driven by mechanistic responses to thermal gradients, and that thermotaxis and thermokinesis are the two mechanisms used by fish to navigate heterothermal environments. Our study and analysis are focused on understanding these mechanisms in vertically stratified conditions, not on thermal preferences per se. We have modified our text to clarify this aspect. Our literature review was focused on the behavioral mechanisms and our understanding is that the establishment of thermal preferences has a different goal compared to understanding how fish respond to rapid changes in water temperature. We have deleted that sentence and replaced it with (L107-110): “While the thermal preference of fish is a well-established field of research, very few quantitative studies of the behavioral mechanisms allowing fish to seek their preferendum (i.e. thermotaxis) have been conducted in fish.”

      (Methods) It is unclear why the blue dye was used in all experiments. The fish can see the differently coloured water layer and that may have affected their choices. Five control trials without dye were run but finding no difference there could also be due to low statistical power.

      We appreciate this comment. The blue dye was used to visualize the precise location of the thermal interface and was therefore necessary in all experiments (see Methods section ‘Visualization and evolution of the thermal interface’). We acknowledge that fish can perceive the colored water layer, but since the dye concentration and resulting color intensity were consistent across all treatments, we do not see how it could have acted as a confounding variable. While we recognize the possibility of some behavioral influence from the dye, the clear behavioral differences across treatments indicate that it was not a determining factor. To emphasize this we have added the following to the manuscript (L701-703): “Furthermore, because the dye concentration and resulting color intensity were consistent across all treatments, the dye did not act as a confounding variable in our statistical comparisons.”

      Regarding statistical power, our control experiment without dye (N = 16 fish, 4 replicates; see Fig. S34 and S35) provides sufficient statistical power to assess whether the dye influenced behavior. The reviewer indicated that the high statistical power was a strength of the paper, which aligns with our view that our study design enables robust statistical comparisons. It seems contradictory that statistical power is a concern for the control trials, given that our main experiments were conducted with a similar sample size. Indeed, the number of replicates used is consistent with similar studies and balances statistical rigor with the ethical goal of reducing the number of animals used in experimentation. To emphasize this, we have added the following to the manuscript (L865-868): “The number of replicates used in this study reflects a balance between statistical rigor and the ethical imperative to minimize the use of animals in experimentation. Regarding statistical power, our design (five replicates with groups of four fish each) is consistent with similar studies and represents an adequate sample size.”

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult. This issue should be thoroughly discussed.

      Thank you very much for this comment. We revised the manuscript accordingly, to clearly indicate that our goal was to assess the response of fish to vertically thermally stratified water, a scenario that occurs frequently in nature. We have added the following paragraph to the discussion (L523-530): “However, a generalization of our observations to horizontally oriented thermal gradients remains elusive. Our results are inherently tied to the vertical stratification created in our experiments. As warm water was always positioned above and cold water below, we could not control for the effect of vertical position (i.e., we could not do cold over warm layer experiments). This limits our ability to directly compare our findings to those obtained from horizontally oriented thermal gradients. On the other hand, the case we addressed is of direct environmental relevance, as natural waters often experience vertical thermal stratification.”

      It is unclear why the authors assume an "optimal temperature" (undefined for which trait) of 12°C for brown trout parr, and why they assume the preference temperature would match that "optimal" temperature. The thermal biology for any fish species is more complex than a single perfect temperature, with various traits showing differing optima and often a mismatch with the preferred temperature. The literature suggests brown trout growth optima between 13 and 16°C, and preference temperature has even been suggested to be as high as 21°C. In light of this, the authors' conclusion that brown trout avoid cold and don't avoid warm water is possibly misguided. It is possible that the brown trout had a preference temperature higher than 12°C, which should be acknowledged and discussed.

      This is indeed a very important aspect, which was partly (but not fully) addressed in the discussion already. To reflect these considerations, we have expanded the existing paragraph in the discussion (additions are in yellow). (L422 - L439): “We conclude from the behavior of fish when warmer water was available that their acute thermal preferendum exceeded 12 °C, departing from the acclimation temperature we had chosen based on the thermal preferendum for trout reported in the literature[33]. Indeed, the thermal biology of any fish species is more complex than a single, static thermal preferendum: Many internal and external factors, such as hypoxia, satiation, time of day, and life stage[5], can influence the temperature preference of fish. For example, the level of satiation can have an impact because when fish are well fed, their growth rate increases with body temperature as metabolic performance increases[40]. This modifies the preferred temperature, as observed in Bear Lake sculpin (Cottus extensus) that ascend into warmer water after feeding to stimulate digestion and thereby achieve a three-fold higher growth rate[41]. In contrast, field studies with adult fish have observed movement from warm to cold water in summer[42,43], allowing fish to lower their metabolic rate, likely in an effort to conserve energy[2,44]. We propose that the behavior of trout parr upon exposure to warmer water in our experiments served to achieve a higher body temperature to ultimately increase growth rate, which is critical for this life stage[45,46]. Indeed, growth experiments on brown trout populations have shown that optimal growth temperatures can range between 15 and 19 °C, depending on the stream of origin[46].”

      The figures are unnecessarily complex and introduce a long list of abbreviations and Greek characters for no apparent reason. There are many simpler ways for showing the results so unclear why they are so opaque.

      We appreciate the reviewer’s feedback and agree on the importance of clarity. However, in the absence of specific suggestions, we did not make changes to the figures or the use of Greek characters (which align with convention), as we believe they effectively convey the results. We highlight that the data themselves are very rich (multiple fish, multiple phases, multiple treatments, etc.) and we wanted to convey this richness in a compact and transparent manner.

      Reviewer #2:

      This paper investigates an interesting question: how do fish react to and avoid thermal disturbances from the optimum that occur on fast timescales? Previous work has identified potential strategies for warm avoidance in fish on short timescales while strategies for cold avoidance are far more elusive. The work combines a clever experimental paradigm with careful analysis to show that trout parr avoid cold water by limiting excursions across a warm-cold thermal interface. While I found the paper interesting and convincing overall, there are a few omissions and choices in the presentation that limit interpretability and clarity.

      A main question concerns the thermal interface itself. The authors track this interface using a blue dye that is mixed in with either colder or warmer water before a gate is opened that leads to gravitational flow overlaying the two water temperatures. The dye likely allows the identification of convective currents which could lead to rapid mixing of water temperatures. However, it is less clear whether it accurately reflects thermal diffusion. This is problematic as the authors identify upward turning behavior around the interface which appears to be the behavioral strategy for avoiding cold water but not warm water. Without knowing the extent of the gradient across the interface, it is hard to know what the fish are sensing. The authors appear to treat the interface as essentially static, leading them to the conclusion that turning away before the interface is reached is likely related to associative learning. However, thermal diffusion could very likely create a gradient across centimeters which is used as a cue by the fish to initiate the turn. In an ideal world, the authors would use a thermal camera to track the relationship between temperature and the dye interface. Absent that, the simulation that is mentioned in passing in the methods section should be discussed in detail in the main text, and results should be displayed in Figure 1. Error metrics on the parameters used in the simulation could then be used to identify turns in subsequent figures that likely are or aren't affected by a gradient formed across the interface.

      The authors assume that the thermal interface triggers the upward-turning behavior. However, an alternative explanation, which should be discussed, is that cold water increases the tendency for upward turns. This could be an adaptive strategy since for temperatures > 4C swimming upwards is likely a good strategy to reach warmer water.

      The paper currently also suffers from a lack of clarity which is largely created by figure organization. Four main and 38 supplemental figures are very unusual. I give some specific recommendations below but the authors should decide which data is truly supplemental, versus supporting important points made in the paper itself. There also appear to be supplemental figures that are never referenced in the text which makes traversing the supplements unnecessarily tedious.

      The N that was used as the basis for statistical tests and plots should be identified in the figures to improve interpretability. To improve rigor, the experimental procedures should be expanded.

      Specifically, the paper uses two thermal models which are not detailed at all in the methods section.

      We appreciate these crucial comments to our paper. We have addressed these points in detail below.

      As stated above, a characterization of the thermal interface is critical. Ideally via measurement or at least by expanding on the simulation.

      We appreciate the idea of using thermal cameras and, indeed, we had initially tried to use them. However, thermal cameras generally cannot see through plexiglass or glass-like material due to the way infrared radiation interacts with these materials. While thin plastics can transmit some infrared, thicker plastics and reflective materials like glass tend to block or reflect infrared light.

      We have attempted to better characterize the thermal interface thickness, namely the spatial extent of the thermal gradient over the time period of our experiments (20 min). Indeed, our simulations in the original SI were conducted precisely to estimate the thermal interface thickness, though based on thermal diffusion in still water, while turbulence generated by the moving gravity current can smear out the interface, particularly in the initial phase. To account for this in the revised manuscript, we adopted a phenomenological approach to estimate the initial increase in thickness of the thermal interface due to turbulence and present this refined simulation in our manuscript.

      Our analysis suggests that, rather than assuming an initial interface thickness of zero (as in the original version of the manuscript), the thermal diffusion simulations should begin with an initial thickness of 2.8 mm in TR1. To incorporate this adjustment, we set the initial interface thickness to 2.8 mm and ran the simulation forward for t = 20 min, assuming diffusion. This approach resulted in a final interface thickness ranging between 4 and 6 cm (see Fig. S29 in the Supplementary Information).
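
      For readers who want a feel for the magnitudes involved, a back-of-the-envelope version of such a diffusion estimate is sketched below. It is not the authors' simulation, whose details are not given in this response. It assumes pure one-dimensional diffusion of an error-function temperature profile in still water, a thermal diffusivity of about 1.4 × 10⁻⁷ m²/s, and a thickness defined as the distance between the 10% and 90% points of the temperature jump; the names `KAPPA_WATER`, `erfinv` and `interface_thickness` are hypothetical. Under those assumptions the squared thickness grows linearly in time from its initial value.

```python
from math import sqrt
from statistics import NormalDist

# Approximate thermal diffusivity of water in m^2/s (assumed value).
KAPPA_WATER = 1.4e-7


def erfinv(x):
    """Inverse error function, computed from the standard normal quantile."""
    return NormalDist().inv_cdf((1.0 + x) / 2.0) / sqrt(2.0)


def interface_thickness(t_seconds, initial_thickness_m=2.8e-3,
                        kappa=KAPPA_WATER, tail=0.10):
    """Thickness (m) of a diffusing thermal interface after t_seconds.

    The profile is taken as erf(z / sqrt(4 * kappa * (t + t0))), where the
    virtual time offset t0 is chosen so that the thickness at t = 0 equals
    initial_thickness_m. Thickness is measured between the points where the
    temperature has changed by `tail` and `1 - tail` of the total jump.
    """
    c = 2.0 * erfinv(1.0 - 2.0 * tail)
    return sqrt(initial_thickness_m ** 2 + c ** 2 * 4.0 * kappa * t_seconds)


# Thickness after the 20-minute experiment, converted to centimetres.
thickness_cm = 100.0 * interface_thickness(20 * 60)
```

      With these inputs the estimate comes out at a few centimetres, and it is sensitive to the chosen thickness definition: a stricter cut-off such as 1% to 99% gives a noticeably wider interface for the same profile.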

      To reflect this refinement, we have added a new paragraph (L717-758: "Characterization of the thermal gradient") to the Methods section. Additionally, we have updated Fig. S29 in the Supplementary Information and included an average (over time and across treatments) gradient thickness of 5 cm in Figs. 2 and 3 of the manuscript. The revised Figs. 2 and 3 now explicitly indicate the estimated vertical extent of the thermal gradient, with an extended caption detailing these changes.

      The simulation should be detailed in the methods so that its validity can be evaluated and ideally, it should involve curved interfaces as encountered in the experiment.

      To account for the effect of turbulence during the initial, inertia-dominated phase after the gate removal, we have provided a correction for the initial thickness of the interface (see the addition to the Methods section). Thank you for your suggestion regarding the incorporation of curved interfaces in the simulations. We believe that including curved interfaces in the simulations would not significantly affect the results. As shown in the manuscript, the interface is curved primarily during the initial phase of the process (first 2 min where the flow is inertia-dominated), which is currently not included in our data analysis (phase 1 begins 2 min after the gate removal).

      In that vein, distances from the interface rather than height above the interface should be reported for the fish.

      We acknowledge the reviewer’s suggestion to report distances from the interface rather than height above or below it. However, beyond the initial phase, we do not see a strong justification for using the orthogonal distance over the vertical distance, as the choice is inherently arbitrary (e.g., one could also measure the distance along the fish’s orientation vector). We have therefore kept our assessment based on the vertical distance.

      Absent measurements, the paragraph on associative learning should be struck from the discussion as it is purely speculative.

      We agree that the original paragraph on associative learning may have sounded overly speculative. However, after updating our manuscript with additional simulations of the thermal gradient's vertical extent, we found that fish perform upward turns not only above the thermal interface, but also before entering the thermal gradient itself. This observation makes us hesitant to attribute the response solely to thermotaxis. We believe it is essential to provide a plausible explanation, albeit speculative, for how fish initiate these turns before directly encountering the cold-water gradient. To support this, we have extended the discussion in this paragraph and added Supplementary Fig. 39. The new text now reads (additions in yellow): (L487 – 499): “Our findings show that fish were able to perform upward turns while still located above the thermal interface, that is, before actually sampling the cold water below the interface. In fact, our simulation of the vertical extent of the thermal gradient revealed that a substantial fraction of upward turns occurred before fish encountered the gradient itself, that is, prior to any sensory detection of the temperature change (Supplementary Fig. 39). This finding may be evidence of associative learning, whereby fish used information regarding the presence of colder water at depth obtained at prior times. While the current data do not provide conclusive evidence in this regard, they raise the possibility that, rather than responding solely to immediate thermal cues, fish use spatial memory or associative learning to anticipate the location of colder water based on prior experience. Indeed, fish are able to perform associative learning based on non-visual cues[53], create mental maps of their surroundings[54] and retain memory for hours[55], days[56] and months[57,58].”

      The body-temperature simulations need to be detailed in the methods.

      Thanks for this comment. We have removed the supplementary text section and have included the paragraph “Body cooling during cold-water excursions” into the methods section of our manuscript (L804 - L829).

      Constant temperature experiments could be helpful in addressing the importance of a gradient/interface for triggering upward turning

      We agree; however, for ethical reasons we were limited to a maximum number of fish we could use in the experiments. Hence, we focused on getting approval to run experiments focused on the responses to thermal gradients. However, occupancy during the acclimation phase at 12 °C showed that fish were much more stationary and primarily occupied the lower half of the tank.

      A lot of ease of reading could be gained by labeling the conditions according to either the second temperature or perhaps even better the delta temperature (i.e. TR[-2C] instead of TR1).

      We agree that labeling conditions by the second temperature or delta temperature could in principle improve readability. However, since T_bottom and T_top are explicitly mentioned in each main figure at least once, they can be directly associated with the respective treatment. Therefore, we have opted to retain the current labeling for consistency.

      The figure legends are often short and do not accurately label all figure elements. This is especially true for supplemental figure legends which often appear rushed (e.g., the legend for Figure S2 stops mid-sentence, the legend of Figure S3 does not indicate what Ttop or Tbottom are).

      We appreciate the reviewer’s comment and have carefully revised all figure legends to ensure clarity and completeness. Specifically, we have corrected figure labels, expanded the descriptions for supplemental figures, and ensured that all elements are accurately defined. For instance, we have completed the legend for Figure S2 and clarified the definitions of T_top and T_bottom in Figure S3. Additionally, we have systematically reviewed all figure legends to prevent inconsistencies and omissions.

      For Figure S3, to improve clarity, plotting the standard deviation at different points in the tank across the phases could be more informative than the hard-to-distinguish multi-line plots in different shades of red.

      We appreciate the reviewer’s suggestion regarding Figure S3. However, the primary goal of this figure is to illustrate how the thermal interface moves over time. While plotting the standard deviation at different points in the tank could provide additional statistical insights, it would detract from the intended visualization of the interface dynamics. For this reason, we have opted to retain the current multi-line representation. Nevertheless, we have ensured that the figure is as clear as possible by refining the color contrast and improving the legend for better readability.

      There is an inconsistency in in-text citation styles (mixture of superscript and numbers in brackets).

      Thank you for pointing this out. We have carefully reviewed the manuscript and corrected any inconsistencies in the in-text citation style to ensure uniform formatting throughout.

      While the statement in the introduction, that increases in movement frequency could be purely metabolic in nature is correct, at least for larval zebrafish it has been shown that sensory neural activity is predictive of motor neuron activity and swim rates (Haesemeyer, 2018, cited by the authors).

      This is an interesting finding. It is however unclear to us why this information is crucial in our context of brown trout parr.

      Examples of summary results from Supplementary Figures 8-10 should be bundled in a main text figure since this appears to be important information supporting the conclusions.

      We agree that Supplementary Figures 8–10 contain important information (i.e. boxplots) on vertical occupancy and the time individuals spent in different water temperatures. However, this information is already integrated into Figure 2C, D, F, and G, which display the vertical distributions of fish across treatments and over time. Given the current length of the manuscript, adding another main-text figure could dilute rather than enhance clarity. For this reason, we have opted to keep these details in the Supplementary Materials while ensuring they are appropriately referenced in the main text.

      The distributions of excursion length for all treatments should be graphed in a main figure to support the point made in the third paragraph of the "Trout parr... do not avoid warm water" section of the results.

      We appreciate the reviewer’s suggestion. However, we do not believe that plotting excursion length is necessary to support this statement, as the key finding is already well represented in the manuscript. Specifically, the transition to bimodal depth occupancy, with fish spending comparable time above and below the interface in warm-water treatments (TR6–TR9), is clearly conveyed in Figure 2F and Supplementary Figure 8B. Additionally, this information is explicitly stated in the results section (L235): "Fish did not avoid warmer water in any of the warm-water treatments (TR6–TR9). Instead, fish transitioned to a bimodal depth occupancy, with comparable time spent above and below the interface (Fig. 2F; Supplementary Fig. 8B)." Given this, we believe that adding an additional figure would not enhance clarity but may instead introduce redundancy.

      There should be a main figure panel that statistically compares the turn biases around the interface for the different conditions and the +/- 5cm interface line mentioned in the text should be visualized in the appropriate figures - incidentally, this length scale is on par with the diffusion seen in simulations further suggesting that fish in fact sense a gradient here rather than remembering an interface.

      To address the reviewer’s comment, we have made the following updates:

      • Extended and incorporated simulations of the thermal interface thickness (see Methods and Supplementary Fig. 29).

      • Plotted the vertical locations of up-turning events relative to the phase-averaged position of the thermal interface (see Supplementary Fig. 39), which includes the estimated 5 cm vertical extent of the thermal gradient.

      • Added the thermal interface thickness to the main figures (Fig. 3F,G and Fig. 2E,H) where applicable.

      While we do not claim that memory alone explains cold-water avoidance, our data still suggest that it may contribute to the observed behavior, particularly since a substantial number of upturns occurred before the fish entered the thermal gradient (see also Author response image 1 below). Our aim is not to statistically disentangle the relative contribution of thermotaxis versus associative learning, but to propose a plausible interpretation of this observed anticipatory behavior, while making clear that this is only a possibility.

      Given that the thermal gradient is now visualized and characterized in detail, we respectfully suggest that an additional statistical comparison of turn biases would not add further clarity. We believe there is evidence that vertical turning away from the cold occurred within and above the thermal gradient. However, we welcome the reviewer’s perspective, and to demonstrate that turning points occur outside and above the thermal interface, we have plotted them against gradient growth over time (see Author response image 1 below).

      Author response image 1.

      The colored area indicates the temporal growth of thermal interface thickness.

      Reviewer #3:

      In this study, the authors measured the behavioural responses of brown trout to the sudden availability of a choice between thermal environments. The data clearly show that these fish avoid colder temperatures than the acclimation condition, but generally have no preference between the acclimation condition or warmer water (though I think the speculation that the fish are slowly warming up is interesting). Further, the evidence is compelling that avoidance of cold water is a combination of thermotaxis and thermokinesis. This is a clever experimental approach and the results are novel, interesting, and have clear biological implications as the authors discuss. I also commend the team for an extremely robust, transparent, and clear explanation of the experimental design and analytical decisions. The supplemental material is very helpful for understanding many of the methodological nuances, though I admit that I found it overwhelming at times and wonder if it could be pruned slightly to increase readability. Overall, I think the conclusions are generally well-supported by the data, and I have no major concerns.

      Minor comments

      P2 intro paragraphs 1/3 - it is not clear that thermal preference generally reflects the thermal optimum, partly because it is not clear what trait is being optimized (fitness?). Some nuance here would be helpful, and would also link nicely to the discussion on p10.

      Thank you for this comment. We have now refined this section as follows (L67–71): "As most fish species are ectotherms, their body temperature fluctuates with the surrounding water temperature. Because temperature influences the rates of most physiological processes, rapid warming or cooling can affect fish performance traits, including metabolic rates, swimming ability, and thermal tolerance[6]."

      To further clarify how thermal preference relates to thermal optimum and what trait is being optimized, we have incorporated additional nuance in this section. Specifically, we now acknowledge that thermal preference may not always align with the thermal optimum for performance or fitness.

      P2 intro paragraph 2 - "adapt physiologically" implies evolution, but here you are referring to plasticity. Suggest saving the word "adapt/adaptation" for evolutionary changes (see also p9).

      Thank you for this comment. We have revised the wording to "acclimate physiologically" (L79) to more accurately reflect plastic responses rather than evolutionary adaptation.

      P7 - "This difference in probabilities (ρup - ρdown) was particularly large in the region immediately above and below the interface (-5 cm < D < 5 cm; Fig. 3F) and is a hallmark of a thermotactic behavior." I agree that the result provides compelling evidence for thermotaxis, but would it be possible to bolster this case by statistically testing for a difference in probabilities among the treatment groups here?

      In addition to Fig. 3F, we present statistical evidence that, at colder water temperatures, fish penetrate less deeply into the cold lower water. The decreasing trend was statistically significant (Mann–Kendall test: p < 0.001; Supplementary Table 6) and is presented in Fig. 4C. The depth reached during each cold-water excursion is determined by the location of the vertical turning point, which redirects the fish upward toward the surface. We think this is sufficient evidence for thermotaxis.
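
      For readers unfamiliar with the Mann–Kendall trend test mentioned above, the following is a minimal sketch of how such a test can be computed. It is not the authors' analysis code: the function name `mann_kendall` and the input layout (one value per treatment, ordered from warmest to coldest bottom temperature) are assumptions made for illustration, and the normal approximation with tie and continuity corrections is the textbook form of the test.

```python
import math
from itertools import combinations


def mann_kendall(values):
    """Return (S, z, two-sided p) for a monotonic trend in `values`.

    `values` must already be in the order along which the trend is tested
    (hypothetically: one penetration depth per treatment, warmest to coldest).
    """
    n = len(values)
    # S = number of increasing pairs minus number of decreasing pairs.
    s = sum((b > a) - (b < a) for a, b in combinations(values, 2))

    # Variance of S under the null hypothesis, corrected for tied values.
    tie_counts = {}
    for v in values:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    var_s = n * (n - 1) * (2 * n + 5)
    var_s -= sum(t * (t - 1) * (2 * t + 5) for t in tie_counts.values())
    var_s /= 18.0

    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p
```

      A negative S with a small p-value indicates a decreasing trend, which is the pattern the response describes for penetration depth as the bottom water gets colder.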

      P9 paragraph 3 - "recent studies suggest that fish may instead respond to temporal changes of their internal body temperature." It seems like a citation is missing here. Would be useful to briefly summarize the evidence for internal temperature sensing that is the basis of this modelling exercise.

      Thanks, we have added that citation (L385).

      P10 "Our findings provide the first experimental evidence for this mode of behavioral thermoregulation in which fish navigate their heterothermal environment to achieve gradual body warming."

      I think this statement overreaches given the presented data. While there may be a trend towards fish in the warm treatment spending increasing amounts of time in the upper half of the tank, I do not see this pattern supported statistically. There is also no evidence of gradual body warming, and even if there was I disagree that this would constitute experimental evidence that this was happening "intentionally". By this reasoning, any shuttlebox experiment in which fish actively shuttle between relatively warm and cool sides to end up with a preference that is above the starting condition would also constitute evidence for gradual warming. Overall, this is an interesting pattern, but I do not think there is sufficient evidence to conclude that fish are strategically warming.

      We appreciate the reviewer’s comment and acknowledge that our original wording may have overstated the evidence. We have revised the sentence to better reflect the evidence presented (L411-415): “Our observations resemble this mode of behavioral thermoregulation, in which fish progressively favor warmer regions within a heterothermal environment. However, additional experimental evidence is required to determine the mechanisms underlying this behavior.”

      P11 "Despite the avoidance response of cold water, fish engaged in repeated cold-water excursions..."

      This is an interesting speculation, but I think it would be helpful to also point out that these fish are biased towards the bottom of the tank (based on control measurements) and this pattern may therefore simply reflect a desire to be lower in the water column.

      Thank you for this helpful comment. We have now added this point to the revised text, which reads (L475-477): “Despite the avoidance response to cold water, fish engaged in repeated cold-water excursions, potentially reflecting a behavioral strategy to map the thermal environment. This pattern may also reflect an inherent tendency to occupy the lower part of the tank, as observed at a homogeneous temperature of 12 °C during the acclimation phase.”

      P13 - why was the dye always added to the right side of the tank, instead of being assigned to a side randomly? I think the control experiment is good evidence that the dye did not substantially affect behaviour, but it seems like it would have been nice to separate dye and novel temperature exposure.

      We agree that randomizing the side of dye application would have been ideal. The dye was consistently added to the right side to maintain procedural consistency, ensuring that the “incoming” or “novel” temperature was always dyed. That said, our control experiment provides strong evidence that the dye itself did not influence behavior (as discussed above and in the manuscript).

    1. eLife Assessment

      This important study uses the delay line axon model in the chick brainstem auditory circuit to examine the interactions between oligodendrocytes and axons in the formation of internodal distances. This is a significant and actively studied topic, and the authors have used this preparation to support the hypothesis that regional heterogeneity in oligodendrocytes underlies the observed variation in internodal length. In a solid series of experiments, the authors have used enhanced tetanus neurotoxin light chains, a genetically encoded silencing tool, to inhibit vesicular release from axons and support the hypothesis that regional heterogeneity among oligodendrocytes may underlie the biased nodal spacing pattern in the sound localization circuit.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #2 (Public review):

      Summary:

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major comments:

      (1) The authors should test the efficiency of TeNT to validate that vesicular release is indeed inhibited from expressing neurons. Additionally, the authors should clarify if their TeNT expression system results in the whole tract being silenced, or results in sparse vesicular release inhibition in only a few neurons.

      (2) The authors should revise their statistical analyses throughout, and supply additional information to explain the rationale for the statistical tests used, including e.g. data normality, paired sampling, number of samples/independent biological replicates.

      (3) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the avian auditory circuit?

      (4) The study shows a correlation between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). The authors should either include such experiments, or discuss their value in supporting the interpretation of their results.

      (5) The authors should discuss very pertinent prior studies, in particular to contextualize their findings with (a) known neuron-autonomous modes of node formation prior to myelination, (b) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, (c) known correlation of myelin length and thickness with axonal diameter, (d) regional heterogeneity in the oligodendrocyte transcriptome.

      Significance:

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

    3. Reviewer #3 (Public review):

      Summary:

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing.

      Major comments:

      (1) The authors neglect the possibility that nodal clusters may be formed prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal clusters plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      (2) The claim that axonal diameter is constant along the axonal length needs to be demonstrated at the EM level. This would also allow measurement of possible regional differences in the thickness of the myelin sheath and the number of myelin wraps.

      (3) The claim that the difference in internodal length is explained by heterogeneity in the sources of oligodendrocytes is not convincing. Oligodendrocytes presumably of the same origin form shorter internodes when remyelinating after a demyelination event.

      Significance:

      The authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

      Comments on revised version:

      This revised version is largely improved and the responses to reviewers' comments are generally relevant. However, the response regarding pre-nodes is not satisfactory. I understand that the authors prefer to avoid further experimentation, but I think this is an important point that needs to be clarified. Exploring stages between E12 and E15 is therefore of importance. When carefully examining some of the figures (Fig. 1E or 2D), I think that at E15 there may well be pre-node formation prior to myelin deposition, on structures the authors considered to be heminodes. To be convincing they should use double or triple labeling with, in addition to the nodal proteins (ankG and/or Nav pan), a good myelin marker such as anti-PLP. The rat monoclonal developed by the late Prof. Ikenaka would give a sharper staining than the anti-MAG they used. (I assume the clone must still be available in Okazaki.)

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Evidence, reproducibility and clarity

      The manuscript by Egawa and colleagues investigates differences in nodal spacing in an avian auditory brain stem circuit. The results are clearly presented and data are of very high quality. The authors make two main conclusions:

      (1) Node spacing, i.e. internodal length, is intrinsically specified by the oligodendrocytes in the region they are found in, rather than axonal properties (branching or diameter).

      (2) Activity is necessary (we don't know what kind of signaling) for normal numbers of oligodendrocytes and therefore the extent of myelination.

      These are interesting observations, albeit phenomenological. I have only a few criticisms that should be addressed:

      (1) The use of the term 'distribution' when describing the location of nodes is confusing. I think the authors mean rather than the patterns of nodal distribution, the pattern of nodal spacing. They have investigated spacing along the axon. I encourage the authors to substitute node spacing or internodal length for node distribution.

      Thanks for your suggestion to avoid confusion. We used the phrase "nodal spacing" instead of "nodal distribution" throughout the revised manuscript.

      (2) In Seidl et al. (J Neurosci 2010) it was reported that axon diameter and internodal length (nodal spacing) were different for regions of the circuit. Can the authors help me better understand the difference between the Seidl results and those presented here?

      As a key distinction, our study focuses specifically on the main trunk of the contralateral projection of NM axons. This projection features a sequential branching structure known as the delay line, where collateral branches form terminal arbors and connect to the ventral dendritic layer of NL neurons. This structural organization plays a critical role in influencing the dynamic range of ITD detection by regulating conduction delays along the NM axon trunk.

      The study by Seidl et al. (2010) is a pioneering work that measured the diameter of NM axons using electron microscopy, providing highly reliable data. However, due to the technical limitations of electron microscopy, which does not allow for the continuous tracing of individual axons, it is not entirely clear whether the axons measured in the ventral NL region correspond to terminal arbors of collateral branches or the main trunk of NM axons (see Figure 9E, F in their paper). Instead, they categorized axon diameters based on their distance from the NL cell layer, showing that axon diameter increases distally (see Figure 9G in their paper). Notably, the diameters of ventral axons located more than 120 μm away from the NL cell layer are almost identical to those in the midline.

      As illustrated in our Figure 4D and Supplementary Video 2, the main trunk of the contralateral NM projection is predominantly located in these distal regions. Therefore, our findings complement those of Seidl et al. (2010) rather than contradicting them. We made this point as clear as possible in the text (page 7, line 3).

      (3) The authors looked only in very young animals - are the results reported here applicable only to development, or does additional refinement take place with aging?

      In this study, we examined chick embryos from E9 to just before hatching (E21) and post-hatch chicks up to P9. Chickens begin to perceive sound around E12 and possess sound localization abilities at the time of hatching (Grier et al., 1967) (added to page 4, line 9). Therefore, by E21, the sound localization circuit is largely established.

      On the other hand, additional refinement of the circuit with aging is certainly possible. A key cue for sound localization, interaural time difference (ITD), depends on the distance between the two ears, which increases as the animal grows. As shown in Figure 2G, internodal length increased by approximately 20% between E18 and P9 while maintaining regional differences. Given that NM axons are nearly fully myelinated by E21 (Figure 4D, 6C), this suggests that myelin extends in proportion to the overall growth of the head and brain volume. We described this possibility in the text (page 5, line 21).

      Thus, our study covers not only the early stages of myelination but also the post-functional maturation in the sound localization circuit.

      (4) The fact that internodal length is specified by the oligodendrocyte suggests that activity may not modify the location of nodes of Ranvier - although again, the authors have only looked during early development. This is quite different than this reviewer's original thoughts - that activity altered internodal length and axon diameter. Thus, the results here argue against node plasticity. The authors may choose to highlight this point or argue for or against it based on results in adult birds?

      In this study, we demonstrated that although vesicular release did not affect internodal length, it selectively promoted oligodendrogenesis, thereby supporting the full myelination and hence the pattern of nodal spacing along the NM axons. We believe that this finding falls within the broader scope of 'activity-dependent plasticity' involving oligodendrocytes and nodes.

      As summarized in the excellent review by Bonetto et al. (2021), activity-dependent plasticity in oligodendrocytes encompasses a wide range of phenomena, not limited to changes in internodal length but also including oligodendrogenesis. Moreover, the effects of neuronal activity are not uniform but likely depend on the diversity of both neurons and oligodendrocytes. For example, in the mouse visual cortex, activity-dependent myelination occurs in interneurons but not in excitatory neurons (Yang et al., 2020). Additionally, expression of TeNT in axons affected myelination heterogeneously in zebrafish; some axons were impaired in myelination and the others were not affected at all (Koudelka et al., 2016). In the mouse corpus callosum, neuronal activity influences oligodendrogenesis, which in turn facilitates adaptive myelination (Gibson et al., 2014).

      Thus, rather than refuting the role of activity-dependent plasticity in nodal spacing, our findings emphasize the diversity of underlying regulatory mechanisms. We described these explicitly in text (page 10, line 18).

      Significance

      This paper may argue against node plasticity as a mechanism for tuning of neural circuits. Myelin plasticity is a very hot topic right now and node plasticity reflects myelin plasticity. This seems to be a circuit where perhaps plasticity is NOT occurring. That would be interesting to test directly. One limitation is that the study is limited to development.

      This paper does not argue against node plasticity, but rather demonstrates that oligodendrocytes in the NL region exhibit a form of plasticity; they proliferate in response to vesicular release from NM axons, yet do not undergo morphological changes, ensuring adequate oligodendrocyte density for the full myelination of the auditory circuit. Thus, activity-dependent plasticity involving oligodendrocytes would contribute in various ways to each neural circuit, which is presumably attributable to the fact that myelination is driven by complex multicellular interactions between diverse axons and oligodendrocytes. Oligodendrocytes are known to exhibit heterogeneity in morphology, function, responsiveness, and gene profiles (Foerster et al., 2019; Sherafat et al., 2021; Osanai et al., 2022; Valihrach et al., 2022), but the functional significance of this heterogeneity remains largely unclear. This paper also provides insight into how oligodendrocyte heterogeneity may contribute to the fine-tuning of neural circuit function, adding further value to our findings. Importantly, our study covers a wide range of development in the sound localization circuit, from pre-myelination (E9) to post-functional maturation (P9), revealing how the nodal spacing pattern along the axon in this circuit emerges and matures.

      Reviewer #2:

      Evidence, reproducibility and clarity

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major points, detailed below, need to be addressed to overcome some limitations of the study.

      Major comments:

      (1) It is essential that the authors validate the efficiency of TeNT to prove that vesicular release is indeed inhibited, to be able to make any claims about the effect of vesicular release on oligodendrogenesis/myelination.

      eTeNT is a widely used genetically encoded silencing tool and constructs similar to the one used in this study have been successfully applied in primates and rodents to suppress target behaviors via genetic dissection of specific pathways (Kinoshita et al., 2012; Sooksawate et al., 2013). However, precisely quantifying the extent of vesicular release inhibition from NM axons in the brainstem auditory circuit is technically problematic.

      One major limitation is that while A3V efficiently infects NM neurons, its transduction efficiency does not reach 100%. In electrophysiological evaluations, NL neurons receive inputs from multiple NM axons, meaning that responses may still include input from uninfected axons. Additionally, failure to evoke synaptic responses could either indicate successful silencing or failure to stimulate NM axons, making a clear distinction difficult. Furthermore, unlike in motor circuits, we cannot assess the effect of silencing by observing behavioral outputs.

      Thus, we instead opted to quantify the precise expression efficiency of GFP-tagged eTeNT in the cell bodies of NM neurons. The proportion of NM neurons expressing GFP-tagged eTeNT was 89.7 ± 1.6% (N = 6 chicks), which is consistent with previous reports evaluating A3V transduction efficiency in the brainstem auditory circuit (Matsui et al., 2012). These results strongly suggest that synaptic transmission from NM axons was globally silenced by eTeNT at the NL region. We described these explicitly in text (page 8, line 2).

      (2) Related to 1, can the authors clarify if their TeNT expression system results in the whole tract being silenced? It appears from Fig. 6 that their approach leads to sparse expression of TeNT in individual neurons, which enables them to measure myelination parameters. Can the authors discuss how silencing a single axon can lead to a regional effect in oligodendrocyte number?

      Figure 6D depicts a representative axon selected from a dense population of GFP-positive axons in a 200-μm-thick slice after A3V-eTeNT infection to bilateral NM. As shown in Supplementary Video 1 and 2, densely labeled GFP-positive axons can be traced along the main trunk. To prevent any misinterpretation, we have revised the description of Figure 6 in the main text and Figure legend (page 31, line 9), and stated the A3V-eTeNT infection efficiency was 89.7 ± 1.6% in NM neurons, as mentioned above. Based on this efficiency, we interpreted that the global occlusion of vesicular release from most of the NM axons altered the pericellular microenvironment of the NL region, which led to the regional effect on the oligodendrocyte density.

      On the other hand, your question regarding whether sparse expression of eTeNT still has an effect is highly relevant. As we also discussed in our reply to comment 4 by Reviewer #1, the relationship between neuronal activity and oligodendrocytes is highly diverse. In some types of axons, vesicular release is essential for normal myelination, and this process was disrupted by TeNT (Koudelka et al., 2016), suggesting that direct interaction with oligodendrocytes via vesicle release may actively promote myelination in these types of axons.

      To clarify whether the phenotype observed in Figure 6 arises from changes in the pericellular microenvironment at the NL region or from the direct suppression of axon-oligodendrocyte interactions, we included a new Supplementary Figure (Figure 6—figure supplement 1). In this figure, we evaluated node formation on axons sparsely expressing eTeNT by electroporation into the unilateral NM. The results showed that sparse eTeNT expression did not increase the percentages of heminodes or unmyelinated segments. This finding supports our conclusion that the increase in unmyelinated segments caused by A3V-eTeNT resulted from impaired synaptic transmission at NM terminals and subsequent alterations of the pericellular microenvironment at the NL region.

      (3) The authors need to fully revise their statistical analyses throughout and supply additional information that is needed to assess if their analyses are adequate:

      Thank you for your valuable suggestions to improve the rigor of our statistical analyses. We have reanalyzed all statistical tests using R software. In the revised Methods section and Figure Legends, we have clarified the rationale for selecting each statistical test, specified which test was used for each figure, and explicitly defined both n and N. After reevaluation with the Shapiro-Wilk test, we adjusted some analyses to non-parametric tests where appropriate. However, these adjustments did not alter the statistical significance of our results compared to the original analyses.

      (3.1) the authors use a variety of statistical tests and it is not always obvious why they chose a particular test. For example, in Fig. 2G they chose a Kruskal-Wallis test instead of a two-way ANOVA or Mann-Whitney U test, which are much more common in the field. What is the rationale for the test choice?

      We have revised the explanation of our statistical test choices to provide greater clarity and precision. For example, in Figure 2G, we first assessed the normality of the data in each of the four groups using the Shapiro-Wilk test, which revealed that some datasets did not follow a normal distribution. Given this, we selected the Kruskal-Wallis test, a commonly used non-parametric test for comparisons across three or more groups. Since the Kruskal-Wallis test indicated a significant difference, we conducted a post hoc Steel-Dwass test to determine which specific group comparisons were statistically significant.
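
      The Kruskal-Wallis step described above can be sketched in a few lines. This is only an illustration in standard-library Python, not the authors' R analysis: the group values are hypothetical, the statistic omits the tie correction, and the Shapiro-Wilk and Steel-Dwass steps are not implemented here. The cutoff 7.815 is the 0.95 quantile of the chi-square distribution with 3 degrees of freedom, and the chi-square approximation is rough for groups this small.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical measurements for four groups (not data from the study).
groups = [
    [20.1, 22.4, 25.0],
    [31.2, 33.8, 36.5],
    [48.0, 51.3, 55.9],
    [60.2, 64.7, 70.1],
]
h_statistic = kruskal_wallis_h(groups)
CHI2_CRITICAL_DF3 = 7.815  # 0.95 quantile of chi-square with 3 degrees of freedom
print(h_statistic, h_statistic > CHI2_CRITICAL_DF3)
```

      A significant H only says that at least one group differs; a post hoc procedure is still needed to say which pairs differ.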

      (3.2) in some cases, the choice of test appears wholly inappropriate. For example, in Fig. 3H-K, an unpaired t-test is inappropriate if the two regions were analysed in the same samples. In Fig. 5, was a t-test used for comparisons between multiple groups in the same dataset? If so, an ANOVA may be more appropriate.

      In the case of Figures 3H-K, we compared oligodendrocyte morphology between regions. However, since the number of sparsely labeled oligodendrocytes differs both between regions and across individual samples, there is no strict correspondence between paired measurements. On the other hand, in Figures 5B, C, and E, we compared the density of labeled cells between regions within the same slice, establishing a direct correspondence between paired data points. For these comparisons, we appropriately used a paired t-test.
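
      The practical difference between the paired and unpaired comparisons discussed here can be illustrated with a small sketch. The densities below are hypothetical, and the unpaired statistic shown is Welch's version, which is an assumption for illustration rather than the test reported in the manuscript.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(a, b):
    """Unpaired (Welch) t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical cell densities measured in two regions of the same four slices.
tract = [10.0, 12.0, 14.0, 16.0]
nl = [12.0, 15.0, 16.0, 19.0]

print(paired_t(nl, tract))  # uses the slice-by-slice differences
print(welch_t(nl, tract))   # ignores which values came from the same slice
```

      With these numbers the paired statistic is much larger than the unpaired one, because pairing removes the slice-to-slice variability that both regions share.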

      (3.3) in some cases, the authors do not mention which test was used (Fig. 3: E-G, no test indicated despite asterisks; G/L/M: which regression test was used? What does r indicate?)

      We have specified the statistical tests used for each figure in the Methods section and Figure Legends for better clarity. Additionally, we have revised the descriptions for Figure 4G, L, and M and their corresponding Figure Legends to explicitly indicate that Spearman’s rank correlation coefficient (rₛ) was used for evaluation.
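
      For readers unfamiliar with the statistic, Spearman's rank correlation is the Pearson correlation computed on ranks. The sketch below is a minimal standard-library illustration; the variable names and values are hypothetical and are not taken from the manuscript.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1       # first 1-based position of v
        last = n - ordered[::-1].index(v)  # last 1-based position of v
        ranks.append((first + last) / 2)
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical paired measurements (for example, a diameter and a length in um).
diameter_um = [1.0, 1.2, 1.1, 1.5, 1.4]
internode_um = [30.0, 35.0, 40.0, 50.0, 45.0]
print(spearman_rs(diameter_um, internode_um))
```

      Because only ranks enter the calculation, the coefficient measures monotonic association and does not assume a linear relationship or normally distributed data.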

      (3.4) more concerningly, throughout the results, data may have been pseudo-replicated. t-tests and ANOVAs assume that each observation in a dataset is independent of the other observations. In figures 1-4 and 6 there is a very large "n" number, but the authors do not indicate what this corresponds to. This leaves it open to interpretation, and the large values suggest that the number of nodes, internodal segments, or cells may have been used. These are not independent experimental units, and should be averaged per independent biological replicate - i.e. per animal (N).
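
      The per-animal aggregation requested in this comment could be sketched as follows. The animal identifiers and values are hypothetical; the point is only that n = 6 measurements collapse to N = 3 independent units before any test is applied.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(measurements):
    """Collapse (animal_id, value) pairs to one mean value per animal."""
    by_animal = defaultdict(list)
    for animal_id, value in measurements:
        by_animal[animal_id].append(value)
    return {animal: mean(values) for animal, values in by_animal.items()}


# Hypothetical internodal lengths (um): n = 6 segments from N = 3 animals.
measurements = [
    ("chick1", 30.0), ("chick1", 34.0),
    ("chick2", 40.0), ("chick2", 44.0), ("chick2", 42.0),
    ("chick3", 50.0),
]
animal_means = per_animal_means(measurements)
print(animal_means)  # one value per biological replicate
```

      Averaging in this way discards within-animal variability, which is the trade-off at issue in this exchange; a mixed-effects model is a common alternative that keeps both levels.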

      We have now clarified what “n” represents in each figure, as well as the number of animals (N) used in each experiment, in the Figure Legends.

      In this study, developmental stages of chick embryos were defined by HH stage (Hamburger and Hamilton, 1951), minimizing individual variability. Additionally, since our study focuses on the distribution of morphological characteristics of individual cells, averaging measurements per animal would obscure important cellular-level variability and potentially mislead interpretation of data. Furthermore, we employed a strategy of sparse genetic labeling in many experiments, which naturally results in variability in the number of measurable cells per animal. Given the clear distinctions in our data distributions, we believe that averaging per biological replicate is not essential in this case.

      To further ensure the robustness of our statistical analysis, data presented as boxplots were preliminarily assessed using PlotsOfDifferences, a web-based application that calculates and visualizes effect sizes and 95% confidence intervals based on bootstrapping (https://huygens.science.uva.nl/PlotsOfDifferences/; https://doi.org/10.1101/578575). Effect sizes can serve as a valuable alternative to p-values (Ho, 2018; https://www.nature.com/articles/s41592-019-0470-3). The significant differences reported in our study are also supported by clear differences in effect sizes, ensuring that our conclusions remain robust regardless of the statistical approach used.
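
      The general idea of a bootstrapped effect size with a 95% confidence interval can be sketched as below. This is a generic percentile bootstrap written for illustration, assuming the effect size is a difference in means; it is not the PlotsOfDifferences implementation, and the data, resample count, and seed are hypothetical.

```python
import random
from statistics import mean


def bootstrap_mean_difference(a, b, n_resamples=5000, seed=0):
    """Return (observed difference in means, lower, upper) for b minus a.

    The interval is a percentile bootstrap: each group is resampled with
    replacement and the 2.5th and 97.5th percentiles of the resampled
    differences are taken as the 95% confidence limits.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(mean(resampled_b) - mean(resampled_a))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return mean(b) - mean(a), lower, upper


# Hypothetical measurements from two regions.
region_a = [30, 32, 35, 31, 33, 34]
region_b = [50, 55, 52, 53, 51, 54]
effect, ci_low, ci_high = bootstrap_mean_difference(region_a, region_b)
print(effect, ci_low, ci_high)
```

      An interval that excludes zero points to a difference between groups, but the resampling still treats every measurement as independent, so it does not by itself address the pseudo-replication concern.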

      If requested, we would be happy to provide PlotsOfDifferences outputs as supplementary source data files, similar to those used in eLife publications, for each figure.

      (3.5) related to the pseudo-replication issue, can the authors include individual datapoints in graphs for full transparency, per biological replicates, in addition or in alternative to bar-graphs (e.g. Fig. 5 and 6).

      We have now incorporated individual data points into the bar graphs in Figures 5 and 6.

      (4) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the bird auditory circuit?

      The morphological differences of oligodendrocytes between white and gray matter are well established (i.e. shorter myelin at gray matter), but their correspondence with the nodal spacing pattern along the long axonal projections of cortical neurons is not well understood. Future research may find similarities with our findings. Additionally, as mentioned in the final section of the Discussion, the mammalian brainstem auditory circuit is functionally analogous to the avian ITD circuit. Regional differences in nodal spacing along axons have also been observed in the mammalian system, raising the important question of whether these differences are supported by regional heterogeneity in oligodendrocytes. Investigating this possibility will facilitate our understanding of the underlying logic and mechanisms for determining node spacing patterns along axons, as well as provide valuable insights into evolutionary convergence in auditory processing mechanisms. We described these explicitly in text (page 11, line 34).

      (5) Provided the authors amend their statistical analyses, and assuming significant differences remain as shown, the study shows a correlation (but not causation) between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). Therefore, the authors should either include such experiments, or revise some of their phrasing to soften their claims and conclusions. For example, the word "determine" in the title could be replaced by "correlate with" for a more accurate representation of the work. Similar sentences throughout the main text should be amended.

      As you summarized in your comment, our results demonstrated that A3V-eTeNT suppressed oligodendrogenesis in the NL region, leading to a reduction in oligodendrocyte density (Figures 6L, M), which caused the emergence of unmyelinated segments. While this is an indirect manipulation of oligodendrocyte density, it nonetheless provides evidence supporting a causal relationship between oligodendrocyte density and nodal spacing.

      The emergence of unmyelinated segments at the NL region further suggests that the myelin extension capacity of oligodendrocytes differs between regions, highlighting regional differences in intrinsic properties of oligodendrocyte as the most prominent determinant of nodal spacing variation. However, as you correctly pointed out, our findings do not establish direct causation.

      In the future, developing methods to artificially manipulate myelin length could provide a more definitive demonstration of causality. Given these considerations, we have modified the title to replace "determine" with "underlie", ensuring that our conclusions are presented with appropriate nuance.

      (6) The authors fail to introduce, or discuss, very pertinent prior studies, in particular to contextualize their findings with:

      (6.1) known neuron-autonomous modes of node formation prior to myelination, e.g. Zonta et al (PMID 18573915); Vagionitis et al (PMID 35172135); Freeman et al (PMID 25561543)

      (6.2) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, e.g. Mensch et al (PMID 25849985)

      (6.3) known correlation of myelin length and thickness with axonal diameter, e.g. Murray & Blakemore (PMID 7012280); Ibrahim et al (PMID 8583214); Hildebrand et al (PMID 8441812).

      (6.4) regional heterogeneity in the oligodendrocyte transcriptome (page 9, studies summarized in PMID 36313617)

      Thank you for your insightful suggestions. We have incorporated the relevant references you provided and revised the manuscript accordingly to contextualize our findings within the existing literature.

      Minor comments:

      (7) Can the authors amend Fig. 1G with the correct units of measurement, not millimetres.

      Response: 

      Thank you for your suggestion. We have corrected the units in Figure 1G to µm.

      (8) The Olig2 staining in Fig 2C does not appear to be nuclear, as would be expected of a transcription factor and as is well established for Olig2, but rather appears to be excluded from the nucleus, as it is in a ring or donut shape. Can the authors comment on this?

      Oligodendrocytes and OPCs have small cell bodies, often comparable in size to their nuclei. The central void in the ring-like Olig2 staining pattern appears too small to represent the nucleus. Additionally, a similar ring-like appearance is observed in BrdU labeling (Figure 5G), suggesting that this staining pattern may reflect nuclear morphology or other structural features.

      Significance

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

      The main finding of our study is that the primary determinant of the biased nodal spacing pattern in the sound localization circuit is the regional heterogeneity in the morphology of oligodendrocytes due to their intrinsic properties (e.g., their ability to produce and extend myelin sheaths) rather than the density of the cells. This was based on our observations that a reduction of oligodendrocyte density by A3V-eTeNT expression caused unmyelinated segments but did not increase internodal length (Figure 6), further revealing the importance of oligodendrocyte density in ensuring full myelination for the axons with short internodes. Thus, we believe that our study, by experimentally manipulating oligodendrocyte density, points to the significance of oligodendrocyte heterogeneity for circuit function as well as for nodal spacing.

      Reviewer #3:

      Evidence, reproducibility and clarity

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing. I have some major concerns:

      (1) The authors neglect the possibility that nodal cluster may be formed prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal cluster plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      Thank you for your insightful comment regarding the potential role of pre-nodal clusters in determining internodal length. Indeed, studies in zebrafish have suggested that pre-nodal clustering of node components prior to myelination may prefigure internodal length (Vagionitis et al., 2022). We have incorporated a discussion on whether such pre-nodal clusters could contribute to regional differences in nodal spacing in our manuscript (page 9, line 35).

      Whether pre-nodal clusters are detectable before myelination appears to depend on neuronal subpopulation (Freeman et al., 2015). To investigate the presence of pre-nodal clusters along NM axons in the brainstem auditory circuit, we previously attempted to visualize AnkG signals at E13 and E14. However, we did not observe clear structures indicative of pre-nodal clusters; instead, we only detected sparse fibrous AnkG signals with weak Nav clustering at their ends, consistent with hemi-node features. This result does not exclude the possibility of pre-nodal clusters on NM axons, as the detection limit of immunostaining cannot be ruled out. In brainstem slices, where axons are densely packed, nodal molecules are expressed at low levels across a wide area, leading to a high background signal in immunostaining, which may mask weak pre-nodal cluster signals prior to myelination. Regarding the comment on Figure 1D, we assume you are referring to Figure 2D based on the context. The lack of clarity in the high-magnification images in Figure 2D results from both the high background signal and the limited penetration of the MAG antibody. Furthermore, we are unable to verify Neurofascin accumulation at pre-nodal clusters, as there is currently no commercially available antibody suitable for use in chickens, despite our over 20 years of efforts to identify one for AIS research. Therefore, current methodologies pose significant challenges in visualizing pre-nodal clusters in our model. Future advancements, such as exogenous expression of fluorescently tagged Neurofascin at appropriate densities or knock-in tagging of endogenous molecules, may help overcome these limitations.

      However, a key issue to be discussed in this study is not merely the presence or absence of pre-nodal clusters, but rather whether pre-nodal clusters, if present, would determine regional differences in internodal length. To address this possibility, we have added new data in Figure 6I, measuring the length of unmyelinated segments that emerged following A3V-eTeNT expression.

      If pre-nodal clusters were fixed before myelination and predetermined internodal length, then the length of unmyelinated segments should be equal to or a multiple of the typical internodal length. However, our data showed that unmyelinated segments in the NL region were less than half the length of the typical NL internodal length, contradicting the hypothesis that fixed pre-nodal clusters determine internodal length along NM axons in this region.

      (2) The claim that axonal diameter is constant along the axonal length needs to be demonstrated at the EM level. This would also allow to measure possible regional differences in the thickness of the myelin sheath and number of myelin wraps.

      As mentioned in our reply to comment 2 by Reviewer #1, the diameter of NM axons was already evaluated using electron microscopy (EM) in the pioneering study by Seidl et al. (2010). Additionally, EM-based analysis makes it difficult to clearly distinguish between the main trunk of NM axons and thin collateral branches at the NL region. Accordingly, we did not perform EM analysis in this revision.

      In Figure 4, we used palGFP, which is targeted to the cell membrane, allowing us to measure axon diameter by evaluating the distance between two membrane signal peaks. This approach minimizes the influence of the blurring of fluorescence signals on diameter measurements. Thus, we believe that our method is sufficient to evaluate the relative difference in axon diameters between regions and hence to show that axon diameter is not the primary determinant of the 3-fold difference in internodal length between regions. 
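
      The peak-to-peak measurement described above can be illustrated with a minimal sketch. The intensity profile, the pixel size, and the simple local-maximum rule are all assumptions made for illustration; the authors' actual image-analysis procedure is not specified at this level of detail.

```python
def peak_to_peak_diameter(profile, pixel_size_um):
    """Distance between the two brightest local maxima of an intensity profile.

    `profile` is a list of fluorescence intensities sampled along a line drawn
    across the axon; the two membrane signals appear as two peaks.
    """
    peaks = [
        i for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
    ]
    if len(peaks) < 2:
        raise ValueError("profile needs at least two local maxima")
    # Keep the two brightest peaks, then order them by position.
    brightest = sorted(peaks, key=lambda i: profile[i], reverse=True)[:2]
    first, second = sorted(brightest)
    return (second - first) * pixel_size_um


# Hypothetical line profile across a membrane-labeled axon, 0.1 um per pixel.
profile = [2, 3, 10, 40, 12, 6, 5, 7, 14, 38, 9, 3]
diameter_um = peak_to_peak_diameter(profile, pixel_size_um=0.1)
print(diameter_um)
```

      Measuring between peak positions rather than at the edges of the signal is what makes the estimate less sensitive to blurring, since blur widens each peak but shifts its center much less.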

      (3) The claim that the difference in internodal length is explained by heterogeneity in the sources of oligodendrocytes is not convincing. Oligodendrocytes that are a priori from the same origin produce shorter internodes when they remyelinate after a demyelination event.

      The heterogeneity in oligodendrocyte morphology would reflect differences in gene profiles, which, in turn, may arise from differences in their developmental origin and/or pericellular microenvironment of OPCs. We made this point as clear as possible in Discussion (page 9, line 21).

      Significance

      The authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

    1. eLife Assessment

      This important study combines electrocardiographic (ECG) and heart/torso anatomy data from subjects included in the UK Biobank to analyze sex-specific differences in relationships between those two characteristics. The study has several compelling strengths, including the development of an open-source pipeline for reconstruction and analysis of heart/torso geometry from a large cohort. Nevertheless, technical analysis of the data as presented is incomplete, specifically as it pertains to assessment of co-linearity between regressed parameters, interpretation of regression coefficients for sex and/or presence of myocardial infarction, and discussion of potential roles played by underlying electrophysiological derangements. With improvements to these aspects of the analysis, the paper would be of interest to the cardiovascular research community, especially those studying highly relevant health and treatment disparities arising from sex differences.

    2. Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      (1) The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      (2) The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      (3) Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      (4) Open-source code for the pipeline and analyses enhances reproducibility.
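
      For reference, the Dice coefficient mentioned in strength (2) measures the overlap between a reconstructed and a reference segmentation. The sketch below represents each segmentation as a set of voxel coordinates, which is an assumption chosen for brevity; the example coordinates are hypothetical.

```python
def dice_coefficient(predicted, reference):
    """Dice overlap of two segmentations given as sets of voxel coordinates."""
    if not predicted and not reference:
        return 1.0  # two empty segmentations agree perfectly by convention
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))


# Hypothetical 2D example: four predicted pixels, three reference pixels.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference = {(0, 1), (1, 1), (2, 1)}
print(dice_coefficient(predicted, reference))  # 2 * 2 / (4 + 3)
```

      A value of 1 means identical segmentations and 0 means no overlap, so the metric summarizes volumetric agreement but says little about where along the surface the errors lie, which is why it is paired with error distances.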

      Weaknesses:

      (1) The linear regression approach, while useful, may not fully address collinearity among parameters (e.g., cardiac size, torso volume, heart position). Although left ventricular mass or cavity volume was selected to mitigate collinearity, other parameters (e.g., heart center coordinates) could still introduce bias.

      (2) The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

      (3) The manuscript's highly quantitative presentation may hinder readability. Simplifying technical descriptions and improving figure clarity (e.g., separating superimposed bar plots in Figures 2-4) would aid comprehension.

      (4) Given established sex differences in QTc intervals, applying the same analytical framework to explore QTc's dependence on sex and anatomy could have provided additional clinically relevant insights.
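
      The variance inflation factor suggested in weakness (2) can be illustrated for the simplest case of a model with exactly two predictors, where it reduces to 1 / (1 - r^2) with r the correlation between them. The predictor names and values below are hypothetical, and a full analysis with more than two predictors would regress each predictor on all the others instead.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two predictors.

    Perfectly collinear inputs (|r| = 1) make the denominator zero.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical anatomical predictors for five subjects.
lv_mass = [100, 120, 140, 160, 180]
torso_volume = [30, 38, 37, 46, 49]
print(vif_two_predictors(lv_mass, torso_volume))
```

      A VIF near 1 indicates little shared variance between predictors; values above roughly 5 to 10 are commonly read as a sign that coefficient estimates are unstable.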

    3. Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial infarction (MI) is more common in women, and treatment is typically less aggressive. This disparity stems from the fact that women's ECGs commonly exhibit 12-lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      Strengths:

      This study has a strength in that it utilizes a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. This allows computation of various anatomical factors (torso volume, cavity volume, etc), and subsequent regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      Weaknesses:

      Major weaknesses stem from the fact that, while electrophysiological factors appear to play a role across many leads, both post-MI and healthy, the electrophysiological factors are not stated or discussed. The computational modeling pipeline is validated for reconstructing torso contours; however, potential registration errors stemming from ventricular-torso construction are not addressed within the context of anatomical factors, such as the tilt and rotation of the heart. This should be discussed as the paper's claims are based on these results. Further analysis and explanation are needed to understand how these sex-specific results impact the ECG-based diagnosis of MI in men and women, as stated as the primary reason for the study at the beginning of the paper. This would provide a broader impact within the clinical community. Claims about demographics do not appear to be supported within the main manuscript but are provided in the supplements. Reformatting the paper's structure is required to efficiently and effectively present and support the findings and outcomes of this work.

    1. eLife Assessment

      This valuable study investigates the self-assembly activity of death-fold domains. The data collected using advanced microscopy and distributed amphifluoric FRET-based flow cytometry methods provide solid evidence for the conclusions, although the interpretations based on these conclusions remain speculative in some cases. This paper is of broad interest to those studying a variety of biological pathways involved in inflammatory responses and various forms of cell death.

    2. Reviewer #1 (Public review):

      Summary:

      This is a high-quality and extensive study that reveals differences in the self-assembly properties of the full set of 109 human death fold domains (DFDs). Distributed amphifluoric FRET (DAmFRET) is a powerful tool that reveals the self-assembly behaviour of the DFDs, in non-seeded and seeded contexts, and allows comparison of the nature and extent of self-assembly. The nature of the barriers to nucleation is revealed in the transition from low to high AmFRET. Alongside analysis of the saturation concentration and protein concentration in the absence of seed, the subset of proteins that exhibited discontinuous transitions to higher-order assemblies was observed to have higher concentrations than DFDs that exhibited continuous transitions. The experiments probing the ~20% of DFDs that exhibit discontinuous transition to polymeric form suggest that they populate a metastable, supersaturated form in the absence of cognate signal. This is suggestive of a high intrinsic barrier to nucleation.

      Strengths:

      The differences in self-assembly behaviour are significant and likely identify mechanistic differences across this large family of signalling adapter domains. The work is of high quality, and the evidence for a range of behaviours is strong. This is an important and useful starting point since the different assembly mechanisms point towards specific cellular roles. However, understanding the molecular basis for these differences will require further analysis.

      An impressive optogenetic approach was engineered and applied to initiate self-assembly of CASP1 and CASP9 DFDs, as a model for apoptosome initiation in these two DFDs with differing continuous or discontinuous assembly properties. This comparison revealed clear differences in the stability and reversibility of the assemblies, supporting the hypothesis that supersaturation-mediated DFD assembly underlies signal amplification in at least some of the DFDs.

      The study reveals interesting correlations between supersaturation of DFD adapters in short- and long-lived cells, suggestive of a relationship between the mechanism of assembly and cellular context. Additionally, the comprehensive nature of the study provides strong evidence that the interactions are almost all homomeric or limited to members of the same DFD subfamily or interaction network. Similar approaches with bacterial proteins from innate immunity operons suggest that their polymerisation may be driven by similar mechanisms.

      Weaknesses:

      Only a limited investigation of assembly morphology was conducted by microscopy. There was a tendency for discontinuous structures to form fibrillar structures and continuous to populate diffuse or punctate structures, but there was overlap across all categories, which is not fully explored. The methodology used to probe oligomeric assembly and stability (SDD-AGE) does not justify the conclusions drawn regarding stability and native structure within the assemblies.

      The work identifies important differences between DFDs and clearly different patterns of association. However, most of the detailed analysis is of the DFDs that exhibit a discontinuous transition, and important questions remain about the majority of other DFDs and why some assemblies should be reversible and others not, and about the nature of signalling arising from a continuous transition to polymeric form.

      Some key examples of well-studied DFDs, such as MyD88 and RIPK1, deserve more discussion, since they display somewhat surprising results. More detailed exploration of these candidates, where much is known about their structures and the nature of the assemblies from other work, could substantiate the conclusions here and transform some of the conclusions from speculative to convincing.

      The study concludes with general statements about the relationship between stochastic nucleation and mortality, which provide food for thought and discussion but which, as they concede, are highly speculative. The analogies that are drawn with batteries and privatisation will likely not be clearly understood by all readers. The authors do not discuss limitations of the study or elaborate on further experiments that could interrogate the model.