  1. Mar 2026
    1. Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using data from 19,008 genome-wide SNPs in 359 individuals from India. They found clear population structure that did not show a geographical pattern; instead, flower color was associated with population structure. An excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study. The coalescent-based analysis nicely supports the authors' claim that multiple introductions may have contributed to the population structure of this species.

      Weaknesses:

      The main findings of the SLiM-based simulation were that inbreeding promotes fixation of alleles and reduction in genetic diversity. These results are theoretically well known and are not novel in themselves, although they might have become interesting had they been quantitatively integrated with the empirical findings in the studied species.

    2. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the editor and both reviewers for their time and thoughtful feedback on our manuscript. We have carefully addressed all the concerns raised in the responses below and incorporated the suggested revisions into the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      The reviewer raises the possibility that the observed genetic patterns may have originated through the selection of different varieties by the horticultural industry. While it is plausible that artificial selection can lead to the formation of distinct morphs, the presence of a strong structure between them in the wild populations cannot be explained just based on selection. The observed patterns in the inbreeding coefficient and heterozygosity can indeed arise from multiple factors, including past bottlenecks, selection, inbreeding, and selfing. In the wild, different flower colour variants frequently occur in close physical proximity and should, in principle, allow for cross-fertilization. Over time, this gene flow would be expected to erode any genetic structure shaped solely by past selection. However, our results show no evidence of such a breakdown in structure. Despite co-occurring in immediate proximity, the flower colour variants maintain distinct genetic identities. This suggests the presence of a barrier to gene flow, likely maintained by the species' mating system. Moreover, the presence of many of these flower colour morphs in the native range—as documented through observations on platforms like iNaturalist—suggests that these variants may have a natural origin rather than being solely products of horticultural selection.

      While it is plausible that horticultural breeding involved efforts to generate new varieties through crossing—resulting in the emergence of some of the observed morphs—even if this were the case, the dynamics of a self-fertilizing species would still lead to rapid genetic structuring. Following hybridization, just a few generations of selfing are sufficient to produce inbred lines, which can then maintain distinct genetic identities. As discussed in our manuscript, such inbred lines could be associated with specific flower colour morphs and persist through predominant self-fertilization. This mechanism provides a compelling explanation for the strong genetic structure observed among co-occurring flower colour variants in the wild.
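
      The claim that a few generations of selfing suffice to produce inbred lines can be illustrated with the textbook recursion for the inbreeding coefficient, F(t+1) = (s/2)(1 + F(t)), where s is the selfing rate. The sketch below is not taken from the manuscript; the function names and the starting value F(0) = 0 (a fully outbred hybrid) are assumptions made for illustration only.

```python
def inbreeding_trajectory(selfing_rate, generations, f0=0.0):
    """Return [F_0, ..., F_t] under the recursion F_{t+1} = (s / 2) * (1 + F_t).

    selfing_rate (s) is the fraction of offspring produced by selfing;
    f0 is the starting inbreeding coefficient (assumed 0 for a fresh hybrid).
    """
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie in [0, 1]")
    trajectory = [f0]
    for _ in range(generations):
        trajectory.append(0.5 * selfing_rate * (1.0 + trajectory[-1]))
    return trajectory


def retained_heterozygosity(selfing_rate, generations, f0=0.0):
    """Fraction of the initial heterozygosity still present, i.e. 1 - F_t."""
    return 1.0 - inbreeding_trajectory(selfing_rate, generations, f0)[-1]


if __name__ == "__main__":
    # Under complete selfing, heterozygosity halves every generation.
    for t in range(7):
        print(t, retained_heterozygosity(1.0, t))
```

      Under complete selfing the retained heterozygosity is (1/2)^t, so about 3% remains after five generations; with partial selfing the recursion instead converges to F = s/(2 - s).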

      To further validate this, we conducted a bagging experiment on Lantana camara inflorescences to exclude insect-mediated cross-pollination. The results showed no significant difference in seed set between bagged and open-pollinated flowers, supporting the conclusion that L. camara is primarily self-fertilizing in India. These results are included in the revised manuscript.

      As the reviewer rightly points out, the mating system of a species plays a crucial role in shaping patterns of genetic structure. However, in many natural populations, structuring patterns are often influenced by a combination of factors such as selection, barriers to gene flow, and genetic drift. In some cases, the mating system exerts a more prominent influence at the microgeographic level, while in others, it can shape genetic structure at broader spatial scales. What is particularly interesting in our study is that the mating system appears to shape genetic structure at a subcontinental scale. Although the species has been subject to other evolutionary forces, such as a genetic bottleneck and expansion due to its invasive nature, the mating system exerts a more pronounced effect on the observed genetic patterns, resulting in a clear and consistent genetic structure across populations.

      Reviewer #1 (Recommendations for the authors):

      Lantana camara is a globally invasive plant as the authors mention in their manuscript, but this study only focuses on India. This should be reflected in the title.

      The reviewer has suggested that the title should reflect the study area. Since our sampling covers nearly all regions in India, we believe the patterns observed here are likely representative of those in other parts of the invaded range. For this reason, we would prefer to retain the current title.

      It would be helpful if the pictures of the flowers in Figure 3 were larger to more clearly see the different colors.

      As per the reviewer's suggestion, we have increased the size of the images to improve clarity.

      Figure 4 could probably be moved to supplemental material, it does not add much to the results.

      We feel it is important to reiterate that the patterns we observe in Lantana are consistent with what one would expect in any predominantly self-fertilizing species. Figure 4 acts as an additional line of evidence for this link, and we therefore believe it is important to retain it in the main text.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using 19,008 genome-wide SNPs data from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study is that the output of the modelling analysis is largely qualitative and cannot be directly compared with the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These results are theoretically well established and are not novel in themselves, although they might have become interesting had they been quantitatively integrated with the empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNP data would be more interesting. For example, using ABC-based methods, the authors could infer the split time between the subpopulations identified in this study. If such a split time is older than the recorded invasion date, the result would support the scenario that multiple introductions may have contributed to the population structure of this species. In the current form of the manuscript, multiple introductions were implicated but not formally tested.

      Through our SLiM simulations, we aimed to demonstrate that a pattern of strong genetic structure within a location (similar to what we observed in Lantana camara) can arise under a predominantly self-fertilizing mating system. These simulations were not parameterized using species-specific data from Lantana but were intended as a conceptual demonstration of the plausibility of such patterns under selfing using SNP data. While the theoretical consequences of self-fertilisation have been widely discussed, relatively few studies have directly modelled these patterns using SNP data. Our SLiM simulations help to fill this gap and support the notion that the observed genetic structuring in Lantana may indeed result from predominant self-fertilisation. We therefore ran these simulations for a generic invasive plant to test whether the patterns we observed are consistent with expectations for a predominantly self-fertilising species.

      Additionally, as suggested by the reviewer, we have performed demographic history simulations using fastsimcoal2 to investigate the divergence among different flower colour morphs. The results have been incorporated into the revised manuscript.

      First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses.

      Applying a HWE filter is a common practice in genomic data analysis because it helps remove potential sequencing or genotyping artefacts, which can otherwise bias downstream analyses. However, we understand that HWE filtering can also remove biologically informative loci and potentially bias the analysis, especially when a stringent cutoff is used. A strict filter might retain only loci that perfectly fit Hardy–Weinberg expectations and exclude sites influenced by real evolutionary processes like selection and/or inbreeding.

      To balance this, we used a mild HWE filter, aiming to remove clear artefacts while retaining loci that may reflect genuine biological signals. Another reason for applying it is that many downstream tools, for example ADMIXTURE, assume that the markers are neutral and do not deviate strongly from HWE (although this assumption may not always hold). Filtering in this way avoids the need for a more complex model.
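
      The tension discussed above, that inbreeding itself pushes loci out of HWE, can be made concrete with a small sketch. It computes a one-degree-of-freedom chi-square HWE statistic and the inbreeding coefficient F_IS = 1 - Ho/He for a single biallelic SNP. The genotype counts and the 3.841 cutoff (the 5% critical value for 1 d.f.) are illustrative assumptions; the authors' actual filtering tool and threshold are not stated in this response.

```python
def hwe_summary(n_aa, n_ab, n_bb):
    """Chi-square HWE statistic and F_IS for one biallelic SNP.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    h_obs = n_ab / n
    h_exp = 2 * p * q
    f_is = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    return {"p": p, "chi2": chi2, "h_obs": h_obs, "h_exp": h_exp, "f_is": f_is}


def passes_hwe_filter(n_aa, n_ab, n_bb, chi2_cutoff=3.841):
    """True if the locus is retained; the cutoff is an illustrative default."""
    return hwe_summary(n_aa, n_ab, n_bb)["chi2"] <= chi2_cutoff


if __name__ == "__main__":
    print(hwe_summary(25, 50, 25))   # genotype counts at HWE proportions
    print(hwe_summary(40, 20, 40))   # heterozygote deficit, as under selfing
```

      For a biallelic locus the statistic equals n times F_IS squared, so in a strongly selfing sample a strict cutoff would discard most polymorphic loci, which is why the stringency of the filter matters.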

      Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate.

      We have cited the references for these values in the manuscript. However, for Lantana, many such baseline data are not available, so we used general values reported for plants, which is an accepted approach when working with understudied species. Moreover, the aim of these simulations was to develop a general understanding of how mating systems influence genetic diversity in invasive plants, rather than to parameterize the simulations specifically for Lantana.

      While we acknowledge that this simulation does not provide an exact representation of the species' evolutionary history, the goal of the simulation was not to produce precise estimates but rather to illustrate the feasibility of such strong genetic structuring resulting from self-fertilisation alone.

      Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation.

      In genetic simulations, it is often best to begin with simpler scenarios involving fewer parameters, and we followed this approach. As the reviewer rightly pointed out, selfing can influence multiple factors such as mutation and recombination rates. However, to first understand the broad effects, we chose to work with simpler scenarios where both mutation and recombination rates were kept constant.
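
      For context on the reviewer's point, the standard approximations from the selfing-coalescent literature cited above (Nordborg & Donnelly 1997; Nordborg 2000) can be written down in a few lines: the equilibrium inbreeding coefficient is F = s/(2 - s), the effective population size is rescaled to N/(1 + F), and the effective recombination rate to r(1 - F). The sketch below only evaluates these formulas; the population size, recombination rate and mutation rate are hypothetical placeholders, not the values used in the authors' SLiM runs.

```python
def equilibrium_inbreeding(selfing_rate):
    """Equilibrium inbreeding coefficient F = s / (2 - s)."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def effective_parameters(pop_size, recomb_rate, mutation_rate, selfing_rate):
    """Rescale N and r for partial selfing; the per-generation mutation rate
    is unchanged, but the scaled rate theta = 4 * Ne * mu shrinks with Ne."""
    f = equilibrium_inbreeding(selfing_rate)
    ne = pop_size / (1.0 + f)
    return {
        "F": f,
        "Ne": ne,
        "r_eff": recomb_rate * (1.0 - f),
        "theta": 4.0 * ne * mutation_rate,
    }


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the scaling.
    for s in (0.0, 0.5, 0.9, 1.0):
        print(s, effective_parameters(1000, 1e-8, 1e-8, s))
```

      In a forward simulation that models selfing explicitly, these reductions emerge from the simulated genealogy even when the input mutation and recombination rates are held constant, so the formulas are mainly useful for comparing simulated output with theoretical expectations.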

      Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested.

      We thank the reviewer for this valuable suggestion. We have performed a MANOVA to test the association between flower colour and genetic structure. These results are incorporated in the revised manuscript.

      Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively?

      We carefully considered this and defined our criteria based on flower colour. Specifically, we named morphs according to the colour of both young and old flowers. If both stages shared the same colour, we used that colour as the name. As shown in Figure 1b, it is possible to reliably distinguish between the different flower colour morphs. While one could also measure flower colour using a photometer, we believe both approaches yield similar results.

      I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

      The flower colour changes within an inflorescence, with young flowers shifting colour after pollination. However, this trend is consistent within a plant; for example, the yellow–pink morph always changes from yellow to pink. Based on this consistency, we incorporated a naming system that considers both the colour of younger and older flowers.

      Reviewer #2 (Recommendations for the authors):

      Figure 4: Figures a and b are not the "signatures of high inbreeding", because such patterns could also simply happen due to geographical isolation. The title of the figure could be changed. Figure 4c should be presented as a histogram.

      We have incorporated this suggestion into the manuscript and revised the figure title accordingly. However, we believe that presenting Figure 4c in its current form is more informative.

      L459 "in the introduced range, Lantana is self-compatible": is it self-incompatible in the native range? If it is known, it could be mentioned in the manuscript.

      A previous study from India demonstrated that self-fertilisation is possible in Lantana, providing an additional line of evidence for our findings. However, Lantana remains poorly studied in its native range, and to the best of our knowledge, only a single study has examined its pollination biology there, which we have cited in this paper.

    1. eLife Assessment

      Winter months with short days are commonly associated with seasonal depression and hypersomnolence; the mechanisms behind this hypersomnolence, however, remain unclear. Chen and colleagues identify a genetic basis for this phenomenon in the fly Drosophila: mutations in the circadian photoreceptor cryptochrome resulted in increased sleep under short photoperiods. These findings are valuable insights into the genetic mechanisms regulating sleep under short days. The data supporting the precise neurobiological basis of these effects, however, remain incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Chen et al. identified a role for the circadian photoreceptor CRYPTOCHROME (CRY) in promoting wakefulness under short photoperiods. This research is potentially important as hypersomnolence is often seen in patients suffering from SAD during winter times. The mechanisms underlying these sleep effects are poorly known.

      Strengths:

      The authors clearly demonstrated that mutations in cry lead to elevated sleep under 4:20 Light-Dark (LD) cycles. Furthermore, using RNAi, they identified GABAergic neurons as a primary site of CRY action to promote wakefulness under short photoperiods. They then provide genetic and pharmacological evidence demonstrating that CRY acts on GABAergic transmission to modulate sleep under such conditions.

      Weaknesses:

      The authors then went on to identify the neuronal location of this CRY action on sleep. This is where this reviewer is much more circumspect about the data provided. The authors hypothesize that the l-LNvs which are known to be arousal promoting may be involved in the phenotypes they are observing. To investigate this, they undertook several imaging and genetic experiments.

      While the authors have made improvements in this resubmitted manuscript, there are still multiple concerns about the paper. I think the authors provide enough evidence suggesting that CRY plays a role in sleep under short photoperiod. The data also support that CRY acts in GABAergic neurons. However, there are still major issues with the quality of the confocal images presented throughout the paper. In many cases the images appear oversaturated and poorly resolved, making it hard to understand what is going on. In addition, none of the drivers used in this study are specific to the neurons the authors aim to manipulate. Therefore, the identity of the GABAergic neurons involved in this CRY-dependent sleep mechanism remains unclear. Similarly, whether l-LNvs are the target of this GABA-mediated sleep regulation under short photoperiod is not fully demonstrated. The data presented suggest this but do not prove it.

      Major concerns:

      (1) While the authors provided sleep parameters like consolidation or waking activity for some experiments, these measurements are still not shown for several others (for example Figures 2E, 3, 4, 5, and 6). These data are essential; these metrics must be reported for all sleep experiments.

      (2) Line 144 "We fed flies with agonists of GABA-A (THIP) and GABA-B receptor (SKF-97541) (Ki and Lim, 2019; Matsuda et al., 1996; Mezler et al., 2001). Both drugs enhance sleep in WT," The proper citation is needed here, Dissel et al., 2015 PMID:25913403. Both THIP and SKF-97541 were used in that paper.

      (3) Figure 2C and 2F: it appears that the control data is the same in both panels. That is not acceptable.

      (4) Figure 4A: With the quality of the images, it is impossible to assess whether GABA levels are increased at the l-LNvs soma.

      (5) Fig 4 S1A shows colabeling of l-LNvs and Gad1-Gal4 expressing neurons. They are almost 100% overlapping signals. This would indicate that the l-LNvs are GABAergic themselves, or that there is a problem with this experiment.

      (6) Fig 4 S1B: Again, I can see colabelling of the GFP and PDF staining, suggesting that Gad1-Gal4 expresses in l-LNvs.

      (7) Line 184: "Consistently, knocking down Rdl in the l-LNvs rescues the long sleep phenotype of cry mutants (Figure 4-figure supplement 1D)." This statement is incorrect as the driver used for this experiment, 78G01-GAL4 is not specific to the l-LNvs, so it is possible that the phenotypes observed are not coming from these neurons.

      (8) Figure 4G-K: None of these manipulations are specific to the l-LNvs. The authors describe 10H10-GAL4 and 78G01-GAL4 as l-LNvs specific tools, but this is not the case. Why not use the SS00681 Split-GAL4 line described in Liang et al., 2017 PMID: 28552314? It is possible that some of the effects reported in this manuscript are not caused by manipulating the l-LNvs.

      (9) Similarly, for the manipulation of s-LNvs, the authors cannot rule out effects that are coming from other cells, as R6-GAL4 is not specific to s-LNvs.

      (10) The staining presented in Fig 5 S1 is not very convincing. Difficult to see whether Gad1-GAL4 only expresses in the s-LNvs.

    3. Reviewer #3 (Public review):

      Summary:

      In humans, short photoperiods are associated with hypersomnolence. The mechanisms underlying these effects are, however, unknown. Chen et al. use the fly Drosophila to determine the mechanisms regulating sleep under short photoperiods. They find that mutations in the circadian photoreceptor cryptochrome (cry) increase sleep specifically under short photoperiods (e.g. 4 h light : 20 h dark). They go on to show that cry is required in GABAergic neurons and that the effects of the cry mutation on sleep are mediated by alterations in GABA signalling. Further, they suggest that the relevant subset of GABAergic neurons is the well-studied small ventral lateral neurons, which they propose inhibit the arousal-promoting large ventral lateral neurons via GABA signalling.

      Strengths:

      Genetic analysis to show that cryptochrome (but not other core clock genes) mediates the increase in sleep in short photoperiods, and circuit analysis to localise cry function to GABAergic neurons.

      Weaknesses:

      The authors have substantially revised their manuscript, and the manuscript is better for the revisions. However, the conclusion that the sLNvs are GABAergic is unfortunately still not well supported by the data. Key sticking points remain the anti-GABA immunostaining and the lack of specific driver lines for sLNvs and lLNvs.

      The authors should tone down their conclusions to reflect the fact that their data, as presented, do not support the model that cry acts in sLNvs to modulate GABA signalling onto lLNvs and thus modulate sleep.

    4. Reviewer #4 (Public review):

      Summary:

      Short photoperiod is an important experimental manipulation in neurobiology, endocrinology, and metabolism studies. However, the molecular mechanisms by which short photoperiod gives rise to behavioral phenotypes that are seen in seasonal affective disorders remain unknown. Using the classic circadian model organism Drosophila, this study examines short photoperiod-induced hypersomnolence and identifies the circadian photoreceptor cryptochrome as a regulator of GABAergic tone within the clock neural circuit to promote wakefulness under short photoperiod conditions. The discovery has broad implications for understanding how short photoperiod modulates neural inhibition in circadian circuits in regulating sleep.

      Strengths:

      The Drosophila model provided a powerful platform to dissect the molecular mechanisms underlying short photoperiod-induced hypersomnolence. A battery of behavioral, imaging, circuit-manipulation approaches was employed to test the novel hypothesis that the circadian photoreceptor cryptochrome modulates GABAergic tone within the clock neural circuit to promote wakefulness under short photoperiod conditions.

      Weaknesses:

      The current model proposed by the authors suggests that the small ventral lateral neurons of the Drosophila clock circuit are GABAergic; however, this remains unclear. At present, the field lacks sufficient data and validated reagents to definitively establish the GABAergic identity of these neuropeptidergic neurons.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Chen et al. identified a role for the circadian photoreceptor CRYPTOCHROME (cry) in promoting wakefulness under short photoperiods. This research is potentially important as hypersomnolence is often seen in patients suffering from SAD during winter times. The mechanisms underlying these sleep effects are poorly known.

      Strengths:

      The authors clearly demonstrated that mutations in cry lead to elevated sleep under 4:20 Light-Dark (LD) cycles. Furthermore, using RNAi, they identified GABAergic neurons as a primary site of cry action to promote wakefulness under short photoperiods. They then provide genetic and pharmacological evidence demonstrating that cry acts on GABAergic transmission to modulate sleep under such conditions.

      Weaknesses:

      The authors then went on to identify the neuronal location of this cry action on sleep. This is where this reviewer is much more circumspect about the data provided. The authors hypothesize that the l-LNvs which are known to be arousal-promoting may be involved in the phenotypes they are observing. To investigate this, they undertook several imaging and genetic experiments.

      Major concerns:

      (1) Figure 2 A-B: The authors show that knocking down cry expression in GABAergic neurons mimics the sleep increase seen in cryb mutants under short photoperiod. However, they do not provide any other sleep parameters such as sleep bout numbers, sleep bout duration, and more importantly waking activity measurements. This is an essential parameter that is needed to rule out paralysis and/or motor defects as the cause of increased "sleep". Any experiments looking at sleep need to include these parameters.

      Thank you for bringing up these points. We have now included these sleep parameters in Figure 2—figure supplement 3.

      (2) For all Figures displaying immunostaining and imaging data the resolution of the images is quite poor. This makes it difficult to assess whether the authors' conclusions are supported by the data or not.

      We apologize for the poor resolution. This is probably due to the compression of the figures in the merged PDF file. We are now uploading the figures individually and hopefully this can resolve the resolution issue.

      (3) In Figure 4-S1A it appears that the syt-GFP signal driven by Gad1-GAL4 is colabeling the l-LNvs. This would imply that the l-LNvs are GABAergic. The authors suggest that this experiment suggests that l-LNvs receive input from GABAergic neurons. I am not sure the data presented support this.

      We agree that this piece of data alone is not sufficient to demonstrate that the l-LNvs receive GABAergic inputs rather than being GABAergic themselves. However, when nlsGFP signal is driven by two independent Gad1-GAL4 lines (one generated by P element insertion and the other by GAL4 inserted into the Gad1 locus), we do not observe any prominent signal in the l-LNvs (Figure 5A and B; Figure 5-figure supplement 1A). We have also co-labeled using Gad1GAL4 and PdfLexA (Figure 5-figure supplement 1B). As can be seen, Gad1GAL4-driven GFP signal is present only in the s-LNvs but not the l-LNvs. This further supports the idea that the l-LNvs are not GABAergic, and that the syt-GFP signal likely arises from GABAergic neurons projecting to the l-LNvs.

      (4) In Figure 4-S1B. The GRASP experiment is not very convincing. The resolution of the image is quite poor. In addition, the authors used Pdf-LexA to express the post t-GRASP construct in l-LNvs, but Pdf-LexA also labels the s-LNvs, so it is possible that the GRASP signal the authors observe is coming from the s-LNvs and not the l-LNvs. The authors could use a l-LNvs specific tool to do this experiment and remove any doubts. Altogether this reviewer is not convinced that the data presented supports the conclusion "All in all, these results demonstrate that GABAergic neurons project to the l-LNvs and form synaptic connections." (Line 176). In addition, the authors could have downregulated the expression of Rdl specifically in l-LNvs to support their conclusions. The data they are providing supports a role for RDL but does not prove that RDL is involved in l-LNvs.

      Thank you for these wonderful suggestions. Again we apologize for the poor resolution and hopefully by uploading the images separately we can resolve this issue. We agree that the GRASP signal could be coming from the s-LNvs and not the l-LNvs but unfortunately we are not able to find a LexA that is specifically expressed in the l-LNvs. We believe the trans-Tango data further support the idea that GABAergic neurons project to and form synaptic connections with the l-LNvs. Nonetheless, we have changed our conclusion to “All in all, these results strongly suggest that GABAergic neurons project to the l-LNvs and form synaptic connections” to be more rigorous. In addition, we have obtained R78G01GAL4 which is specifically expressed in the l-LNvs, and using this GAL4 to knock down Rdl rescues the long-sleep phenotype of cry mutants (Figure 4—figure supplement 1D).

      (5) In Figures 4 A and C: it appears that GABA is expressed in the l-LNvs. Is this correct? Can the authors clarify this? Maybe the authors could do an experiment where they co-label using Gad1-GAL4 and Pdf-LexA to clearly demonstrate that l-LNvs are not GABAergic. Also, the choice of colors could be better. It is very difficult to see what GABA is and what is PDF.

      Thank you for this wonderful suggestion. We have now co-labeled using Gad1GAL4 and PdfLexA (Figure 5-figure supplement 1B). As can be seen, Gad1GAL4-driven GFP signal is present only in the s-LNvs but not the l-LNvs. We suspect the GABA signal at the l-LNvs may arise from the GABAergic projections received by these cells. We have now changed the color of the GABA/PDF signals in these images and have reduced the intensity of the PDF signal. Hopefully, the signals will be easier to distinguish in this revised version.

      (6) Figure 4G: Pdf-GAL4 expresses in both s-LNvs and l-LNvs. So, in this experiment, the authors are silencing both groups, not only the l-LNvs. Why not use a l-LNvs specific tool?

      Thank you for bringing up this important point. We have previously used c929GAL4 to express Kir2.1 and this led to lethality. We have now used two l-LNv-specific GAL4 drivers (R78G01GAL4 and R10H10GAL4) that we newly obtained to express Kir2.1 but did not observe a significant effect on sleep. Please see Author response image 1 for the results.

      Author response image 1.

      Daily sleep duration of male flies expressing Kir2.1 in l-LNvs using R78G01GAL4 (A)(n = 40, 41, 30 flies) and R10H10GAL4 (B) (n = 40, 41, 32 flies) and controls, monitored under 4L20D. One-way ANOVA with Bonferroni multiple comparison test was used to calculate the difference between experimental group and control group.

      (7) Figure 4H-I: The C929-GAL4 driver expresses in many peptidergic neurons. This makes the interpretation of these data difficult. The effects could be due to peptidergic cells other than the l-LNvs. Why not use a more specific l-LNv tool? I am also confused as to why some experiments used Pdf-GAL4 and others used C929-GAL4 with a view to specifically manipulating the l-LNvs. This is confusing since neither driver is specific to the l-LNvs.

      Thank you for bringing up these important points. We have now used the l-LNv-specific R10H10GAL4, and the results are broadly comparable with those of c929GAL4 (Figure 4I and K), i.e. activating the l-LNvs blocks the long-sleep phenotype of cry mutants. PdfGAL4 is used in 4G because c929GAL4 leads to lethality while the l-LNv-specific GAL4 lines do not alter sleep.

      (8) Figure 5-S1B: Why does the pdf-GAL80 construct not block the sleep increase seen when reducing expression of cry in Gad1-GAL4 neurons? This suggests that there are GABAergic neurons that are not PDF expressing involved in the cry-mediated effect on sleep under short photoperiods.

      Yes, this is indeed the conclusion we draw from this result, and we commented on this in the Discussion: “Moreover, inhibiting cry RNAi expression in PDF neurons does not eliminate the long-sleep phenotype of Gad1GAL4/UAScryRNAi flies. Therefore, we suspect that cry deficiency in other GABAergic neurons is also required for the long-sleep phenotype. Given that the s-LNvs are known to express CRY and appear to be GABAergic based on our findings here, we believe that CRY acts at least in part in the s-LNvs to promote wakefulness under short photoperiod.”

      In conclusion, it is not clear that the authors demonstrated that they are looking at a cry-mediated effect on GABA in s-LNvs resulting in a modulation of the activity of the l-LNvs. Better images and more-suited genetic experiments could be used to address this.

      Thank you very much for all the comments. They are indeed quite helpful for improving our manuscript. Hopefully, with images of higher quality and the additional experiments described above, we have now provided more evidence supporting our major conclusion.

      Reviewer #2 (Public Review):

      Summary:

      The sleep patterns of animals are adaptable, with shorter sleep durations in the winter and longer sleep durations in the summer. Chen and colleagues conducted a study using Drosophila (fruit flies) and discovered that a circadian photoreceptor called cryptochrome (cry) plays a role in reducing sleep duration during day/night cycles resembling winter conditions. They also found that cry functions in specific GABAergic circadian pacemaker cells known as s-LNvs to inhibit these neurons, thereby promoting wakefulness in the animals in the winter. They also identified l-LNvs, known as arousal-promoting cells, as the downstream neurons.

      Strengths:

      Detailed mapping of the neural circuits cry acts to mediate the shortened sleep in winter-like day/night cycles.

      Weaknesses:

      The supporting evidence for s-LNvs being GABAergic neurons is not particularly strong. Additionally, there is a lack of direct evidence regarding changes in neural activity for s-LNvs and l-LNvs under varying day/night cycles, as well as in cry mutant flies.

      Thank you very much for all the comments. We have now expressed nlsGFP by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), and positive signals in the s-LNvs can be observed (Figure 5A and B; Figure 5-figure supplement 1A). Hopefully, this can provide some further support regarding the s-LNvs being GABAergic neurons.

      We have now examined GCaMP signals in the l- and s-LNvs of WT and cry mutants under 4L20D/12L12D. Please see Author response image 2 for the results. As can be seen, both WT and cry mutants show photoperiod-dependent changes. Interestingly, cry mutants show a more prominent reduction of GCaMP signal in the l-LNvs compared to WT under 12L12D vs. 4L20D, but the sleep duration phenotype is observed only under 4L20D. Moreover, GCaMP signal is elevated in the s-LNvs of cry mutants relative to WT under 4L20D but decreased under 12L12D. These results indicate that there are distinct mechanisms regulating sleep under short vs. normal photoperiod (with CRY being dispensable under 12L12D), and that the role of CRY in modulating the activity of these neurons is also photoperiod-dependent. Further in-depth characterization is needed to delineate these complex issues.

      Author response image 2.

      Quantification of GCaMP6m signal intensity normalized to that of tdTomato under 12L12D and 4L20D (n = 25-45 cells). Student’s t-test: compared to WT, #P < 0.05, ##P < 0.01; 12L12D vs. 4L20D, *P < 0.05, ***P < 0.001.

      Reviewer #3 (Public Review):

      Summary:

      In humans, short photoperiods are associated with hypersomnolence. The mechanisms underlying these effects are, however, unknown. Chen et al. use the fly Drosophila to determine the mechanisms regulating sleep under short photoperiods. They find that mutations in the circadian photoreceptor cryptochrome (cry) increase sleep specifically under short photoperiods (e.g. 4h light: 20 h dark). They go on to show that cry is required in GABAergic neurons. Further, they suggest that the relevant subset of GABAergic neurons are the well-studied small ventral lateral neurons that they suggest inhibit the arousal-promoting large ventral neurons via GABA signalling.

      Strengths:

      Genetic analysis to show that cryptochrome (but not other core clock genes) mediates the increase in sleep in short photoperiods, and circuit analysis to localise cry function to GABAergic neurons.

      Weaknesses:

      The authors' conclusion that the sLNvs are GABAergic is not well supported by the data. Better immunostaining experiments and perhaps more specific genetic driver lines would help with this point (details below).

      (1) The sLNvs are well known as a key component of the circadian network. The finding that they are GABAergic would, if true, be of great interest to the community. However, the data presented in support of this conclusion are not convincing. Many of the confocal images are of insufficient resolution to evaluate the paper's claims. The anti-GABA immunostaining in Fig 4 and 5 seems to have a high background, and the GRASP experiments in Fig 4 supplement 1 show low signal.

      We apologize for the poor resolution. This is probably due to the compression of the figures in the merged PDF file. We are now uploading the figures individually and hopefully this can resolve the resolution issue. Unfortunately, the GABA immunostaining does not work very well in our hands and thus the background is high. We have now adjusted the images by changing the minimum lookup table (LUT) value in the green channel to 213, which removes all pixels below 213. This can remove background without changing the gray values, so the analysis is not affected. We have modified all images the exact same way and hopefully this can improve the contrast. Furthermore, we have now expressed nlsGFP by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), and positive signals in the s-LNvs can be observed (Figure 5A and B; Figure 5-figure supplement 1A). Hopefully, this can provide some further support regarding the s-LNvs being GABAergic neurons.

      Transcriptomic datasets are available for the components of the circadian network (e.g. PMID 33438579, and PMID 19966839). It would be of interest to determine if transcripts for GAD or other GABA synthesis/transport components were detected in sLNvs. Further, there are also more specific driver lines for GAD, and the lLNvs, sLNVs that could be used.

      Thank you for these wonderful suggestions. Based on PMID 19966839, both the s-LNvs and l-LNvs express Gad1 and VGAT at a relatively low level, although here in our study Gad1GAL4 expression is observed only in the s-LNvs and not l-LNvs. We have commented on this in the 4th paragraph of Discussion: “One study using cell-type specific gene expression profiling demonstrates Gad1 and VGAT expression in both s-LNvs and l-LNvs, although with relatively low signal (Nagoshi et al., 2010). Here we observed that Gad1GAL4 is expressed in the s-LNvs, and their GABA intensity is reduced when we use R6GAL4 to knock down VGAT in these cells.” PMID 33438579 does not report expression of these genes in either s-LNvs or l-LNvs, likely due to insufficient sequencing depth. Furthermore, we have now used two l-LNv-specific GAL4 lines (R78G01GAL4 and R10H10GAL4) to conduct some of the experiments that we previously used c929GAL4 for, and obtained comparable results (Figure 4I and K).

      (2) The authors' model posits that in short photoperiods, cry functions to suppress GABA secretion from sLNvs thereby disinhibiting the lNVs. In Fig 4I they find that activating the lLNvs (and other peptidergic cells) by c929>NaChBac in a cryb background reduces sleep compared to activating lLNVs in a wild-type background. It's not clear how this follows from the model. A similar trend is observable in Fig 4H with TRP-mediated activation of lNVs, although it is not clear from the figure if the difference b/w cryb vs wild-type background is significant.

      Thank you for bringing up this important point. This does appear to be counterintuitive. We suspect that in cry mutants, there is more inhibition occurring at the l-LNvs and thus the system may be particularly sensitive to their activation. Therefore, activating these neurons on the mutant background can result in a more prominent wake-promoting effect compared to that of WT.

      Recommendations for the authors:

      Our major concern centers around the claim that the sLNvs are GABAergic and secrete GABA onto the lLNVs. As it stands, this is not well supported by the data.

      The authors could substantiate these findings by using more specific driver lines for GAD / vGAT (MiMic-based lines are available that should better recapitulate endogenous expression). Transcriptomic data for circadian neurons are available, and the FlyWire consortium also predicts neurotransmitter identities for specific neural circuits. These datasets could be mined for evidence to support the claim of sLNvs being GABAergic.

      Thank you for these wonderful suggestions. We have now used MiMic-based lines for Gad1 (BS52090, Mi{MIC}Gad1MI09277) and VGAT (BS23022, Mi{ET1}VGATMB01219) to knock down cry but unfortunately were not able to observe changes in sleep. Please see Author response image 3 for the results.

      Author response image 3.

      Daily sleep duration of male flies with cry knocked down in GABAergic neurons by Gad1GAL4 (A) (n = 30, 38, 50, 18, 31 flies) or VGATGAL4 (B) (n = 28, 38, 50, 18, 30 flies) monitored under 4L20D. One-way ANOVA with Bonferroni multiple comparison test: compared to UAS control, ###P < 0.001.

      Furthermore, we have now included another Gad1GAL4 line, which was generated by knocking a GAL4 transgene into the Gad1 locus. We also observe increased sleep when using this GAL4 to knock down cry, and positive signals in the s-LNvs can be observed when using this GAL4 to drive nlsGFP (Figure 2B; Figure 5-figure supplement 1A).

      Based on PMID 19966839, both the s-LNvs and l-LNvs express Gad1 and VGAT at a relatively low level, although here in our study Gad1GAL4 expression is observed only in the s-LNvs and not l-LNvs. We have commented on this in the 4th paragraph of Discussion: “One study using cell-type specific gene expression profiling demonstrates Gad1 and VGAT expression in both s-LNvs and l-LNvs, although with relatively low signal (Nagoshi et al., 2010). Here we observed that Gad1GAL4 is expressed in the s-LNvs, and their GABA intensity is reduced when we use R6GAL4 to knock down VGAT in these cells.” FlyWire does not have a prediction for the particular circuit that we are interested in.

      Further, many of the immunostaining images have high background / low signal - so better confocal images would help, as would the use of more specific driver lines for the lNVs as it is sometimes hard to distinguish the lLNvs from sLNvs.

      We have now adjusted all images by changing the minimum lookup table (LUT) value in the green channel to 213 and that of the red channel to 279, which removes all pixels below 213 and 279, respectively. This can remove background without changing the gray values, so the analysis is not affected. We have modified all images the exact same way and hopefully this can improve the signal to noise ratio. We were not able to find a LexA line that is specifically expressed in the l-LNvs but we have found two l-LNv-specific GAL4 lines (R78G01GAL4 and R10H10GAL4). We used these lines to conduct some of the experiments that we previously used c929GAL4 for, and obtained comparable results (Figure 4I and 4K).
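
      The thresholding step described above can be sketched as follows. This is only one reading of the operation (pixels below the channel minimum are set to zero, all other gray values are left untouched), not the authors' actual image-processing pipeline; the function name and the pixel values are hypothetical and chosen purely for illustration.

```python
def apply_display_minimum(channel, minimum):
    """Zero every pixel below `minimum`; leave all other gray values unchanged."""
    return [[value if value >= minimum else 0 for value in row] for row in channel]


# Hypothetical gray values for two small image channels.
green = [[100, 213, 400], [212, 1500, 50]]
red = [[278, 279, 900], [10, 300, 5000]]

# Channel minima taken from the text: 213 (green) and 279 (red).
green_clean = apply_display_minimum(green, 213)
red_clean = apply_display_minimum(red, 279)

print(green_clean)
print(red_clean)
```

      Because the same minimum is applied to every image of a given channel and the retained pixels keep their original values, intensity comparisons above the threshold are unaffected in this sketch.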

      Additional specific comments are in the reviews above.

      Minor points:

      (1) Line 55: CRYPTOCHROME is misspelled.

      This has been fixed.

      (2) Line 140: The authors need to provide the appropriate references for the use of THIP and SKF-97541.

      This has been added.

      (3) Line 149: there are multiple GABA-A receptors in flies, the authors should acknowledge that. What about LccH3 or Grd?

      Thank you for bringing up this important point. Here we focused only on Rdl because it is the only GABA-A receptor known to be involved in sleep regulation. We have modified our description regarding this issue: “We tested for genetic interaction between cry and Resistant to dieldrin (Rdl), a gene that encodes GABA-A receptor in flies and has previously been shown to be involved in sleep regulation.”

    1. eLife Assessment

      This important study introduces LUNA, a new autofocusing method that achieves nanoscale precision and robustly corrects focus drift during time-lapse microscopy, improving imaging under temperature shifts. The authors exploit this technical advance to investigate the bacterial cold shock response, providing solid evidence that individual cells continue to grow and divide in a highly coordinated process that cannot be observed in population-level measurements. This work offers a technical and conceptual framework for reconciling discrepancies between bulk and single-cell growth measurements, with broad relevance for cell biology and microbiology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors developed a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), to address severe focus drift, a major challenge in time-lapse microscopy. Using this method, they tackle a fundamental question in bacterial cold shock response: whether cells halt growth and division following an abrupt temperature downshift. Overall, the experimental design, modeling, and data analysis are solid and well executed. However, several points require clarification or further support to fully substantiate the authors' conclusions.

      Strengths:

      (1) The LUNA method outperforms existing autofocusing systems with nanoscale precision over a large focusing range. The focusing time is reasonable for the presented experiments, and the authors note potential improvements by using faster motors and optimized control algorithms, suggesting broad applicability. The theoretical simulations and experimental validation provide solid support for the robustness of the method.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division after an abrupt cold shock. Single-cell analyses monitoring the entire course of cold adaptation and steady-state growth reveal features that are obscured in bulk-culture studies: cells continue to grow at reduced rates with smaller cell sizes, resulting in an apparently unchanged population-level OD. The experiments are well designed and analyses are generally solid and largely support the authors' conclusions.

      (3) The authors also propose a model describing how population-level OD measurements depend on cell dry mass density, volume, and concentration. This provides a valuable conceptual contribution to the interpretation of OD-based growth measurements, which remain a gold-standard method in microbiology.

      Weaknesses:

      (1) It is unclear whether the authors' model explaining the population-level OD during acclimation is broadly applicable. Most analyses focus on a shift from 37˚C to 14˚C, where the model agrees well with experimental data. However, in the 37˚C to 12˚C experiment, OD600 decreases after cold shock (Fig. 5e), and the computed OD does not match the experimental measurements (Fig. S16a). Although the authors attribute this discrepancy to a "complicated interplay," no further explanation is provided, which limits confidence in the model's general applicability.

      (2) The manuscript proposes that cell-cycle progression becomes synchronized across the population after cold shock, but the supporting evidence is not fully convincing. If synchronization refers primarily to the uniform reduction in growth rate following cold shock, this could plausibly arise from global translation inhibition affecting all cells. However, the additional claim that "cells encountering a relatively late CSR will accelerate division to maintain synchronization" is not strongly supported by the presented data.

      (3) Several technical terms used in the method development section are not clearly defined and may be unfamiliar to a broad readership, which makes it difficult to fully understand the methodology and evaluate its performance. Examples include depth of focus, focusing precision, focusing time, focusing frequency, and drift threshold value. In addition, the reported average focusing time per location (~0.6 s) lacks sufficient context, limiting the reader's ability to assess its significance relative to existing autofocusing methods.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.
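
      The logic of point (3) can be illustrated with a toy calculation. The sketch below is not the authors' scattering model: it simply assumes that each cell contributes to turbidity in proportion to its volume raised to an exponent greater than one, with the exponent of 4/3 and all cell counts and volumes being hypothetical values chosen for illustration.

```python
def relative_od(cell_count, cell_volume, exponent=4 / 3):
    """Toy turbidity: each cell contributes cell_volume ** exponent (assumed form)."""
    return cell_count * cell_volume ** exponent


# Hypothetical state before cold shock, in arbitrary units.
count_before, volume_before = 1.0, 1.0
# Hypothetical state afterwards: more cells, each of them smaller.
count_after, volume_after = 1.6, 0.7

# At constant density, total biomass scales with count * volume.
biomass_before = count_before * volume_before
biomass_after = count_after * volume_after

od_before = relative_od(count_before, volume_before)
od_after = relative_od(count_after, volume_after)

print(f"biomass ratio: {biomass_after / biomass_before:.2f}")
print(f"toy OD ratio:  {od_after / od_before:.2f}")
```

      Under these assumed numbers total biomass rises while the toy OD stays almost unchanged, which is the qualitative behavior the reviewer describes; the real dependence on dry mass density, volume, and concentration is defined by the model in the manuscript.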

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      (2) No mutants/inhibitors used to test and challenge the proposed model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), to address severe focus drift, a major challenge in time-lapse microscopy. Using this method, they tackle a fundamental question in bacterial cold shock response: whether cells halt growth and division following an abrupt temperature downshift. Overall, the experimental design, modeling, and data analysis are solid and well executed. However, several points require clarification or further support to fully substantiate the authors' conclusions.

      Strengths:

      (1) The LUNA method outperforms existing autofocusing systems with nanoscale precision over a large focusing range. The focusing time is reasonable for the presented experiments, and the authors note potential improvements by using faster motors and optimized control algorithms, suggesting broad applicability. The theoretical simulations and experimental validation provide solid support for the robustness of the method.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division after an abrupt cold shock. Single-cell analyses monitoring the entire course of cold adaptation and steady-state growth reveal features that are obscured in bulk-culture studies: cells continue to grow at reduced rates with smaller cell sizes, resulting in an apparently unchanged population-level OD. The experiments are well designed and analyses are generally solid and largely support the authors' conclusions.

      (3) The authors also propose a model describing how population-level OD measurements depend on cell dry mass density, volume, and concentration. This provides a valuable conceptual contribution to the interpretation of OD-based growth measurements, which remain a gold-standard method in microbiology.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) It is unclear whether the authors' model explaining the population-level OD during acclimation is broadly applicable. Most analyses focus on a shift from 37˚C to 14˚C, where the model agrees well with experimental data. However, in the 37˚C to 12˚C experiment, OD600 decreases after cold shock (Fig. 5e), and the computed OD does not match the experimental measurements (Fig. S16a). Although the authors attribute this discrepancy to a "complicated interplay," no further explanation is provided, which limits confidence in the model's general applicability.

      Thank you for this careful evaluation regarding the generality of the model. In the experiment with a temperature shift from 37°C to 12°C, the measured OD600 values were 0.243 at 0 hours and 0.242 at 5 hours. In comparison, our model-computed OD600 values were 0.243 at 0 hours and 0.271 at 5 hours. The absolute difference between the measured and computed values at 5 hours is therefore approximately 0.03.

      Given the typical experimental variability in OD600 measurements and the limited linear range of the OD-to-biomass approximation (generally considered reliable below ~0.5), this deviation is quantitatively modest. We appreciate your valuable feedback and are happy to provide further clarification if needed.
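
      For reference, the size of this deviation can be worked out directly from the OD600 values quoted above. The short calculation below uses only those rounded numbers, so its last digit may differ slightly from a value computed on unrounded data; the variable names are illustrative.

```python
# OD600 values at 5 hours as quoted in the response (37 °C to 12 °C shift).
measured_od_5h = 0.242
computed_od_5h = 0.271

absolute_deviation = computed_od_5h - measured_od_5h
relative_deviation = absolute_deviation / measured_od_5h

print(f"absolute deviation: {absolute_deviation:.3f}")
print(f"relative deviation: {relative_deviation:.1%}")
```

      Expressed relative to the measured value, the deviation is on the order of a tenth of the measured OD600, which gives the reader a scale against which to judge whether it is modest.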

      (2) The manuscript proposes that cell-cycle progression becomes synchronized across the population after cold shock, but the supporting evidence is not fully convincing. If synchronization refers primarily to the uniform reduction in growth rate following cold shock, this could plausibly arise from global translation inhibition affecting all cells. However, the additional claim that "cells encountering a relatively late CSR will accelerate division to maintain synchronization" is not strongly supported by the presented data.

      We appreciate your critical reading, which has helped us identify ambiguities in our terminology and strengthen the clarity of our work. Regarding the term “synchronization”, we would like to clarify that it refers to two different scenarios: (i) the synchrony in the timing of growth rate changes after cold shock. The cells initiate the slowdown in growth almost simultaneously, suggesting a highly coordinated, non-stochastic population-level response to cold shock; (ii) the synchrony in division cycle progression.

      In the sentence you referenced, “cells encountering a relatively late CSR will accelerate divisions to maintain synchronization”, we intended to describe that cells maintain consistent progression of the division cycle after cold shock, meaning that after the same number of elapsed cycles, different cells are at a similar stage in their division timing (Figure 4f, 4g, Figure S14). The term “accelerate” refers to our observation that cells which complete a given cycle later than others tend to have shorter subsequent inter-division intervals, thereby “catching up” to maintain alignment in cycle number across the population. We acknowledge that using “synchronization” in this scenario may be ambiguous, and we will replace it with the more precise phrasing “progression of division cycle” to accurately convey this finding.
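
      One simple way to test for the "catching up" behavior described above is to correlate the time at which each cell completes a given cycle with the length of its next inter-division interval; a negative correlation would be consistent with late cells shortening their subsequent cycle. The sketch below is a hedged illustration of that check, not the analysis used in the manuscript, and the division times are made-up numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical division times (hours after cold shock) for five cells:
# first and second division after the temperature shift.
division_times = [
    (2.0, 6.0),
    (2.4, 6.1),
    (2.8, 6.2),
    (3.1, 6.0),
    (3.5, 6.3),
]

first_division = [first for first, _ in division_times]
next_interval = [second - first for first, second in division_times]

r = pearson(first_division, next_interval)
print(f"correlation between division time and next interval: r = {r:.2f}")
```

      In these illustrative data the cells that divide later have shorter following intervals, so the second division times are much more tightly clustered than the first ones.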

      (3) Several technical terms used in the method development section are not clearly defined and may be unfamiliar to a broad readership, which makes it difficult to fully understand the methodology and evaluate its performance. Examples include depth of focus, focusing precision, focusing time, focusing frequency, and drift threshold value. In addition, the reported average focusing time per location (~0.6 s) lacks sufficient context, limiting the reader's ability to assess its significance relative to existing autofocusing methods.

      Thank you for your valuable comments and suggestions. In response, we have added more detailed descriptions in the Methods section of the revised version.

      The reviewer noted that the reported average focusing time (~0.6 s) lacks sufficient context, which may limit readers’ ability to assess its significance relative to existing autofocusing methods. We would like to clarify that the core innovation of this work lies in the proposed theoretical framework for autofocusing, which offers advantages over existing methods in terms of focusing precision and range. While focusing time is a practically relevant performance metric, it is primarily presented here as an implementation-dependent parameter rather than a central theoretical contribution of this study. In our experimental setup, an average focusing time of 0.6 s proved sufficient for routine timelapse imaging in microscopy, thereby demonstrating the practical usability of LUNA.

      Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      Thank you for your insightful comment regarding the comparison of LUNA with other autofocus methods.

      In our study, we primarily compared LUNA with the Nikon PFS system (as shown in Video S1) because Nikon PFS is one of the most widely used commercial autofocus systems in single-cell time-lapse imaging, and its manufacturer provides well-defined performance parameters (e.g., focusing precision within 1/3 depth-of-focus, response time <0.7 s), which facilitates a quantitative comparison. For other commercial systems, such as Olympus ZDC, Zeiss Definite Focus, Leica AFC, and ASI CRISP, the publicly available specifications are often less clearly defined, or are measured under inconsistent conditions, making a direct head-to-head comparison challenging and potentially misleading. Additionally, in our preliminary experiments, we also tested an Olympus microscope and observed severe focus drift during slow cooling processes. From a physical perspective, LUNA is specifically designed to meet the demanding requirements of single-cell experiments, including a wide focusing range and high precision, while existing commercial systems may not physically achieve the combination of range and accuracy needed for such extreme conditions.

      (2) No mutants/inhibitors used to test and challenge the proposed model.

      We agree that such approaches would provide valuable mechanistic insights and further strengthen the validation of the model presented in this study. In the current work, our primary goal was to introduce the LUNA autofocusing method and demonstrate its capability to resolve the bacterial cold shock response at the single-cell level with unprecedented precision. As such, we focused on characterizing the wild-type physiological dynamics under cold shock, which already revealed several previously unreported phenomena. We acknowledge that the use of genetic mutants or chemical inhibitors targeting specific cold shock proteins or regulatory pathways would be a logical and powerful next step to dissect the underlying molecular mechanisms and test the causality of the observed growth dynamics. We plan to address this in future work by incorporating such perturbations to further test and refine the model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      The reviewer raises a pertinent question regarding whether the observed high degree of cell synchronization represents an intrinsic biological phenomenon or an artifact induced by the microfluidic environment.

      Over the past decade, microfluidic chips, including the specific design used in our work, have become a widely accepted and powerful tool in microbial physiology research. A broad consensus has emerged within the community that the microenvironment within these microchannels does not significantly interfere with or perturb the natural physiological behavior of microorganisms (Dusny, C. & Grünberger, Curr Opin Biotechnol. 63, 26-33 (2020)). This understanding is also supported by the fact that key findings obtained with microfluidic single-cell technologies are reproducible by other methods. For example, the adder model of cell-size homeostasis in E. coli, first observed in microfluidic chips, has been repeatedly validated by different methods (Taheri-Araghi, S. et al. Curr. Biol. 25, 385-391 (2015)). Therefore, while we acknowledge the importance of considering environmental effects, we are confident that the synchronization we report reflects the genuine biological dynamics of E. coli cells.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

      We thank the reviewer for this thoughtful suggestion. In designing our experiments, we aimed to study the bacterial cold shock response at the single-cell level. A key feature of this response is that it is typically triggered only when the temperature drops below a certain threshold within a short time duration. We therefore chose to lower the temperature from 37 °C to 14 °C as rapidly as possible. This approach allowed us to leverage the unique capabilities of LUNA while also providing an opportunity to explore this biological process in greater detail.

      We agree that investigating bacterial responses across intermediate temperatures would be highly informative for understanding how temperature changes affect cellular physiology. However, this direction addresses a distinct scientific question that lies beyond the scope of the current work. We fully acknowledge its value and do have the intention to explore it in future studies.

    1. eLife Assessment

      This valuable study introduces MPS, an open-source pipeline that addresses a significant technical bottleneck by making miniscope data analysis more accessible. Characterized by speed and a low barrier to entry, the software's performance is supported by solid evidence. This work will be of interest to miniscope users seeking a streamlined, memory-efficient, end-to-end analysis solution.

    2. Reviewer #1 (Public review):

      Summary

      The manuscript by Peden-Asarch et al. introduces MPS, a new open-source software package for processing miniscope data. The authors aim to provide a fast, end-to-end analysis pipeline tailored to miniscope users with minimal experience in coding or version control. The work addresses an important practical barrier in the field by focusing on usability and accessibility.

      Strengths

      The authors identify a clear and well-motivated need within the miniscope community. Existing pipelines for miniscope data analysis are often complex, difficult to install, and challenging to maintain. In addition, users frequently encounter technical limitations such as out-of-memory errors, reflecting the substantial computational demands of these workflows, which call for resources that are not always available in many laboratories. MPS is presented as an attempt to alleviate these issues by offering a more streamlined, accessible, and robust processing framework.

      Weaknesses

      The authors state that "MPS is the first implementation of Constrained Non-negative Matrix Factorization (CNMF) with Nonnegative Double Singular Value Decomposition (NNDSVD) initialization." However, NNDSVD initialization is the default method in scikit-learn's NMF implementation and is also used in CaImAn. I recommend rephrasing this claim in the abstract to more accurately reflect MPS's novelty, which appears to lie in the specific combination of constrained NMF with NNDSVD initialization, rather than being the first use of NNDSVD initialization itself.

      At present, there are practical issues that limit the usability of the software. The link to the macOS installer on the documentation website is not functional. Furthermore, installation on a MacBook Pro was unsuccessful, producing the following error: "rsync(95755): error: ... Permission denied ... unexpected end of file."

      For the purposes of this review, resolving this issue would significantly improve the evaluation of the software and its accessibility to users.

      More broadly, the authors propose self-contained installers as a solution to the "package-management burden" commonly associated with scientific software. While this approach is appealing and potentially useful for novice users, current best practices in software development increasingly rely on continuous integration and continuous deployment (CI/CD) pipelines to ensure reproducibility, testing, and long-term maintenance. In this context, it has become standard for Python packages to be distributed via PyPI or Conda. Without dismissing the value of standalone installers, the overall quality and sustainability of MPS would be greatly enhanced by also supporting conventional environment-based installations.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript introduces Miniscope Processing Suite (MPS), a novel no-code GUI-based pipeline built to easily process long-duration one-photon calcium imaging data from head-mounted Miniscopes. MPS aims to address two large problems that persist despite the rapid proliferation of Miniscope use across the field. The first issue is concerned with the high technical barrier to using existing pipelines (e.g., CaImAn, MIN1PIPE, Minian, CaliAli) that require users to have coding skills to analyze data. The second problem addressed is the intense memory limitations of these pipelines, which can prevent analysis of long-duration (multi-hour) recordings without state-of-the-art hardware. The MPS toolbox takes inspiration from what existing pipelines do well, innovates new modules like Window Cropping, NNDSVD initialization, Watershed-based segmentation, and improves the user experience to improve access to calcium imaging analysis without the need for new training in new coding languages. In many ways, MPS achieves this aim, and thus will be of interest to a growing, broad audience of new calcium imagers.

      There are, however, some concerns with the current manuscript and pipeline that, if addressed, would greatly improve the impact of this work. Currently, the manuscript provides insufficient evidence that MPS can generate good results efficiently on various data sets, and it is not properly benchmarked against other established packages. Additionally, considering the goal of MPS is to attract novices to attempt Miniscope analysis, better tutorials, documentation, and walkthroughs of expected vs inaccurate results should be provided so that it is clear when the user can trust the output. Otherwise, this simplified approach may end up leading new users to erroneous results.

      Strengths:

      The manuscript itself is well-organized, clear, and easy to follow. MPS is clearly designed to remove the computational barrier for entry for a broad neuroscience community to record and analyze calcium data. The development of several well-detailed algorithmic innovations merits recognition. Firstly, MPS is extremely easy to install, keep updated, and step through. Having each step save every output automatically is a well-thought-out feature that will allow users to enter back into the pipeline at any step and compare results.

      The implementation of an erroneous frame identifier and remover during preprocessing is an important new feature that is typically done offline with custom-built code. Interactive ROI cropping early in the pipeline is an efficient way to lower pixel load, and NNDSVD initialization is a new way to provide nonnegative, biologically interpretable starting spatial and temporal factors for later CNMF iterations. Parallel temporal-first update ordering cuts down dramatically on later computational load. Together, all these features, neatly packaged into a no-code GUI like the Data Explorer for manual curation, are practical additions that will benefit end users.

      Weaknesses:

      A major limitation of this manuscript is that the authors don't validate the accuracy of their source extraction using ground-truth data or any benchmark against existing pipelines. The paper uses their own analysis of processing speeds, component counts, signal-to-noise ratio improvements, and morphological characteristics of detected cells, but it needs to be reworked to include some combination of validation against manually annotated ground truth data sets, simulated data with known cell locations and activity patterns, or cross-validation with established pipelines on identical datasets. Without this kind of validation, it is impossible to truly determine whether MPS produces biologically acceptable results that help distinguish it from what is currently already available. For example, line 57 refers to the CaImAn pipeline having near-human efficiency (Figures 3-5 and Tables 1 and 2 of the CaImAn paper), but no specific examples for MPS performance benchmarks are made. Figure 15 of the Minian paper provides other examples of how to show this.

      Considering one of the main benefits of MPS is its low memory demand and ability to run on unsophisticated hardware, the authors should include a figure that shows how processing times and memory usage scale with dataset sizes (FOV, number of frames and/or neurons, sparsity of cells) and differing pipelines. Figure 8 of the CaImAn paper and Figure 18 of the Minian paper show this quite nicely. Table 1 currently references how "traditional approaches" differ methodologically from MPS innovations, but runtime comparisons on identical datasets processed through MPS, CaImAn, Minian, or CaliAli would be necessary to substantiate performance claims of MPS being "10-20X faster". Additionally, while the paper does mention the type of hardware used by the experimenters, a table with a full breakdown of components, as well as the minimum requirements for smooth processing, may be useful for reproducibility.

      The current datasets used for validating MPS are not described in the manuscript. The manuscript appears to have 28 sessions of calcium imaging, but it is unclear if this is a single cohort or even animal, or whether these data are all from the same brain region. Importantly, the generalizability of parameter choices and performance could vary for others based on brain region differences, use of alternative calcium indicators (anything other than GCaMP8f used in the paper), etc. This leads to another limitation of the paper in its current form. While MPS is aimed at eliminating the need to code, users should not be expected to blindly trust default or suggested parameter selections. Instead, users need guidance on what each modifiable parameter does to their data and how each step analysis output should be interpreted. Perhaps including a tutorial with sample test data for parameter investigation and exploration, like many other existing pipelines do, is warranted. This would also increase the transparency and reproducibility of this work.

      Currently, the documentation and FAQ website linked to MPS installation does not do an adequate job of describing parameters or their optimization. The main GitHub repository does contain better stepwise explanations, but there needs to be a centralized location for all this information. Additionally, a lack of documentation on the graphs created by each analysis step makes it hard for a true novice to interpret whether their own data is appropriately optimized for the pipeline. Greater detail on this would greatly improve the quality and impact of MPS.

    4. Author response:

      (1) Claim regarding NNDSVD initialization

      Reviewer #1:

      The authors state that "MPS is the first implementation of Constrained Non-negative Matrix Factorization (CNMF) with Nonnegative Double Singular Value Decomposition (NNDSVD) initialization." However, NNDSVD initialization is the default method in scikit-learn's NMF implementation and is also used in CaImAn. I recommend rephrasing this claim in the abstract to more accurately reflect MPS's novelty, which appears to lie in the specific combination of constrained NMF with NNDSVD initialization, rather than being the first use of NNDSVD initialization itself.

      We agree that our original phrasing was too broad. NNDSVD-family initialization is widely used in NMF implementations (e.g., scikit-learn) and is available within some pipeline components. We revised the abstract and main text to clarify our intended contribution: MPS seeds CNMF directly with NNDSVD-derived nonnegative factors as the primary initialization strategy, rather than relying on heuristic or greedy ROI-based seeding, integrated within a memory-efficient, end-to-end workflow for long-duration miniscope recordings.
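
      For readers unfamiliar with the initialization under discussion, the block below sketches NNDSVD as it is commonly described and as it appears in general-purpose NMF libraries. The symbols Y (nonnegative data matrix), k (number of components), and W, H (initial factors with columns w_j and rows h_j) are notation assumed here for illustration; the response does not describe how MPS implements this step.

```latex
% Truncated SVD of the nonnegative data matrix
Y \approx \sum_{j=1}^{k} \sigma_j \, u_j v_j^{\top}

% Leading pair: its singular vectors can be taken entrywise nonnegative
w_1 = \sqrt{\sigma_1}\,\lvert u_1 \rvert , \qquad
h_1 = \sqrt{\sigma_1}\,\lvert v_1 \rvert^{\top}

% For j >= 2, split each singular vector into positive and negative parts
u_j^{\pm} = \max(\pm u_j, 0), \qquad
v_j^{\pm} = \max(\pm v_j, 0), \qquad
m_j^{\pm} = \lVert u_j^{\pm} \rVert \, \lVert v_j^{\pm} \rVert

% Keep the dominant sign s and rescale
s = \arg\max_{\pm} m_j^{\pm}, \qquad
w_j = \sqrt{\sigma_j m_j^{s}} \, \frac{u_j^{s}}{\lVert u_j^{s} \rVert}, \qquad
h_j = \sqrt{\sigma_j m_j^{s}} \, \frac{(v_j^{s})^{\top}}{\lVert v_j^{s} \rVert}
```

      Every entry of W and H produced this way is nonnegative, which is why the result can seed a constrained factorization directly.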

      (2) Installation issue on macOS

      Reviewer #1:

      At present, there are practical issues that limit the usability of the software. The link to the macOS installer on the documentation website is not functional. Furthermore, installation on a MacBook Pro was unsuccessful, producing the following error: "rsync(95755): error: ... Permission denied ... unexpected end of file."

      We thank the reviewer for identifying the broken installer link and the macOS installation error. We fixed the macOS installer link on the documentation website and updated installation instructions to explicitly address common macOS permission-related failures (including rsync "Permission denied" errors that arise when attempting to write into protected directories without appropriate privileges). We re-tested installation on clean macOS systems and confirmed successful installation under the revised instructions.

      (3) Validation, benchmarking, and cross-pipeline comparison

      Reviewer #2:

      A major limitation of this manuscript is that the authors don't validate the accuracy of their source extraction using ground-truth data or any benchmark against existing pipelines... Without this kind of validation, it is impossible to truly determine whether MPS produces biologically acceptable results... Considering one of the main benefits of MPS is its low memory demand and ability to run on unsophisticated hardware, the authors should include a figure that shows how processing times and memory usage scale with dataset sizes and differing pipelines... runtime comparisons on identical datasets processed through MPS, CaImAn, Minian, or CaliAli would be necessary to substantiate performance claims of MPS being "10-20X faster".

      We thank the reviewers for their careful reading and for raising the question of biological validity, which we agree is central to any calcium imaging analysis tool. We would like to clarify, however, that MPS does not introduce a novel source extraction algorithm, and therefore the question of biological validity is not one that MPS alone can answer - nor should it be expected to. MPS is built on CNMF, the same mathematical framework underlying CaImAn and Minian. The contribution of MPS lies in its initialization strategy and parallelization architecture, which allow this proven framework to operate in the multi-hour recording regime.

      To address the reviewers' request for a direct qualitative comparison, we will run MPS, CaImAn, Minian, and MIN1PIPE on a representative 10-minute real recording with clearly visible neurons. The figure will show the spatial components (ROI footprints) and representative temporal traces (ΔF/F) for all four pipelines on identical data. We anticipate that the spatial layouts and temporal dynamics will be highly concordant across pipelines, demonstrating that MPS produces biologically consistent output. We believe this side-by-side comparison will provide a clear demonstration that MPS output is comparable in quality to established tools on tractable recordings.

      Regarding runtime comparison across pipelines, we will provide a table showing approximate processing times at three recording durations (5, 20, and 180 minutes). On short recordings, all pipelines are expected to complete successfully at different rates, whereas on long-duration recordings, this pipeline behavior is expected to diverge. We acknowledge that any single runtime benchmark reflects specific hardware and dataset characteristics and may not generalize to all configurations. We will therefore present these data as illustrative rather than definitive and will direct readers to the MPS documentation for guidance on hardware-specific tuning.
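
      As a rough illustration of how such a runtime and memory table could be gathered, the sketch below times a callable at several input sizes and records peak traced memory. The function `toy_workload` is a hypothetical stand-in, not any of the pipelines named above. `tracemalloc` only sees memory allocated through Python's allocators, so a real cross-pipeline comparison would likely need process-level measurement instead.

```python
import time
import tracemalloc


def benchmark(func, sizes):
    """Run func(size) for each size; return (size, seconds, peak_bytes) rows."""
    rows = []
    for size in sizes:
        tracemalloc.start()  # begin tracing with a fresh peak counter
        start = time.perf_counter()
        func(size)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()  # clear traces so the next size starts clean
        rows.append((size, elapsed, peak))
    return rows


def toy_workload(n_frames):
    """Hypothetical stand-in for a pipeline run: one float per frame."""
    trace = [float(i) for i in range(n_frames)]
    return sum(trace)


if __name__ == "__main__":
    for size, seconds, peak in benchmark(toy_workload, [1_000, 10_000, 100_000]):
        print(f"{size:>8} frames  {seconds:.4f} s  {peak / 1e6:.2f} MB peak")
```

      Reporting both columns per recording duration makes it visible whether a pipeline fails because of time or because of memory.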

      (4) Dataset description and scope of generalizability

      Reviewer #2:

      The current datasets used for validating MPS are not described in the manuscript. The manuscript appears to have 28 sessions of calcium imaging, but it is unclear if this is a single cohort or even animal, or whether these data are all from the same brain region. Importantly, the generalizability of parameter choices and performance could vary for others based on brain region differences, use of alternative calcium indicators...

      We agree that the dataset description should be centralized and unambiguous. We added a dedicated Methods subsection stating that all results are based on a single, controlled experimental dataset consisting of 28 long-duration miniscope sessions acquired under consistent conditions (same brain region, calcium indicator, optical configuration, and acquisition parameters). This section explicitly specifies the number of animals, brain region, frame rate, field of view, session duration, and total data volume. We also clarified that conclusions are intended to evaluate MPS performance in this controlled long-duration setting rather than to claim universal parameter generalizability across brain regions, indicators, or optical systems.

      (5) Parameter guidance and documentation

      Reviewer #2:

      ...users should not be expected to blindly trust default or suggested parameter selections. Instead, users need guidance on what each modifiable parameter does to their data and how each step analysis output should be interpreted. Currently, the documentation and FAQ website linked to MPS installation does not do an adequate job of describing parameters or their optimization...

      We agree that users should not blindly trust default or suggested parameters. We substantially expanded and centralized documentation by adding a parameter-selection walkthrough that explains what each modifiable parameter does, how it affects intermediate and final outputs, and how diagnostic plots generated at each stage should be interpreted. Rather than prescribing dataset-specific parameter values, we explicitly framed parameter selection as an iterative, hypothesis-driven process informed by experimental factors such as calcium indicator kinetics, lens size and numerical aperture, field of view, recording duration, and expected neuronal density. We consolidated previously dispersed explanations from the GitHub repository into a single documentation site and expanded figure descriptions to guide interpretation by less experienced users. A representative sample dataset and accompanying analysis code were made publicly available at https://github.com/ariasarch/MPS_Sample_Code to support parameter exploration on tractable data.

      (6) Packaging and distribution

      Reviewer #1:

      ...current best practices in software development increasingly rely on continuous integration and continuous deployment (CI/CD) pipelines to ensure reproducibility, testing, and long-term maintenance. In this context, it has become standard for Python packages to be distributed via PyPI or Conda. Without dismissing the value of standalone installers, the overall quality and sustainability of MPS would be greatly enhanced by also supporting conventional environment-based installations.

      Regarding distribution more broadly: while our one-click installers are intended to reduce setup burden for non-programmers, we recognize the value of conventional environment-based distribution for long-term sustainability. We are exploring the feasibility of adding a standard PyPI and/or Conda installation pathway alongside the standalone installers. To ensure reproducibility across environments, all package dependencies are now explicitly version-pinned at installation time, eliminating environment drift as a source of irreproducibility.

      We would note, however, that PyPI distribution alone does not fully resolve the reproducibility challenges inherent to scientific Python software. Even with version-pinned dependencies, downstream changes in the Python interpreter itself, compiled extension modules, and platform-specific build toolchains can silently alter numerical behavior in ways that are difficult to anticipate or control. Our standalone installers address this by shipping a complete, fixed execution environment, and we believe this remains a meaningful architectural advantage for ensuring long-term reproducibility - particularly for non-developer users who may not be in a position to diagnose subtle environment-related failures. We see PyPI/Conda support and standalone installers as complementary rather than equivalent approaches, and will pursue both where feasible.
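
      To make the idea of version pinning concrete, the sketch below compares installed package versions against a table of expected pins using the standard-library `importlib.metadata` module. The function name `check_pins` and any package names passed to it are hypothetical; this is not the mechanism used by the MPS installers, which the response does not describe.

```python
from importlib import metadata


def check_pins(pins, get_version=metadata.version):
    """Return {name: (expected, found)} for each package that deviates from its pin.

    `found` is None when the package is not installed. `get_version` can be
    swapped for a fake lookup, which keeps the function testable offline.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

      An empty result means the environment matches the pins. As the response notes, this check says nothing about interpreter or compiled-extension differences, which pins alone do not capture.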

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Taken altogether, the experimental evidence favors an erosion-dominated process. However, a few minor questions remain regarding the models. Why does the equal-fragmentation model predict no biomass transfer between size classes? To what extent, quantitatively, does the erosion model outperform the equal fragments model at capturing the biomass size distributions? Finally, why does the idealized erosion fail to capture the size distribution at late stages in Supplemental Figure S9 - would this discrepancy be resolved if the authors considered individual colony variances in cell adhesion (for instance, as hypothesized by the authors in lines 133-137)? I do not believe these questions curb the other results of the paper.

      Our analysis in Figure 2 considers two size classes: small colonies (l < 5) and large colonies (l ≥ 5). The equal-fragment model predicts that the fracture of a large colony gives rise to two daughter fragments with half the biovolume. For an average colony of l = 25 in diameter, this corresponds to two daughter fragments with a diameter of l = 18, which is still in the large colony class. Sequential fragmentation events would be required to transfer biomass to the small size range (l < 5). However, the nearly exponential behavior of the fragmentation frequency function (Eq. 5) implies that subsequent fragmentation events are greatly slowed down. Therefore, the equal-fragments model predicts that the biomass transfer from large to small colonies during the first five hours of the experiment is negligible. This is in sharp contrast with the erosion model, which transfers biomass to the small size class at every fragmentation event. The difference between the two fragmentation models is quantified in Figure 2D, with a negligible change in biomass size distribution for the equal-fragment model (horizontal dash-dotted line) and a strong increase of small colonies for the erosion model (curved dashed line). Hence, it is clear from Figure 2D that the erosion model outperforms the equal-fragment model by capturing the observed shift from large to small colonies. We have now described this more clearly in lines 231-233.
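
      The argument about sequential splits can be illustrated with a short calculation. The sketch below assumes that biovolume scales as a power of the diameter l; the exponent is an assumption introduced here, since the response does not state which scaling the authors used. An exponent of 2 gives a daughter size close to the quoted l = 18, while an exponent of 3 (a compact sphere) gives about 20.

```python
def daughter_size(l, exponent):
    """Diameter of each of two equal daughters if biovolume scales as l**exponent."""
    return l * 2 ** (-1.0 / exponent)


def halvings_to_reach(l0, threshold, exponent):
    """Count successive equal splits until the fragment size falls below threshold."""
    count = 0
    size = l0
    while size >= threshold:
        size = daughter_size(size, exponent)
        count += 1
    return count


if __name__ == "__main__":
    for exponent in (2, 3):
        first = daughter_size(25, exponent)
        steps = halvings_to_reach(25, 5, exponent)
        print(f"exponent {exponent}: first daughter {first:.1f}, "
              f"{steps} splits to drop below l = 5")
```

      Under either assumed exponent, several consecutive splits are needed before a fragment of an l = 25 colony enters the small class, which is consistent with the point that a slowing fragmentation frequency leaves little biomass transfer in the first hours.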

      Nevertheless, the performance of the idealized erosion model is limited at late stages (Fig. S9D). We agree with the reviewer that this limitation could potentially be overcome with the introduction of variance in cell adhesion among colonies (as we hypothesized in lines 140-142). However, this is not a trivial thing to do, as it would require additional free parameters and reduce the simplicity of the model. Therefore, we chose to restrain our model to the common assumptions of idealized fragmentation models widely used in literature (e.g. references 53-55).

      Reviewer #2 (Public review):

      Especially the introduction seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems that not shear but bacterial adhesion is the most important factor in determining colony size. The writing could have done more justice to the fact that the importance of adhesion had been described elsewhere. This being said, the same method can be used to investigate systems where shear forces are biologically more relevant.

      In this work we aimed to investigate the effects of shear forces over a wide range of values, extending beyond the regime of natural lakes into the strong mixing created by technological applications such as the bubble plumes that are applied in several lakes to suppress cyanobacterial blooms. The adhesion force between cells via, e.g., extracellular polysaccharides (EPS) plays an essential role by controlling the resistance to shear-driven erosion, which has been quantified in our model by the fitting parameters S_i and q_i.

      We agree with the reviewer that we have missed some literature on Microcystis colony formation via cell aggregation (i.e., cell adhesion), for which we apologize. In our new revision, we have now included several new references [30-34,36] and we now describe the findings of these earlier studies. Specifically, in the Introduction we now pay more attention to the role of cell adhesion by writing (lines 53-60):

      “In contrast, cell aggregation (sometimes also called cell adhesion) can promote a rapid increase in colony size beyond the limit set by division rates, and may explain sudden rises in colony size in late bloom periods [26, 30, 31]. Aggregation rates depend on the stickiness of the colonies, which in turn is controlled by the EPS composition, pH, and ionic composition of water [27–29]. In particular, divalent cations such as Ca2+ can bridge negatively charged functional groups in EPS and therefore increase stickiness [32–34]. It has been shown that high levels of Ca2+ enhance cell aggregation in Microcystis cultures [35]. Moreover, cell aggregation can provide a fast defense against grazing [36]. Fluid flow plays an important role in cell aggregation by regulating the collision frequency between cells or colonies [6]. In addition, fluid flow ….”

      Furthermore, in the Conclusions we added (lines 374-376):

      “A previous study on colony aggregation at high Ca2+ levels observed similar morphological differences in colony formation [35]. There, an initial fast cell aggregation produced a sparse colony structure, followed by a more compact structure of the colonies associated with cell division”

      Finally, we would like to clarify a difference in terminology between the reviewer’s comment and our work. The term cell adhesion is commonly used in microbiology to refer to adhesion of cells with a solid substrate. In our work, the adhesion mediated by EPS occurs between free-floating cells and colonies. To avoid any confusion, we chose to refer to this process as cell aggregation, in line with other literature on suspended particles.

      Reviewer #2 (Recommendations for the authors):

      The authors have expanded on the image analysis process but now report substantially different correction factors (λ2 = 2.79 compared to 73.13 in the previous submission; λ3 = 0.52 compared to 13.71 in the previous submission). Could the authors comment on how the analysis changed? These correction factors for N<5 appear particularly relevant for the aggregation experiments presented in Figure 3. For measurements involving only small colonies, as in Figure 3, are these correction factors still valid? In addition, does the timing of image acquisition, i.e. when the colonies are imaged, influence the correction factors applied in this study?

      The description of the calibration process was revised in our earlier revision of the manuscript to improve clarity and remove unclear definitions. In the first version, the input variable N_p[i] in supplementary equation (S1) was defined as the number of features per frame. This variable depends on the frame dimensions (2048x2048 px for large colonies, l>5, and 400x400 px for small colonies). We believe that a more suitable input is the concentration distribution, which is normalized by frame area and is therefore invariant to frame dimensions and less prone to misinterpretation. For this reason, we adjusted the definition of N_p[i] in the revised version of the manuscript so that it expresses the number of features per frame area (instead of per frame). These changes required the calibration constants, λ2 and λ3, to be updated in the manuscript by a factor of (400 px/2048 px)^2. This explains why these two calibration constants changed by a factor of approximately 0.038. This rescaling of the input variable N_p[i] and the calibration constants did not affect the final results of our calculations (Figures 2 and 3).
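
      The rescaling can be checked with a few lines of arithmetic. The sketch below uses only the frame sizes stated above and the previous constants quoted by the reviewer; the variable and function names are chosen here for illustration.

```python
def area_rescale_factor(small_px, large_px):
    """Ratio of frame areas for two square frames given their side lengths in pixels."""
    return (small_px / large_px) ** 2


# Side lengths of the small-colony and large-colony frames, in pixels.
factor = area_rescale_factor(400, 2048)

# Calibration constants reported in the previous submission.
previous = {"lambda_2": 73.13, "lambda_3": 13.71}
rescaled = {name: round(value * factor, 2) for name, value in previous.items()}

print(round(factor, 3), rescaled)
```

      The rescaled values round to 2.79 and 0.52, matching the constants reported in the revised manuscript.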

      The authors use a moderate dissipation rate to stir the colonies, after which they allow them to sediment. How long were the particles allowed to sediment before measurements were taken? Intuitively, one might expect a greater number of colonies to be detected following sedimentation, yet the authors report only about one third of the colonies in the sedimented state. What accounts for this reduction? Furthermore, if higher shear rates are applied, do the results differ, for instance if particles are lifted further by the shear flow? Some more clarity would help other researchers to perform similar work.

      The sedimentation of particles following an initial stir was applied only for creating a reference size distribution, displayed in the supplementary Figures S8-C and D. As one intuitively would expect, a higher concentration of colonies was detected after sedimentation (Fig. S8-C and D) than during the shear flow (Fig. S8-A and B). During all other experiments in our work, the applied dissipation rate was sufficient to ensure a uniform distribution of colonies in suspension throughout the parameter range, as described in lines 461-473.

      In the caption of Figure S8 we have reported the number of colonies counted in small subsamples. These numbers are just small subsets of the total number of colonies contained in the entire volume of the cone-and-plate setup. A sub-sample with larger volume was measured during the shear flow in comparison to the sub-sample measured for the sedimented sample, leading to a larger number of counted colonies in panels A and B (N = 10776, combined) compared to panels C and D (N = 3066 and 1455, respectively).

      However, when normalized for the volume of the sub-samples, the calculated concentration of colonies is higher for panels C and D (as shown in the graphs). We understand that the earlier caption description of Figure S8 was misleading, for which we apologize. In the revised version, we have adjusted the caption to better describe the quantity:

      “Number of colonies counted during sampling …”

      Line 797 contains an unfinished edit ("Figure ADD") that should be corrected.

      The unfinished edit has been corrected in the newly revised manuscript. Thanks!

    2. Reviewer #2 (Public review):

      Summary:

      In this work, the authors investigate the role of fluid flow in shaping the colony size of a freshwater cyanobacterium Microcystis. To do so, they have created a novel assay by combining a rheometer with a bright field microscope. This allows them to exert precise shear forces on cyanobacterial cultures and field samples, and then quantify the effect of these shear forces on the colony size distribution. Shear force can affect the colony size in two ways: reducing size by fragmentation and increasing size by aggregation. They find limited aggregation at low shear rates, but high shear forces can create erosion-type fragmentation: colonies do not break in large pieces, but many small colonies are sheared off the large colonies. Overall, bacterial colonies from field samples seem to be more inert to shear than laboratory cultures, which the authors explain in terms of enhanced intercellular adhesion mediated by secreted polysaccharides.

      Strengths:

      -This study is timely, as cyanobacterial blooms are an increasing problem in freshwater lakes. They are expected to increase in frequency and severity because of rising temperatures, and it is worthwhile learning how these blooms are formed. More generally, how physical aspects such as flow and shear influence colony formation is often overlooked, at least in part because of experimental challenges. Therefore, the method developed by the authors is useful and innovative, and I expect applications beyond the system presented here.

      -A strong feature of this paper is the highly quantitative approach, combining theory with experiments, and the combination of laboratory experiments and field samples.

      Weaknesses:

      This study has no major weaknesses. Although the initial part of the introduction seems to imply that fluid flow is the predominant factor in shaping cyanobacterial colony (de)formation, the ensuing discussion is sufficiently nuanced for the reader to understand that the multicellular lifestyle of cyanobacterium Microcystis is shaped by multiple effects, that include bacterial behavior (e.g. which and how much EPS is produced), environmental variables that control cellular aggregation or adhesion and, indeed, fluid flow.

    3. eLife Assessment

      With the goal of investigating the assembly and fragmentation of cellular aggregates, this manuscript examines cyanobacterial aggregates in a laboratory setting. This quantitative investigation of the conditions and mechanisms behind aggregation is an important contribution as it yields a basic understanding of natural processes and offers potential strategies for control. The combination of computational and experimental investigations in this manuscript provides convincing support for the role of shear on aggregation and fragmentation.

    1. eLife Assessment

      There is a growing interest in understanding the individuality of animal behaviours. In this important article, the authors build and use an impressive array of high throughput phenotyping paradigms to examine the 'stability' (consistency) of behavioural characteristics in a range of contexts and over time. The results show that certain behaviours are individualistic and persist robustly across external stimuli while others are less robust to these changing parameters. The data supporting their findings is extensive and convincing. At the same time, the main analyses focus on a selected subset of the many behavioural metrics recorded, so a large fraction of the acquired data remains only lightly explored; by making these additional data available, the authors provide an invaluable resource for future work to apply alternative analytical frameworks and further mine this rich dataset.

    2. Joint Public Review:

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Comments from the editors on the latest version:

      In the latest communication, the authors were asked to (i) justify their selection of metrics (i.e. why these specific five behavioural metrics were chosen from the many recorded), (ii) discuss the variation in ICCs, and (iii) in light of this variation and the reliance on a few selected behavioural parameters, tone down the general claim so as not to overstate that individuality persists across all behaviours.

      We note that the justification for choosing the five metrics and the discussion of ICC variation are purely qualitative, and, despite the edits, the manuscript continues to frame individual behaviours as broadly stable.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We appreciate the authors' efforts in addressing the concerns raised, particularly including a variance partitioning approach to analyse their data. Detailed feedback on the revised manuscript is below and we include a brief list of comments that we think the authors could address in the text:

      (1) Justify metric selection - Could you please include in the text an explanation for why only five behavioural metrics were highlighted out of the many you calculated?

      We have added explanations throughout the manuscript clarifying the rationale for selecting these behavioral parameters, including in lines 467ff. and 531ff. In short, the five highlighted metrics were chosen because they capture key aspects of the behavioral repertoire and, importantly, can be consistently measured across all experimental conditions. Other parameters were excluded as they were only applicable under specific contexts and thus not suitable for cross-condition comparisons.

      (2) Discuss ICC variation - We note that there is variation among the ICC scores for the different metrics you've studied. While this is expected, we ask that you acknowledge in the text that some traits show high repeatability and others low, and reflect this variation in the conclusions.

      We have added an additional paragraph in the Discussion (lines 743ff.) addressing the variation in ICC values among behavioral traits. This new section highlights that some metrics show high repeatability while others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions about individual behavioral stability across contexts.

      (3) Tone down general claims - Because of the above point, we recommend that you avoid overstating that individuality persists across all behaviours. Please clarify this in the Abstract and main text that it applies to some traits more than others.

      We carefully reviewed the entire manuscript and revised the phrasing wherever necessary to avoid overgeneralization. Statements about individuality have been adjusted to clarify that consistent individuality can be measured more strongly in some behavioral traits than in others, both in the Abstract and throughout the main text.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings, and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R<sup>2</sup> of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R<sup>2</sup>) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
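
      The arithmetic behind the two examples in this comment can be reproduced in a few lines. The sketch below simply squares the correlation coefficients quoted by the reviewer; the function name is hypothetical and no data from the study are used.

```python
def variance_explained(r: float) -> float:
    """Coefficient of determination for a simple linear relationship: r squared."""
    return r * r


# Correlation coefficients quoted in the review comment above.
examples = {
    "% time walked, 23°C vs 32°C": 0.263,
    "vector strength, Buridan vs Y-maze": -0.197,
}

for label, r in examples.items():
    r2 = variance_explained(r)
    # Report both the explained and the unexplained share of variance.
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.1%}")
```

      The sign of r drops out when squaring, so a correlation of -0.197 leaves the same share of variance unexplained as +0.197 would.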

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
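
      One way to read the rank suggestion in this comment is to rank each fly within its own context and then correlate the ranks, which amounts to a Spearman correlation. The following is a minimal sketch under that assumption; the function names and the example values are made up for illustration and are not taken from the study.

```python
from statistics import mean


def ranks(values):
    """1-based ranks within one group; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def in_group_rank_correlation(context_a, context_b):
    """Rank each fly within its context, then correlate the ranks."""
    return pearson(ranks(context_a), ranks(context_b))


# Hypothetical trait values for the same five flies in two contexts.
context_a = [10.0, 20.0, 30.0, 40.0, 50.0]
context_b = [62.0, 60.0, 75.0, 80.0, 95.0]
print(round(in_group_rank_correlation(context_a, context_b), 3))  # prints 0.9
```

      Because ranks are computed separately per context, a shift in the group mean between contexts has no effect on the result.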

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The Github link to the software was not available. To verify successful transfer of open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      Comments on revisions:

      While the incorporation of a hierarchical mixed model (HMM) appears to represent an improvement over their prior single-parameter correlation approach, it's not clear to me that this is a multivariate analysis. They write that "For each trait, we fitted a hierarchical linear mixed-effects model in Matlab (using the fitlme function) with environmental context as a fixed effect and fly identity (ID) as a random intercept... We computed the intraclass correlation coefficient (ICC) from each model as the between-fly variance divided by total variance. ICC, therefore, quantified repeatability across environmental contexts."

      Does this indicate that HMM was used in a univariate approach? Can an analysis of only five metrics of several dozen total metrics be characterized as 'holistic'?

      Within Figure 10a, some of the metrics show high ICC scores, but others do not. This suggests that the authors are overstating the overall persistence and/or consistency of behavioral individuality. It is clear from Figure S8 that a large number of metrics were calculated for each fly, but it remains unclear, at least to me, why the five metrics in Figure 10a are justified for selection. One is left wondering how rare or common is the 0.6 repeatability of % time walked among all the other behavioral metrics. It appears that a holistic analysis of this large data set remains impossible. 

      We thank the reviewer for the careful and thoughtful assessment of our work.

      We have added an additional paragraph in the Discussion (lines 743ff.) explicitly addressing the variation in ICC values among behavioral traits. This section emphasizes that while some metrics show high repeatability, others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions regarding individual behavioral stability across contexts.

      Regarding the reviewer’s concern about the analytical approach, we would like to clarify that the hierarchical linear mixed model (LMM) was applied in a univariate framework—each behavioral metric was analyzed separately to estimate its individual ICC value. This approach allows us to quantify repeatability for each trait across environmental contexts while accounting for individual identity as a random effect. Although this is not a multivariate model in the strict sense, it represents an improvement over the prior pairwise correlation approach because it explicitly partitions within- and between-individual variance.
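
      The variance partitioning described here can be illustrated without a mixed-model library. The sketch below is not the authors' Matlab code: it assumes a balanced design (every fly measured once in every context) and uses the classical ANOVA method-of-moments estimates, which for balanced data should agree with a random-intercept model that has context as a fixed effect. The function name and the toy data are hypothetical.

```python
from statistics import mean


def icc_fixed_context(data):
    """ICC = between-fly variance / (between-fly + residual variance).

    data[i][j] is the trait value of fly i in context j. Assumes a balanced
    table with at least two flies and two contexts and no missing cells.
    """
    n = len(data)          # number of flies
    k = len(data[0])       # number of contexts
    grand = mean(v for row in data for v in row)
    fly_means = [mean(row) for row in data]
    ctx_means = [mean(data[i][j] for i in range(n)) for j in range(k)]

    # Sum of squares between flies, and residual after removing the
    # fly and context main effects.
    ss_fly = k * sum((m - grand) ** 2 for m in fly_means)
    ss_resid = sum(
        (data[i][j] - fly_means[i] - ctx_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_fly = ss_fly / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))

    # Method-of-moments estimate of the between-fly variance, floored at 0.
    var_fly = max((ms_fly - ms_resid) / k, 0.0)
    total = var_fly + ms_resid
    return var_fly / total if total > 0 else 0.0


# Toy example: three flies, two contexts, purely additive effects.
perfectly_repeatable = [[1, 3], [2, 4], [3, 5]]
print(icc_fixed_context(perfectly_repeatable))  # prints 1.0
```

      Each trait would be passed through such a function separately, which is what makes the analysis univariate: the result is one ICC per behavioral metric rather than a joint estimate across metrics.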

      As for the selection of behavioral metrics, the five parameters highlighted (% time walked, walking speed, vector strength, angular velocity, and centrophobicity) were chosen because they represent key, biologically interpretable dimensions of locomotor and spatial behavior and, importantly, could be measured reliably across all tested conditions. Several other parameters that we routinely analyze (e.g., Linneweber et al., 2020) could not be calculated in all contexts—for instance, under darkness or when visual cues were absent—and therefore were excluded to maintain consistency across assays.

      We agree that a truly holistic multivariate comparison across all extracted parameters would be valuable; however, given the contextual limitations of some metrics, such an analysis was not feasible in the present framework. We have clarified these points in the revised manuscript to avoid potential misunderstandings.

      The authors write: "...fly individuality persists across different contexts, and individual differences shape behavior across variable environments, thereby making the underlying developmental and functional mechanisms amenable to genetic dissection." However, presumably the various behavioral features (and their variability) are governed by different brain regions, so some metrics (high ICC) would be amenable to the genetic dissection of individuality/variability, while others (low ICC) would not. It would be useful to know which are which, to define which behavioral domains express individuality, and could be targets for genetic analysis, and which do not. At the very least, the Abstract might like to acknowledge that inter-context consistency is not a major property of all or most behavioral metrics.

      We thank the reviewer for this helpful comment and agree that not all behavioral traits exhibit the same degree of inter-context consistency. We have clarified this point in the revised Abstract and ensured that it is also reflected in the main text. The Abstract now reads: 

      “We find that individuality is highly context-dependent, but even under the most extreme environmental alterations tested, consistency of behavioral individuality always persisted in at least one of the traits. Furthermore, our quantification reveals a hierarchical order of environmental features influencing individuality. We confirmed this hierarchy using a generalized linear model and a hierarchical linear mixed model. In summary, our work demonstrates that, similar to humans, fly individuality persists across different contexts (albeit worse than across time), and individual differences shape behavior across variable environments. The presence of consistency across situations in flies makes the underlying developmental and functional mechanisms amenable to genetic dissection.” 

      This revision clarifies that individuality is not uniformly expressed across all behavioral metrics, but rather in a subset of traits with higher repeatability, which are the most promising targets for future genetic analyses.

      I hold that inter-trial repeatability should rightly be called "stability" while inter-context repeatability should be called "consistency". In the current manuscript, "consistency" is used throughout the manuscript, except for the new edits, which use "stability". If the authors are going to use both terms, it would be preferable if they could explain precisely how they define and use these terms.

      We thank the reviewer for drawing attention to this inconsistency in terminology. We apologize for the oversight and have corrected it throughout the manuscript to ensure uniform usage.

      Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anticonservative and, additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger and, my guess is, much more streamlined if the authors employed hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for, e.g., the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined, and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not change, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns, as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper, as they'll be using methods that are standard in the study of individual behavioral variation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      I am delighted to see the authors have included hierarchical models in their analysis. I really think this strengthens the paper and their conclusions while simultaneously making it more accessible to folks that typically use these types of methods to investigate these patterns of individual behavior. It's also cool, and completely jives with my own experience measuring individual behavior, in that the activity metrics show the highest repeatability compared to the more flexible behaviors (such as "exploration"). I think it's quite striking and interesting to see such moderate repeatability estimates in these behaviors across what could be very different environmental scenarios. I think this is a very strong and meaty paper with a lot of information to digest, producing, however, a very elegant and convincing take-home message: individuals are unique in their behavior even across very different environments.

      We sincerely thank the reviewer for the positive and encouraging feedback, as well as for their valuable input throughout the review process. We are very pleased that the inclusion of hierarchical models and the resulting interpretations resonated with the reviewer’s own experience and perspective.

    1. eLife Assessment

      This valuable study advances our understanding of best practices for analyzing population-level data using advanced functional alignment methods. It provides convincing evidence that demographic-specific functional templates improve functional neuroimaging studies that use hyperalignment. This study will be of interest to cognitive neuroscientists, neuroimaging methodologists, and computational researchers with an interest in the human brain.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a compelling case for the necessity of age-specific templates in functional hyperalignment. Given that the brain undergoes substantial developmental, structural, and functional changes across the lifespan, a 'one-size-fits-all' canonical template is often insufficient. This study effectively demonstrates that incorporating age-congruent features significantly enhances the performance and sensitivity of hyperalignment models. By validating these findings across two independent datasets (Cam-CAN and DLBS), the paper provides robust evidence that accounting for age-related functional organization is a critical prerequisite for accurate functional alignment in lifespan research.

      Strengths:

      (1) The authors used three metrics to evaluate performance. Across all metrics, they found that age-congruent templates outperformed age-incongruent templates, suggesting that age-specific templates can improve alignment.

      (2) These findings highlight the superiority of age-congruent templates for hyperalignment. This work underscores the importance of age-matching in cross-subject functional mapping and represents a vital step forward for the methodology.

      Weaknesses:

      (1) Participant Demographics and Group Separation:

      The study defines the 'older' cohort as 65-90 years and the 'younger' cohort as 18-45 years. While this 20-year gap (ages 46-64) effectively maximizes the contrast between groups, the results in Figure 4a suggest that the predicted individualized connectomes follow a continuous distribution. Given this continuity, could the authors provide the average median trends for Figures 2a and 2b to illustrate how the model behaves across the missing age range?

      (2) Request for Implementation:

      I have been unable to locate the source code associated with this publication. Could the authors please provide a link to the repository or clarify if the implementation is available for reproduction?

      (3) Analysis of Prediction Performance and Distribution:

      While Figures 3b and 5b clearly demonstrate that the congruent template improves correlation, Figure 4a shows a distinct shift in the scatter distribution. Could the authors provide a detailed explanation of the prediction performance metrics used? Specifically, I would like to understand how the underlying method accounts for the distribution differences observed when applying the congruent template.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Zhang and colleagues examine the role of participant selection in creating and using functional templates to improve analyses using hyperalignment. Hyperalignment aligns participants' functional MRI data to a shared functional template, analogous to the anatomical templates used to bring anatomical MRI data into a shared space (e.g., MNI152). The question of appropriate template creation is especially pressing for population-level analyses, where a large number of demographic groups (e.g., different age ranges, clinical statuses) may be included in the same analysis. These different demographic groups may have differences in their functional organization that complicate the creation of a single study-specific functional template.

      To provide an initial investigation of the potential effect of demographic-specific templates, the authors use the publicly available Cam-CAN dataset, which contains participants from 18 to 87 years of age. They define a young adult (< 45 years of age) and an older adult group (> 65 years of age) from this dataset with approximately the same number of participants. They investigate whether "age-congruent" templates (i.e. defined in the same age group they are used) improve three analyses where hyperalignment has been previously shown to boost performance: inter-subject correlation, predicting individual connectomes, and predicting individual functional responses. Using the Cam-CAN-derived older adult template, they then replicate the ISC analyses using the publicly available Dallas Lifespan Brain Study (DLBS).

      Overall, the presented results are highly suggestive that age-congruent templates consistently improve performance, though the absolute effects are small.

      Strengths:

      The use of a separate validation sample, reusing the same template calculated with Cam-CAN, highlights the potential of developing independent templates for individual demographic groups and then distributing these for wider use, analogous to the MNI templates that are widely used throughout the field of neuroimaging. This suggests that the potential impact of this framework is significant.

      Weaknesses:

      While the authors appropriately highlight the potential applications of this result (e.g., to different clinical statuses), it is not apparent how to appropriately extend this methodology to many common experimental paradigms. For example, in case-control studies (where researchers are interested in comparing clinical and non-clinical participants) the use of two different functional templates may complicate rather than ease analyses. Providing this as a potential limitation of the current template construction method, or providing recommendations to researchers interested in comparing across groups, would help to increase the impact of this work.

    1. eLife Assessment

      This important study demonstrates that Mycobacterium tuberculosis suppresses protective Th17/IL-17 responses in C57BL/6 mice via a Tbet-dependent mechanism involving the virulence factors ESX-1 and PDIM, as mutants lacking these factors induce significantly higher IL-17-producing CD4 T cells and IL-17A in the lungs compared to wild-type bacteria. The experiments are rigorous and well-designed, combining host knockouts and bacterial mutants to yield solid evidence pointing to cross-regulation between Th1 and Th17 pathways, including reduced IL-23 in draining lymph node dendritic cells. However, some of the data on IFN-γ effects or lymph node-specific mechanisms are incomplete and require deeper mechanistic insight, such as direct T cell transcription factor analysis in lymph nodes and broader host validation, to strengthen the work. Overall, the findings provide insight into how bacterial virulence factors limit Th17 induction, thereby promoting persistence, and will interest immunologists and TB researchers focused on host-pathogen balance and vaccine strategies.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript examines the factors that restrict the induction of IL-17-producing T cells during Mycobacterium tuberculosis (Mtb) infection. The authors show that neither the infectious route nor the duration of infection is responsible. But they do show that mice lacking the Th1-defining transcription factor Tbet have increased IL-17-producing T cells, a finding consistent with prior reports in the field of immunology. They also show that 2 highly attenuated Mtb mutants in ESX-1 and PDIM, two well-known Mtb virulence factors, do induce IL-17-producing T cells. In contrast, Mtb mutants in mmpl4 are also similarly attenuated, but do not induce IL-17-producing T cells, suggesting that this property is not simply a result of attenuation but due to specific properties of ESX-1 and PDIM-deficient mutants.

      Strengths:

      (1) It is interesting that mice infected with ESX-1 and PDIM mutants have increased induction of Th17 cells.

      (2) The data are solid and convincing throughout.

      Weaknesses:

      There are two main criticisms:

      (1) It is not clear how much the factors uncovered here are true beyond B6 mice. B6 mice, compared to humans, are known to be very Th1-skewed, and Tbet is a strong inhibitor of Th17-specific T cells. Many people make IL-17-producing T cells in response to Mtb infection.

      (2) Very few novel insights are mechanistically revealed about how Th17 induction is restricted by Mtb. Tbet induction is known to restrict Th17 development, and this is a T-cell intrinsic mechanism. In contrast, the IL-23 association revealed seems to be extrinsic to T cells and to act on T cells. How, if at all, are these factors related to each other in restricting Th17 induction? Also, the conclusion that it is not a result of attenuation is not completely convincing.

      Other points:

      (1) The authors show that mice infected with a deficiency in ESX-1 have more IL-17-producing CD4 T cells in response to stimulation with an ESAT-6 peptide pool (Figure 3B). Because ESAT-6 is encoded by ESX-1, why do mice infected with this Mtb mutant have any ESAT-6-specific T cells? Is it an incomplete knockdown?

      (2) The manuscript states, "Under the conditions where Th17s are highly induced, mice infected with either ΔESX-1 or PDIM lacking Mtb, the Il17a-/- mice had ~3-5 fold higher CFU than WT mice (Figures 3F-G). These results indicate that the induction of Th17s is not dependent on the attenuation of Mtb in general, but instead Mtb utilizes ESX-1 and PDIM to suppress the induction of a Th17 response that enhances protection against Mtb infection." I don't think the last sentence is necessarily true. I can imagine a scenario in which the induction of the Th17s is, in fact, due to the attenuation, and the Th17 induction still contributes to protection.

      (3) ESX-1, PDIM, and mmpl4 mutants all have similarly reduced CFUs in the lung, but what about the LN? The bacterial burden in the LN may be more important for regulating T-bet, IL-23, and Th17 differentiation, since the LN is where T cell priming occurs, than the CFU in the lung. Perhaps ESX-1 and PDIM mutants have reduced CFU in the LN, but mmpl4 does not. This difference in LN burdens may be the primary driver of Th17 priming, as high avidity interactions are thought to be an important driver of T-bet induction.

      (4) Do LN cDC1s express high levels of IL-12 p35 in mice infected with the mmpl4 mutant? Likewise, do LN cDC2s express low levels of IL-23 p19 (akin to those infected with WT Mtb)? If these observations for ESX-1 and PDIM mutants are mechanistically linked to the increased numbers of Th17 cells, then you would expect mice infected with mmpl4 mutants to be more like those infected with WT Mtb than those infected with ESX-1 and PDIM mutants.

      (5) ESX-1 and PDIM are very different virulence factors: a protein secretory pathway and a cell wall lipid, respectively. Mechanistically, how would mutants in these pathways give very similar outcomes regarding Th17 cells unless it was simply an aspect of their attenuation? Perhaps mmpl4 mutants simply differ in some aspects of their attenuation, such as bacterial burdens in LNs, or their interaction with cDCs?

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors tackle an important question of why IL-17 production and TH17 responses are lower than expected during Mtb infection. The authors identify an axis of cross-regulation between TH1 and TH17 cells and provide data to support roles for Mtb virulence factors ESX1 and PDIM in promoting TH1 responses and/or suppressing TH17 responses.

      Strengths:

      The strengths include the significance of the work, the combination of host and Mtb genetic models to dissect the mechanistic basis for regulation of IL-17 production from T cells during infection, and the rigor of the experiments. There are a number of exciting findings from the work, including the cross-talk between T cell responses and the impact of ESX1 and PDIM on these responses.

      Weaknesses:

      The following conclusions and interpretations should be revisited, rephrased, and re-evaluated:

      (1) The manuscript neglects to analyze T cell responses in the dLN, which is the critical site where these responses are initiated (only DC cytokine production is measured in the dLN). The differences in the lungs could reflect trafficking of T cells to the lungs, local lung T cell responses, or durability of the T cell responses in the lungs. The authors state in the last results section that "These results indicate that the ESX-1 and PDIM virulence factors impact naïve T cell differentiation at the draining mediastinal lymph node..." but T cell responses are never measured in the dLN.

      (2) Figure 2: The authors state that "Importantly, IFN-γ deficient mice did not exhibit elevated levels of IL-17A producing CD4 T cells demonstrating that IFN-γ production is not the mechanism by which Th1 T cells limit a Th17 response during Mtb infection", but the difference is statistically significant and even more obvious in Panel B. In fact, if the Panel D y-axis was on a log scale, the Ifng-/- would likely look more like Tbet-/- than WT. Based on this data, it seems like IFNg is having an effect and should not be completely discounted. Does the deletion of Ifng affect the number of Tbet+ T cells?

      In addition, the deletion of Tbet results in an increased number of IFNg+IL-17+ double positive T cells (Figure 2B), in addition to a sizable IFNg single positive T cell population maintained in the Tbet-/- mice (10x the negative control of Ifng-/-). Is this why Tbet deletion is not as severe as Ifng deletion, because T cells are still making IFNg?

      Along these lines, the statement in the text that, "Tbet-/-Il17a-/- mice completely lacked both IFN-γ producing...." T cells is not supported by the data in Figure 2C. Tbet-/-Il17a-/- mice look to have more gamma-producing T cells than Tbet-/- mice (which is already 10x the negative control of Ifng-/- in panel 2B if one includes the gamma single positive and IFNg/IL-17 double positive).

      (3) In the Results sections describing Figures 3, 4, and 5, the authors equate IL-17 production by T cells with TH17 responses and IFNg expression with TH1, but Tbet and RORgt expression in the T cells should be measured to make conclusions about TH1 and TH17. Or the authors can rephrase their findings to specifically state the observations as IFNg or IL-17 expressing CD4+ T cells.

      (4) Conceptually, do the authors think that ESX1/PDIM promotes TH1 responses and this blocks TH17 or are ESX1/PDIM blocking TH17 responses directly, allowing for increased TH1 responses? It would be helpful to clarify the model in this regard, describe how the data supports one model or the other, and then make sure the language is consistent throughout. Can these effects on T cell responses be tested and recapitulated in vitro using infected APC and T cell co-cultures?

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Zilinskas et al seeks to understand the mechanisms underlying the ability of Mtb to suppress Th17 differentiation. As Th17 responses are needed for protective immunity against TB, this is an important topic of investigation. They use Mtb mutants that lack eccC1 (from the ESX-1 locus) and fadD28 (encoding PDIM) and implicate a Tbet-dependent pathway by which Mtb modulates Th17 differentiation. The mechanism by which ESX-1/PDIM function to impact Th17 differentiation is, however, unclear, which limits the novelty of the results.

      Strengths:

      Understanding how Mtb limits Th17 differentiation has implications for vaccine development. Comparative study of KO mice and Mtb mutants is a strength.

      Weaknesses:

      (1) The authors should acknowledge and reference key findings from the literature that have identified suppression of Th17 differentiation as an Mtb virulence mechanism, e.g., the role of the Hip1 protease and CD40 signaling (Madan-Lala JI 2014, Sia Plos Path 2017, Enriquez iScience 2022) and Khader JI 2005, showing the requirement of IL-23 for Th17 responses in vivo in a TB mouse model.

      (2) Addressing several questions related to the Tbet KO mouse experiments would strengthen the study. Do the Tbet KO mice have elevated IL-4/5/13 (which has been previously reported in non-TB studies) in addition to IL-17? The lack of Th17 cells in the IFNg KO compared to the Tbet KO may be due to a difference in timing, since only 3-week data are shown; earlier and later time points would provide better interpretation. The authors do not present any data on neutrophil infiltration in WT vs Tbet KO vs IFNg KO mice. Since IL-17 is known to be important for recruiting neutrophils to the lung, data on neutrophils are important for clarifying the mechanism for the CFU outcomes.

      (3) While IL-23 is important for sustaining IL-17 production, IL-6, TGF-b and/or IL-1β are necessary for Th17 polarization. What were the levels of these cytokines in DCs in the lung? (Figure 5). Additionally, Tbet-deficient DCs exhibit impaired activation of antigen-specific Th1 cells and have reduced IL-12 production. Given the data showing higher IL-17 levels in Tbet KO mice, the authors should provide information on the DC phenotype (IL-23, IL-6, etc.) in the Tbet KO experiments.

      (4) The mechanism by which ESX-1/PDIM function to impact Th17 differentiation is not clear. While data showing a role for ESX-1 and PDIMs in inhibiting Th17 responses is interesting, there is no insight into the potential mechanism of action. Figure 3 showing reduction in IFNg+ CD4 T cells after infection with eccC1 and fadD28 mutants suggests that this outcome is due to a lower bacterial load relative to WT Mtb at the 3-week time point. Since IFNg is known to suppress IL-17, the higher levels of Th17 cells could be due to the reduction in IFNg due to the attenuated growth of the mutants. Additionally, what was the level of Type I IFNs elicited by these mutants?

      (5) Since macrophages have been implicated in the reduced cytokines seen in the ESX-1 mutant, IL-23 and other cytokine data on lung macrophages would complement the DC data.

      (6) Figure 5. There are many fewer DCs overall in the eccC1 and fadD28 mutant groups, which could account for the increased % IL-23p19 in DCs (5D). What were the levels of IL-23 in DC1s?

    1. eLife Assessment

      This valuable manuscript demonstrates that embryonic exposure to the pesticide chlorpyrifos (CPF) impairs juvenile zebrafish social behavior and sets out to define the underlying mechanism. The authors provide solid evidence that butyrate and class I histone deacetylases are involved, as their modulation rescues the phenotype. However, claims that CPF acts through the microbiome and nitric oxide signaling remain correlative and incomplete. Additional validation would strengthen the intriguing hypotheses raised by this work.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors examine the effect of Chlorpyrifos (CPF) exposure on zebrafish social development. They expose larval zebrafish to CPF (0 - 3 dpf), and report social deficits at juvenile stages. They show that the gut microbial metabolite butyrate can rescue these social deficits, proposing that butyrate acts as a histone deacetylase (HDAC) inhibitor, given that inhibition of some HDACs can also rescue social deficits. They also show that CPF changes neuronal gene expression, and butyrate partially rescues these changes. Finally, they demonstrate changes in gut microbiome and metabolome composition, pointing to potential modulation of nitrogen metabolism pathways. They then hypothesise that NO can modulate HDAC activity and attempt to link the NO pathway to social behavior.

      Strengths:

      The authors demonstrate an interesting link between early Chlorpyrifos (CPF) exposure and later-life social deficits, as well as changes in neuronal gene expression, including some autism-related genes, and provide solid evidence that butyrate and epigenetic modulation (histone deacetylase inhibition) may be involved.

      They also comprehensively characterise the microbiome and metabolome of CPF-exposed zebrafish, providing a useful resource for further investigation into its gut-brain mechanisms.

      They are cautious in framing some of their conclusions as a hypothesis and provide some suggestions for future analyses.

      Weaknesses:

      The claim that butyrate's effects on CPF-induced social deficits and neuron activity changes are mediated by histone deacetylase inhibition is lacking some additional controls and, hence, is not completely supported.

      Details on the social behavior assay performed and other potential morphological or behavioral changes were not provided.

      Claims on the mechanism of action of CPF are inconclusive. The causal role of the gut microbiome is not established, especially since gut microbial dysbiosis may also be a downstream consequence of direct effects of CPF on the host, such as changes in host gut gene expression. Evidence for the role of nitrogen metabolism is also incomplete, and the authors have not discussed or ruled out the potential alternative mechanism of reduced butyrate production due to gut microbiome changes.

    3. Reviewer #2 (Public review):

      Summary:

      This paper by Diaz et al. uses the zebrafish model to examine how early embryonic exposure to Chlorpyrifos (CPF), a widely used organophosphate pesticide, induces social behavior deficits later in life. This paper combined behavioral testing, pharmaceutical treatment, genetic manipulation, and multi-omics to test the hypothesis that early CPF increases the abundance of denitrifying bacteria, Pseudomonas, which, in turn, enhances nitric oxide production and induces selective inhibition of HDAC8 and abnormal gene expression in the brain.

      Strengths:

      (1) The observation that early embryonic CPF exposure causes behavior deficits in juvenile zebrafish is very intriguing. It is especially exciting to see that CPF-induced behavior deficits can be reversed by overnight treatment with butyrate or HDAC1 inhibitors in juvenile zebrafish. In humans, CPF exposure during pregnancy causes brain abnormalities and neurological disorders such as autism. Although the zebrafish experiments are far from human application, the effects reported in the paper are still quite thought-provoking.

      (2) The authors performed RNA sequencing experiments on control zebrafish, CPF-exposed zebrafish, and CPF-exposed zebrafish that were treated with Butyrate. The data not only showed large-scale transcriptomic changes in the juvenile zebrafish brain in response to embryonic CPF exposure but also showed that many CPF-induced genetic alterations can be alleviated by butyrate exposure later in life.

      (3) The authors also performed untargeted metabolomics on zebrafish gut and metagenomic analysis in zebrafish feces samples. The results are interesting and support the conclusion that increased intestinal nitric oxide metabolism and the abundance of denitrifying bacteria, such as Pseudomonas, are associated with CPF exposure.

      (4) The large datasets presented in the paper will be useful to other researchers interested in understanding how CPF or butyrate alters brain and gut function. It might be useful to generate new hypotheses to power other research lines.

      (5) The social preference behavior testing and experimental paradigm used in the paper may also be used by other researchers to investigate the interaction among genes, environmental factors, and brain function.

      Weaknesses:

      (1) The presented link between gut microbiome and CPF-induced behavior and genetic alteration is an association, but not causation. Although the research data align with the hypothesis, the hypothesis is not fully supported or tested by the data presented in the paper in the current state.

      (2) The authors performed several large omic studies. However, some of the presented analyses are relatively simple and incomplete. For example, the authors performed shotgun metagenomic analysis on zebrafish feces. However, the paper only displayed the bacterial taxa differences. Are there any differences in bacterial genetic pathways, especially the pathways associated with microbial nitrogen metabolism? What do the alpha and beta diversity look like when comparing the different experimental groups?
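
      For readers unfamiliar with the diversity measures requested above, the following is a minimal sketch of one common choice for each: the Shannon index for alpha (within-sample) diversity and Bray-Curtis dissimilarity for beta (between-sample) diversity. The function names and the taxon counts are hypothetical illustrations and are not taken from the manuscript or from the authors' analysis pipeline.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon index H = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    # Taxa with zero counts contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Beta diversity: Bray-Curtis dissimilarity between two count vectors.

    Returns 0.0 for identical samples and 1.0 for samples sharing no taxa.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must be defined over the same taxa")
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    numerator = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for three taxa in two samples (illustration only).
    control = [50, 30, 20]
    exposed = [10, 10, 80]
    print("Shannon (control):", shannon_index(control))
    print("Shannon (exposed):", shannon_index(exposed))
    print("Bray-Curtis:", bray_curtis(control, exposed))
```

      In practice, such values would be computed per sample and then compared across experimental groups with an appropriate statistical test.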

    1. eLife Assessment

      This work provides an important modeling-based framework for understanding the processes of temporal integration in the claustrum. These mechanisms could support a broader range of integrative brain function. However, at present, the evidence remains at least in part incomplete, primarily because of over-interpretation of the results and their connection to neurophysiology.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate how the anterior claustrum may integrate temporally separated task-relevant signals to guide behavior in a delayed escape paradigm. Because in vivo neural recordings from claustrum during this task are extremely limited - comprising single-trial data with small neuronal samples - the authors adopt a modeling-driven approach. They train recurrent neural networks (RNNs) using only behavioral data (escape latency) to reproduce task performance and then analyze the internal dynamics of the trained networks. Within these networks, they identify a subset of units whose activity exhibits persistent responses and strong correlations with behavior, which the authors label as "claustrum-like." Using dimensionality reduction, decoding, and information-theoretic analyses, they argue that these units dynamically integrate conditioned stimulus (CS) and door-opening signals via nonlinear, trajectory-based population dynamics rather than fixed-point attractor states.

      To bridge model predictions and biology, the authors complement the modeling with in vitro slice experiments demonstrating recurrent excitatory connectivity and prolonged activity in the anterior claustrum that depends on glutamatergic transmission. They further compare latent neural trajectories derived from previously published in vivo claustrum recordings to those observed in the RNN, reporting qualitative similarities. Based on these results, the authors propose that the claustrum implements temporal signal integration through recurrent excitatory circuitry and dynamic population trajectories, potentially supporting broader theories of integrative brain function.

      Strengths:

      This study addresses an important and challenging problem: how to infer population-level computation in a brain structure for which in vivo data are sparse and experimentally constrained. The authors are commendably transparent about these limitations and seek to overcome them through a principled modeling framework. The integration of behavioral modeling, RNN analysis, and slice electrophysiology is ambitious and technically sophisticated.

      Several aspects stand out as strengths. First, the behavioral RNN is carefully trained and interrogated using a rich set of modern analytical tools, including cross-temporal decoding, trajectory analysis, and partial information decomposition, providing multiple complementary views of network dynamics. Second, the slice experiments convincingly demonstrate recurrent excitatory connectivity in the anterior claustrum, lending biological plausibility to the model's reliance on recurrent dynamics. Third, the manuscript is clearly written, logically organized, and conceptually engaging, and it offers a coherent mechanistic hypothesis that could guide future large-scale recording experiments.

      Importantly, the work has significant heuristic value: rather than merely fitting data, it attempts to generate testable computational ideas about claustral function in a regime where direct empirical access is currently limited.

      Weaknesses:

      Despite these strengths, the manuscript suffers from a recurring and substantial conceptual issue: systematic over-interpretation of model-data correspondence. While the modeling results are potentially insightful, the extent to which they are presented as recapitulating real claustral neural mechanisms goes beyond what the available data can support.

      A fundamental limitation is that the RNN is trained solely on behavioral output, without being constrained by neural data at either single-unit or population levels. As a result, the internal network dynamics are underdetermined and non-unique. Many distinct internal solutions could plausibly generate identical behavior. However, the manuscript frequently treats the specific internal solution discovered in the RNN as if it were a close approximation of the actual claustrum circuit.

      This issue is compounded by the sparse nature of the in vivo data used for comparison. The GPFA-based trajectory analyses rely on pseudo-populations and single-trial recordings, yet are interpreted as evidence for robust population-level dynamics. Because neurons were not recorded simultaneously, the inferred trajectories necessarily lack true population covariance and shared trial-to-trial variability, limiting their interpretability as genuine population dynamics. Similarly, conclusions about trajectory-based versus attractor-based computation are drawn almost exclusively from model analyses and then generalized to the biological system.
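
      The point about pseudo-populations can be illustrated with a toy simulation. The sketch below assumes a deliberately simple model in which each neuron's trial response is a shared, session-wide fluctuation plus private noise; all names and parameter values are hypothetical and unrelated to the authors' data. Two neurons recorded in the same session inherit the shared fluctuation and are correlated across trials, whereas pairing neurons from different sessions, as a pseudo-population does, removes that shared variability.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_trials=5000, private_sd=0.5, seed=0):
    """Return (simultaneous, pseudo-population) trial-to-trial correlations."""
    rng = random.Random(seed)
    # Shared trial-by-trial fluctuation within each (hypothetical) session.
    session_a = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    session_b = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    # Each neuron = shared fluctuation of its session + private noise.
    neuron_1 = [s + rng.gauss(0.0, private_sd) for s in session_a]
    neuron_2_same_session = [s + rng.gauss(0.0, private_sd) for s in session_a]
    neuron_2_other_session = [s + rng.gauss(0.0, private_sd) for s in session_b]
    simultaneous = pearson(neuron_1, neuron_2_same_session)
    pseudo_population = pearson(neuron_1, neuron_2_other_session)
    return simultaneous, pseudo_population


if __name__ == "__main__":
    simultaneous, pseudo_population = simulate()
    print("same-session correlation:", round(simultaneous, 3))
    print("pseudo-population correlation:", round(pseudo_population, 3))
```

      Under these assumptions the same-session correlation is large while the pseudo-population correlation is close to zero, which is why trajectories inferred from non-simultaneous recordings cannot capture shared trial-to-trial structure.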

      Overall, while the modeling framework is appropriate as a hypothesis-generating tool, the manuscript repeatedly crosses the line from proposing plausible mechanisms to asserting explanatory or even causal equivalence between the model and the brain. This undermines the otherwise strong contributions of the work.

      Below are several specific points that warrant further clarification or revision:

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      First, it is not clear that this task is widely recognized in the literature as a canonical inference task, in the sense of, for example, sensory preconditioning, transitive inference, or model-based inference paradigms. Rather, the observed effect (that CS animals escape faster to a neutral compartment than neutral-CS controls) can be parsimoniously interpreted in terms of generalized threat value, heightened fear/anxiety, or a bias toward avoidance/escape under elevated threat, without requiring an explicit inferential step about the specific safety of the alternative compartment. The fact that no prior training is needed is compatible with flexible generalization, but does not by itself demonstrate inference in a more formal computational sense.

      Second, the inference claim becomes central to the manuscript's conceptual framing (e.g., the idea that rsCla supports "inference-based escape"), yet the behavioral analyses presented here and in the cited prior work do not clearly rule out simpler accounts. Clarifying this distinction would help avoid overstating both the inferential nature of the behavior and the specific role of rsCla and the RNN's "claustrum-like" cluster in supporting inference per se, as opposed to more general integration of threat-related signals with an opportunity for escape.

      Overall Assessment:

      This manuscript presents an interesting and potentially valuable modeling-based framework for thinking about temporal integration in the claustrum, supported by solid slice physiology. However, in its current form, it overstates the degree to which the proposed RNN dynamics reflect actual claustral neural mechanisms. With substantial revision - especially a more cautious interpretation of model-data similarity and a clearer articulation of modeling limitations - the study could make a meaningful contribution as a hypothesis-generating work rather than a definitive mechanistic account.

    3. Reviewer #2 (Public review):

      This manuscript reports the behavior of a computational model of rat claustral neurons during the performance of a behavioral task known as the delayed escape task (in this reviewer's understanding, this behavioral task was created and implemented by this group only). These authors have argued in a prior manuscript (Han et al.) that a group of neurons located "rostral to striatum" is part of the claustrum. The group names the region the "rostral to striatum claustrum." Additionally, in the Han et al. paper, the authors argue that these cells are responsible for maintaining a signal that lasts through the delay period.

      The main findings of the current paper are:

      (1) The authors have built a model network that was trained to show firing similar to what was reported for rats in their prior paper.

      (2) The authors' analysis of model behavior is used to suggest that the model network recapitulates biological activity, including the existence of a cluster of cells mainly responsible for the delay period firing.

      (3) The authors offer evidence from patch clamp recordings for excitatory interconnections among claustral neurons that are an essential feature of the model network.

      A major value of the computational network is that "trials" of the network can be performed. In experiments on animals, only single trials can be used.

      Concerns:

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      All relevant work must be appropriately cited throughout the manuscript.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Our goal was to propose a possible computational mechanism underlying information integration in the claustrum, not to claim structural or causal equivalence between the model and the biological circuit. We acknowledge that some expressions in the original manuscript may have been interpreted as exceeding this intention, and we will revise the text to explicitly soften such statements.

      It is well established that behavior-trained RNNs can admit multiple internal solutions capable of producing the same behavioral output, and we fully agree with this point. Among the many possible solutions, we focused on networks that exhibited dynamical properties consistent with independently obtained behavioral and physiological findings. Thus, in our view, biological plausibility in this study is not grounded in structural isomorphism, but rather in whether the core population-level dynamical properties observed in the model are reproducible in actual claustral population activity.

      We also agree with the reviewer that our original qualitative comparison of GPFA-based latent trajectories did not provide sufficient quantitative support. In the revised manuscript, we have therefore added an eigenvalue-based quantitative analysis of the dimensional structure of population trajectories. This analysis does not depend on the identity of the dimensionality-reduction method itself, but instead focuses on quantifying the geometric structure of population-state trajectories as they evolve over time. Applying the same metric to both the RNN and biological claustrum data revealed consistent condition-specific differences in population dynamics.

      This quantitative addition strengthens the previous qualitative trajectory comparison and clarifies that the model implements a specific computational dynamical regime that directionally corresponds to claustral population activity. While this does not imply uniqueness of the model, we believe it suggests that the proposed computational principle represents a biologically realizable candidate mechanism.

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      We agree with the reviewer’s concern. Expressions such as “closely mimicked,” “nearly identical,” and “recapitulate” will be replaced with more moderate language.

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      As the reviewer noted, behavior-trained RNNs can yield multiple internal solutions that generate the same behavioral output, and we acknowledge this non-uniqueness. However, we do not interpret the relatively low success rate (5/100 networks) as evidence of fragility. Rather, we interpret it as suggesting that the emergence of this particular dynamical regime requires stringent structural constraints.

      The computational demands of the task—specifically, the integration of temporally separated signals—drive convergence toward networks capable of sustaining persistent activity through recurrent excitatory connectivity. Indeed, all networks exhibiting a claustrum-like cluster shared a strong recurrent excitatory structure within Cluster 1, a structural feature consistent with our slice electrophysiology findings.

      Our criterion for selecting RNNs was their ability to reproduce behavioral and physiological observations from the delayed escape experiment. Excluded RNNs may reflect alternative information-processing strategies characteristic of other brain regions or artificial logical solutions. Importantly, claustrum-like dynamics were not explicitly enforced during training; they emerged spontaneously under behavioral constraints, suggesting that this solution is not arbitrary.

      Furthermore, the computational principles derived from the RNN were quantitatively consistent with in vivo single-neuron activity. Using an eigenvalue-based metric (λ<sub>3</sub>/Σλ), both the RNN and biological claustrum data showed effects in the same direction. Leave-one-neuron-out analyses further demonstrated that this pattern was broadly distributed across neurons in the claustrum. These convergent results suggest that the identified network captures a computational regime that is consistent with claustral population dynamics, rather than representing an arbitrary solution unrelated to the biological observations.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      We agree that the original GPFA trajectory comparison in the biological claustrum data remained qualitative and did not sufficiently establish robustness or population-level structure. We have therefore added quantitative analyses in the revised manuscript.

      Before presenting these analyses, we clarify methodological limitations inherent in pseudopopulation and single-trial data. GPFA estimates latent trajectories based on covariance structure and temporal smoothness assumptions. In pseudopopulations, true simultaneously recorded covariance cannot be fully reconstructed. Although our dataset is based on single trials rather than trial-to-trial variability, we acknowledge that latent-space estimation depends on covariance structure.

      Therefore, the additional quantitative metric is not independent of the GPFA estimation stage; rather, it evaluates the geometric structure of single-trial latent trajectories estimated by GPFA.

      Specifically, for biological data, we reanalyzed GPFA-estimated latent trajectories in PCA space and computed an eigenvalue-based metric (λ<sub>3</sub>/Σλ). Across 20 time bins, a sliding window of 10 bins was applied. For each window, we computed the covariance matrix and extracted eigenvalues for PC1, PC2, and PC3. The third eigenvalue (λ<sub>3</sub>) was normalized by total variance (Σλ = λ<sub>1</sub> + λ<sub>2</sub> + λ<sub>3</sub>). This metric quantifies the extent to which trajectories deviate from a planar (two-dimensional) structure into a third dimension. An increase in λ<sub>3</sub>/Σλ indicates the formation of a higher-dimensional geometric structure.
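      The metric described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names, the one-bin window step, and the synthetic trajectories are assumptions, and the input is taken to be a (time bins x 3) array of PC1 to PC3 coordinates.

```python
import numpy as np


def lambda3_ratio(points):
    """Return lambda_3 / (lambda_1 + lambda_2 + lambda_3) for an (n, 3) point set."""
    cov = np.cov(points, rowvar=False)          # 3x3 covariance over time bins
    eigvals = np.linalg.eigvalsh(cov)[::-1]     # eigenvalues sorted in descending order
    eigvals = np.clip(eigvals, 0.0, None)       # remove tiny negative round-off values
    total = eigvals.sum()
    return float(eigvals[2] / total) if total > 0 else 0.0


def sliding_lambda3(trajectory, width=10, step=1):
    """Apply the ratio to every window of `width` bins (the step size is an assumption)."""
    trajectory = np.asarray(trajectory, dtype=float)
    starts = range(0, trajectory.shape[0] - width + 1, step)
    return np.array([lambda3_ratio(trajectory[s:s + width]) for s in starts])


# Hypothetical 20-bin trajectories: one confined to a plane, one leaving it.
t = np.linspace(0.0, 2.0 * np.pi, 20)
planar = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
helix = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])

planar_ratio = sliding_lambda3(planar)
helix_ratio = sliding_lambda3(helix)
print(planar_ratio.max(), helix_ratio.min())
```

      Because λ3 is the smallest of the three eigenvalues, the ratio lies between 0 (a perfectly planar trajectory) and 1/3 (variance spread equally over three dimensions).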

      For RNN data, since all unit activities were simultaneously observed and sufficient trials were available, we directly applied PCA to population activity without GPFA. Mean trajectories across trials were computed, and the same λ<sub>3</sub>/Σλ metric was applied. Although the initial dimensionality-reduction steps differ, the final metric definition and computation are identical. Thus, the comparison focuses on geometric dimensional structure rather than the dimensionality-reduction method itself.

      Importantly, within the biological dataset, GPFA estimation, preprocessing, pseudopopulation construction, subsampling strategy, temporal alignment, and smoothing were applied identically across the CS and Neutral conditions. Under this common analysis framework, λ<sub>3</sub>/Σλ values were consistently higher in the CS condition than in the Neutral condition.

      For the RNN data, an identical analysis pipeline was applied across the CS+Open and Open-only conditions. In this case as well, λ<sub>3</sub>/Σλ values were significantly higher in the CS+Open condition than in the Open-only condition.

      If structural bias arose from covariance estimation or dimensionality reduction, it would be expected to affect conditions similarly within each dataset. The observation that λ<sub>3</sub>/Σλ increases selectively in the CS condition in biological data and in the CS+Open condition in the RNN therefore supports the interpretation that the effect reflects a condition-specific dynamical difference rather than an artifact of dimensionality reduction.

      To further examine whether the effect was driven by a small subset of neurons, we performed leave-one-neuron-out analyses in the biological dataset. In the CS group, most neurons contributed relatively evenly to the metric, whereas such distributed contribution was not observed in the Neutral group. This suggests that the three-dimensional structure reflects an organized population-level phenomenon rather than covariance dominated by a small number of outlier neurons.
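      A leave-one-neuron-out check of this kind can be sketched as follows. The sketch is hypothetical: it applies PCA directly to a (time bins x neurons) activity matrix in place of the GPFA step used for the biological data, and the function names and toy data are assumptions.

```python
import numpy as np


def top3_lambda3_ratio(activity):
    """PCA on a (time bins x neurons) matrix; return lambda_3 over the top-3 eigenvalue sum."""
    centered = activity - activity.mean(axis=0)
    singular = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    eig = singular[:3] ** 2                                # proportional to PCA eigenvalues
    total = eig.sum()
    return float(eig[2] / total) if total > 0 else 0.0


def leave_one_neuron_out(activity):
    """Return the full-population ratio and its drop when each neuron is removed.

    Needs at least four neurons so that three components remain after removal.
    """
    activity = np.asarray(activity, dtype=float)
    full = top3_lambda3_ratio(activity)
    drops = np.array([
        full - top3_lambda3_ratio(np.delete(activity, j, axis=1))
        for j in range(activity.shape[1])
    ])
    return full, drops


# Toy pseudopopulation: three active neurons and two silent ones (hypothetical data).
t = np.linspace(0.0, 2.0 * np.pi, 20)
toy = np.zeros((20, 5))
toy[:, 0] = np.cos(t)
toy[:, 1] = np.sin(t)
toy[:, 2] = 0.3 * np.sin(2.0 * t)

full_ratio, drops = leave_one_neuron_out(toy)
print(full_ratio, drops)
```

      In this toy case each active neuron carries one dimension, so removing any of them collapses the third dimension, while removing a silent neuron changes nothing. A metric dominated by a few outlier neurons would instead show large drops for those neurons only.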

      These results indicate that the consistent elevation of λ<sub>3</sub>/Σλ in the CS condition (biological data) and in the CS+Open condition (RNN) reflects a genuine dynamical feature rather than an artifact arising from pseudopopulation construction or dimensionality reduction.

      Taken together, the three-dimensional geometric structure observed in GPFA-based latent trajectories is unlikely to reflect random noise. The replication of the same quantitative metric in the RNN, using an independent dimensionality-reduction procedure, strengthens the correspondence between the two systems. We appreciate the reviewer’s suggestion for quantitative reinforcement, which has substantially strengthened the manuscript.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      We agree with the reviewer and will clearly indicate that references to broader theoretical interpretations are speculative. We will substantially reduce their strength and emphasis.

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      We agree with the reviewer’s concern. We will describe the delayed escape task as “a behavioral paradigm that requires integration of temporally separated task-relevant signals” and remove inference-related terminology throughout the manuscript.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s constructive and well-balanced comments. We regret that some of our wording and the scope of our introduction and discussion may not have appropriately reflected the contributions of prior studies. We will revise the manuscript accordingly to ensure that previous literature is more accurately and fairly acknowledged. In addition, we will reorganize the figures to more clearly present the hypotheses being tested and will provide additional details regarding both the modeling framework and the experimental procedures.

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      We will clarify more explicitly which data and methods originate from Han et al. (2024). In the original manuscript, Figure 1 panels A, D, E, F, and L (left) were indicated in the legend as originating from Han et al. (2024). We will further clarify this distinction in the main text. Additionally, we will briefly describe the behavioral experiments and in vivo electrophysiology performed in Han et al. in the Methods section, with appropriate citation.

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      As requested, we will provide additional details regarding model training procedures, weight matrices and their evolution during training, equations (2) and (3), the origin of constants used in the equations, and detailed methods for ChrimsonR injection (anesthesia, stereotaxic coordinates, injection parameters, and clarification of “sparse expression”).

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      We will reorganize the figures to emphasize core results and clarify that the primary goal is to test and validate the computational model.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      We will cite Orman (2015) as suggested and note that persistent activity has been observed in slices cut at specific angles, consistent with our findings.

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      We will remove wording implying “limited” prior work and appropriately acknowledge contributions from the Mathur and Citri groups.

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      Across all whole-cell recordings, optogenetic responses were observed in 38 out of 43 patched cells (~90%), suggesting that a high proportion of claustral neurons receive intra-claustral excitatory input. However, precise connectivity frequency and strength cannot be determined from the current dataset.

      As the reviewer noted, our RNN is specialized for the delayed escape task, and we do not claim direct generalization to other proposed claustral functions such as attention, salience, or sleep. The goal of this study is to computationally characterize the temporal integration mechanism observed in this specific task.

      While our model is specific to the delayed escape task, the computational principle identified here—nonlinear trajectory-based temporal integration supported by recurrent excitatory connectivity—may represent a more general mechanism for integrating temporally separated signals. However, testing such generality lies beyond the scope of the present study and will be framed as a future direction in the revised Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      (1) The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      (2) The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      (3) Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      (4) Open-source code for the pipeline and analyses enhances reproducibility.

      Weaknesses:

      (1) The linear regression approach, while useful, may not fully address collinearity among parameters (e.g., cardiac size, torso volume, heart position). Although left ventricular mass or cavity volume was selected to mitigate collinearity, other parameters (e.g., heart center coordinates) could still introduce bias.

      (2) The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

      (3) The manuscript's highly quantitative presentation may hinder readability. Simplifying technical descriptions and improving figure clarity (e.g., separating superimposed bar plots in Figures 2-4) would aid comprehension.

      (4) Given established sex differences in QTc intervals, applying the same analytical framework to explore QTc's dependence on sex and anatomy could have provided additional clinically relevant insights.

      We thank Reviewer 1 for their kind and constructive comments. We address each specific recommendation below. In brief, we have added an analysis of the variance inflation factor in Supplementary Tables 2 and 3 to show that the chosen parameter sets exhibit low collinearity, and we explain in more detail why relative positional parameters were chosen to avoid this issue. We have added explanatory figures for all positional and orientational parameters to improve understanding of the technical details, and improved the clarity of existing figures as detailed below. We welcome the suggestion to add the QT interval to the manuscript. Although it was available in the UK Biobank for a single lead only, we have added an analysis of both QT and QTc intervals in this lead to Page 10, with accompanying discussion in the second full paragraph of Page 14.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1: “Collinearity and Regression Analysis: It would be valuable to assess the collinearity among the regressed parameters (e.g., cardiac size, torso volume, heart center positions [x, y, z], and cardiac orientation angles) and evaluate whether alternative regression methods (e.g., ridge regression) might improve robustness. Additionally, cardiac digital twinning with electrophysiological models could help isolate the exact contribution of electrophysiology while enabling sensitivity analysis. Nonlinear regression or machine learning approaches might also enhance the predictive power of the analysis.”

      We thank the reviewer for drawing attention to the important issue of collinearity in the parameter sets used in the regression analysis. To address this, we have added Supplementary Tables 2 and 3, which detail the variance inflation factors for each parameter set. Collinearity was considered when selecting the anatomical parameters, e.g. by using relative position rather than absolute distances between landmarks, which would be more collinear. As all variance inflation factors are below 3.4, we believe that the effect of collinearity is limited. We have therefore retained our linear regression analysis, which avoids the subjectivity of parameter selection in more complex methods and aids interpretability. In addition, we have added an explanation to the second full paragraph on Page 6 of how we calculated the relative, rather than absolute, position of the cardiac centre, partly to avoid collinearity among multiple absolute distances. We concur that modelling and simulation techniques are well suited to exploring the electrophysiological component further. As this is outside the scope of this work, we have addressed the role of these methods in future work in the final paragraph of Page 16.
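      For readers unfamiliar with the diagnostic, the variance inflation factor (VIF) for predictor j is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A minimal NumPy sketch follows; the function name and the synthetic data are assumptions, and this is not the authors' analysis code.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X, an (n_samples, n_predictors) array."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residual = y - others @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        # 1 / (1 - R^2) simplifies to ss_tot / ss_res.
        vifs.append(np.inf if ss_res == 0 else ss_tot / ss_res)
    return np.array(vifs)


# Synthetic predictors: three independent columns, then a nearly redundant third column.
rng = np.random.default_rng(0)
independent = rng.normal(size=(500, 3))
collinear = np.column_stack([
    independent[:, 0],
    independent[:, 1],
    independent[:, 0] + independent[:, 1] + 0.1 * rng.normal(size=500),
])

vif_independent = variance_inflation_factors(independent)
vif_collinear = variance_inflation_factors(collinear)
print(vif_independent, vif_collinear)
```

      Independent predictors give values close to 1, whereas a column that is almost a linear combination of the others gives values far above the 3.4 ceiling reported in the response.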

      Comment 2: “Figure Clarity (Bar Plots): The superimposed bar plots in Figures 2-4 are difficult to interpret; separating the bars for each coefficient would improve readability.”

      We accept that the stacked bar plots could be improved in their clarity. Whilst plotting each anatomical parameter separately multiplies the number of plots by a factor of nine, and makes comparison between parameters more difficult, we have added clear horizontal grid lines in order to make values easier to read and interpret.

      Comment 3: “Feature Extraction Visualization: A schematic figure illustrating the steps for measuring heart positional parameters (e.g., with example annotations) would help readers better understand the feature extraction methodology.”

      We agree with the reviewer that the calculation of positional and orientational parameters is crucial to illustrate clearly. We have included additional Supplementary Figures 2 and 3 to better convey these parameters.

      Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial infarction (MI) is more common in women, and treatment is typically less aggressive. This disparity stems from the fact that women's ECGs commonly exhibit 12-lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      Strengths:

      This study has a strength in that it utilizes a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. This allows computation of various anatomical factors (torso volume, cavity volume, etc.), and subsequent regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      Weaknesses:

      Major weaknesses stem from the fact that, while electrophysiological factors appear to play a role across many leads, both post-MI and healthy, the electrophysiological factors are not stated or discussed. The computational modeling pipeline is validated for reconstructing torso contours; however, potential registration errors stemming from ventricular-torso construction are not addressed within the context of anatomical factors, such as the tilt and rotation of the heart. This should be discussed as the paper's claims are based on these results. Further analysis and explanation are needed to understand how these sex-specific results impact the ECG-based diagnosis of MI in men and women, as stated as the primary reason for the study at the beginning of the paper. This would provide a broader impact within the clinical community. Claims about demographics do not appear to be supported within the main manuscript but are provided in the supplements. Reformatting the paper's structure is required to efficiently and effectively present and support the findings and outcomes of this work.

      We thank Reviewer 2 for their considered and detailed feedback. We greatly appreciate the invitation to elaborate on the electrophysiological factors. We have added discussion of this matter to the second and third full paragraphs on Page 14 (continuing onto Page 15) and to the first full paragraph on Page 15, and we have highlighted the role of modelling and simulation in future work in the third full paragraph of Page 16. We agree that registration errors are one reason behind remaining reconstruction errors. We feel that a strength of our study is that the large number of subjects helped to reduce the effect of this noise, and we have updated the second full paragraph of Page 16 to reflect this. We are wary of moving too many supplemental figures and tables describing demographic trends to the main manuscript for fear of diluting the specific answers to our research questions. We have, however, actioned the suggestions detailed below to reformat the paper, including redressing the balance of supplemental versus main methodological sections, and thank the reviewer for their guidance in increasing our clarity.

      Reviewer #2 (Recommendations for the authors):

      (1) Please detail what "chosen to be representative of the underlying dataset" means in terms of a validation dataset.

      We thank the reviewer for addressing the lack of clarity in this matter. We have added a reference in the third full paragraph on Page 6 to Supplementary Appendix 1.1, where we have included full details of the selection criteria.

      (2) “Current guidelines ... further research [16]." The paragraph should begin with a broader statement that is relevant to the fact that the entire body of work focuses on ECG-based diagnosis differences in women, rather than LVEF through echocardiography.

      We have revised the introduction to Paragraph 3 on Page 3 to clarify our motivation for focusing on the ECG in order to shape proposals for novel ECG-based risk stratification tools.

      (3) The last paragraph of the introduction should more clearly state what was performed and how you aim to prove your hypothesis. There is no mention of the data, the regression model, or other key aspects important to the reader.

      We have added methodological details to Paragraph 5 on Page 3 in order to clarify our approach in testing our hypothesis.

      (4) An overview paragraph should be included in the Methods at the beginning.

      We thank the reviewer for this valuable suggestion – we have added an overview paragraph to the start of the methodology section on Page 5.

      (5) The computational pipeline portion of the methods should be written in full paragraphs instead of almost a bulleted list. In general, more details from the supplement should be provided in the methods.

      We thank the reviewer for raising important points concerning the balance of methodological description between the main manuscript and the supplementary materials. We have added a detailed description of the reconstruction pipeline to Pages 5 and 6. We feel that the ordered format of the methods section adds to the reproducibility and transparency of our methodology.

      (6) The torso reconstruction method was already validated in Smith et al. [29]. What value does your additional validation bring to this methodology? Furthermore, how does the construction of the ventricular-torso reconstructions using the cardiac axes (not just the torso contours) influence ECG metrics?

      We apologise that this was not clear. We have clarified in Paragraph 4 on Page 5 that while Smith et al. 2022 provided a detailed validation of the contour extraction networks, it did not validate the torso reconstruction pipeline, as it only presents the reconstruction of two cases as a proof of concept. We have also expanded the second full paragraph on Page 6 to explain that the sparse (but not dense) cardiac anatomies were constructed in order to calculate the cardiac size, which we found was a key factor moderating many ECG biomarkers. We also specified that the cardiac position and orientation were necessary in order to relate these to the torso axes and positions of the ECG electrodes.

      (7) Include the details of the regression analysis in the main body of the methods for the readers. This is crucial to the claims and outcomes of the paper. Only a sentence is included in the results and one in the figure: "Each factor's contribution is calculated from the product of the regression coefficients and anatomical sex differences (Supplementary Appendix 1.5)." What specific contributions can I expect to see in the results figures? The results are filled with methodological aspects that should be in the methods.

      We thank the reviewer again for this important comment regarding the balance of the main text methodology and supplementary methodology sections. We have added detail to the statistical analysis section of the main text on Pages 7 and 8 in order for the reader to understand the following results section without consulting the supplemental methods. We have also removed these details from the results section.
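
      The decomposition quoted in comment (7) can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit with a sex indicator plus two anatomical covariates; the synthetic data, variable names, and effect sizes are hypothetical and are not taken from the manuscript. Each anatomical contribution is the regression coefficient multiplied by the male-female difference in that covariate's mean, and the coefficient on the sex indicator is what remains unexplained by anatomy.

```python
import numpy as np

# Synthetic cohort (all values are illustrative assumptions, not study data).
rng = np.random.default_rng(0)
n = 2000
sex = rng.integers(0, 2, n)                      # hypothetical coding: 1 = male, 0 = female
lv_mass = 80 + 30 * sex + rng.normal(0, 10, n)   # anatomical covariate 1
torso_vol = 30 + 8 * sex + rng.normal(0, 4, n)   # anatomical covariate 2
qrs = 70 + 0.10 * lv_mass + 0.20 * torso_vol + 2.0 * sex + rng.normal(0, 3, n)

# Ordinary least squares with an intercept, the sex indicator, and anatomy.
X = np.column_stack([np.ones(n), sex, lv_mass, torso_vol])
beta, *_ = np.linalg.lstsq(X, qrs, rcond=None)

male, female = sex == 1, sex == 0

# Contribution of each factor = coefficient x sex difference in that factor.
contrib = {
    "lv_mass": beta[2] * (lv_mass[male].mean() - lv_mass[female].mean()),
    "torso_volume": beta[3] * (torso_vol[male].mean() - torso_vol[female].mean()),
    "residual_sex_term": beta[1],  # part of the difference not explained by anatomy
}

# Observed male-female difference in the ECG biomarker.
total_diff = qrs[male].mean() - qrs[female].mean()
```

      Because the model contains an intercept and a sex indicator, the fitted residuals average to zero within each sex, so in this sketch the contributions sum to the observed sex difference. Whether the residual term reflects electrophysiology depends on the anatomical covariates being complete, which is the concern the reviewers raise.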

      (8) What is "the remaining estimated effect of electrophysiology". Did you do simulations on the electrophysiology, or how is this computed from the clinical data of patients? More explanation is needed, as without this, the paper is just focusing on anatomy.

      We have clarified this important point by moving the explanation of the methodology underpinning our estimation of the electrophysiological contributions using the clinical ECGs from the supplementary methods to the main manuscript, in the second full paragraph on Page 7 and continuing onto Page 8. We have also specified the role of simulation studies in future work in the final paragraph on Page 16.

      (9) Include an overview paragraph of the methods to create more structure.

      We thank the reviewer again for the further attention to this issue – as previously, we have added an overview paragraph to the methodology section on Page 5.

      (10) Only 19.8% of the patients were female, which is probably due to females having a more severe presentation of the disease. How does this impact, bias, or skew your results?

      This comment raises a very interesting point. The origin of this imbalance is of course multifactorial: women likely do have lower rates of MI events, owing to the cardioprotective role of estrogen and different health-promoting behaviours, and our sex imbalance reflects wider trends in MI diagnosis. However, as mentioned in Paragraph 2 on Page 3 of the text, there are more missed MI diagnoses in women, and we agree that this may lead to a more severe presentation of female MI pathophysiology. We have expanded the first full paragraph on Page 16 to specify the ECG and demographic impacts that this has on our results, and to note that a strength of this work is its potential to contribute to future adjustment of the diagnostic criteria, so that future investigations do not carry this bias and clinical outcomes are improved.

      (11) A lot of extra information is provided in Tables 1 and 2. Include additional information in the supplements that is not directly relevant to your findings.

      We agree that Table 2 is supplementary, rather than critical information, and have moved it accordingly to the Supplementary Materials on Page 38. We do believe that Table 1 is central for understanding the extracted dataset.

      (12) Combine paragraphs 3 and 4 into a single paragraph. "Current guidelines..." and "T wave amplitude...". They are part of a single coherent concept.

      We have removed the paragraph break on Page 3 Paragraph 3.

      (13) Check all acronyms throughout the paper. The abbreviation for sudden cardiac death (SCD) is only used once in the same paragraph. Remove the acronym and type it out. T-wave amplitude (TWA) is introduced twice in a Figure caption and not introduced until the methods.

      Many thanks for this suggestion – we have reviewed all acronyms in the manuscript.

      (14) "Figure 1B showcases the capability of the computational pipeline to extract torso contours and reconstruct them into 3D meshes". Isn't this Figure 1A?

      We apologise that this was unclear, and have updated the sentence in the first full paragraph of Page 8 to clarify the purpose of Figure 1B.

      (15) No need to state: "Female y-axis limits have been adjusted by the difference in healthy QRS duration between sexes for ease of comparison" in the Figure 2 caption.

      We have removed this statement on all relevant captions.

      (16) The paragraph "For lead V6, 15.9% of healthy subjects..." can be combined with the previous section.

      We have removed this paragraph break on Page 9 to improve readability.

      (17) The only demographics I could find were age and BMI. State which demographics you used explicitly. This is especially true when the discussion makes claims like "Our findings suggest that corrected QRS duration taking into consideration demographics...". How did you take them into account?

      We accept that our previous description of the demographic adjustment to QRS duration in the discussion did not adequately reflect the comprehensiveness of our approach, and have adjusted the second paragraph on Page 14 to rectify this.

      (18) The results section is also almost a bulleted list that should be written and reformatted into paragraphs.

      The ordered style of our results section was designed to compare how our obtained data answers our research question differently for ECG intervals, amplitudes, and axis angles. Whilst we have adjusted paragraph breaks and moved methodological details to more appropriate sections, we have retained this stylistic choice.

      (19) The following sentence should be in the introduction: "Alterations to the polarity and amplitude of the T wave are used in the diagnosis of acute MI [42] and TWA affects proposed risk stratification tools, particularly markers of repolarization abnormalities [9, 43]."

      We thank the reviewer for this suggestion. We have included the discussion of how TWA is separately used in proposed risk stratification and current diagnostic tools in Paragraph 3 of Page 3.

    2. Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial infarction (MI) is more common in women, and treatment is typically less aggressive. This stems from the fact that women's ECGs commonly exhibit 12-lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      This study has a strength in that it utilizes a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. The study then deploys a linear regression model to relate the level of influence of various factors to ECG-based changes. This allows computation of various anatomical factors (torso volume, cavity volume, etc), and subsequent linear regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      A major weakness is that a linear additive model may not adequately capture how anatomy and electrophysiology interact. Myocardial infarction dramatically alters both anatomy and electrophysiology in ways that are not easily separable and could be considered non-linear. As such, the electrophysiological factors in the model may still include factors that have an anatomical basis (i.e. the formation of scar) that were not accounted for during model generation. However, the technique remains useful for dissecting large factors beyond anatomy, as demonstrated in this study.

    3. Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      • The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      • The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      • Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      • Open-source code for the pipeline and analyses enhances reproducibility.

      Weaknesses:

      • The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

    4. eLife Assessment

      This important study combines electrocardiographic (ECG) and heart/torso anatomy data from subjects included in the UK Biobank to analyze sex-specific differences in relationships between those two characteristics. The study has several compelling strengths, including the development of an open-source pipeline for reconstruction and analysis of heart/torso geometry from a large cohort. Nevertheless, technical analysis of the data as presented is incomplete, specifically as it pertains to assessment of co-linearity between regressed parameters, interpretation of regression coefficients for sex and/or presence of myocardial infarction, and discussion of potential roles played by underlying electrophysiological derangements. With improvements to these aspects of the analysis, the paper would be of interest to the cardiovascular research community, especially those studying highly relevant health and treatment disparities arising from sex differences.

    5. Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      (1) The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      (2) The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      (3) Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      (4) Open-source code for the pipeline and analyses enhances reproducibility.

      Weaknesses:

      (1) The linear regression approach, while useful, may not fully address collinearity among parameters (e.g., cardiac size, torso volume, heart position). Although left ventricular mass or cavity volume was selected to mitigate collinearity, other parameters (e.g., heart center coordinates) could still introduce bias.

      (2) The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

      (3) The manuscript's highly quantitative presentation may hinder readability. Simplifying technical descriptions and improving figure clarity (e.g., separating superimposed bar plots in Figures 2-4) would aid comprehension.

      (4) Given established sex differences in QTc intervals, applying the same analytical framework to explore QTc's dependence on sex and anatomy could have provided additional clinically relevant insights.
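
      The collinearity check suggested in weakness (2) can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the authors' analysis; the covariate names and the strength of their correlation are assumptions made for the example. The variance inflation factor (VIF) of a covariate is 1 / (1 - R²), where R² comes from regressing that covariate on all the others.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_samples x n_features, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic covariates: two strongly correlated, one independent (assumed names).
rng = np.random.default_rng(1)
cardiac_size = rng.normal(size=500)
torso_volume = 0.9 * cardiac_size + 0.3 * rng.normal(size=500)
heart_centre_x = rng.normal(size=500)

vifs = variance_inflation_factors(
    np.column_stack([cardiac_size, torso_volume, heart_centre_x])
)
```

      A VIF near 1 indicates little shared variance with the other covariates, while values above roughly 5 to 10 are a common rule-of-thumb warning sign. In this synthetic example the two correlated covariates have inflated VIFs and the independent one does not.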

    6. Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial infarction (MI) is more common in women, and treatment is typically less aggressive. This stems from the fact that women's ECGs commonly exhibit 12-lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      Strengths:

      This study has a strength in that it utilizes a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. This allows computation of various anatomical factors (torso volume, cavity volume, etc), and subsequent regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      Weaknesses:

      Major weaknesses stem from the fact that, while electrophysiological factors appear to play a role across many leads, both post-MI and healthy, the electrophysiological factors are not stated or discussed. The computational modeling pipeline is validated for reconstructing torso contours; however, potential registration errors stemming from ventricular-torso construction are not addressed within the context of anatomical factors, such as the tilt and rotation of the heart. This should be discussed as the paper's claims are based on these results. Further analysis and explanation are needed to understand how these sex-specific results impact the ECG-based diagnosis of MI in men and women, as stated as the primary reason for the study at the beginning of the paper. This would provide a broader impact within the clinical community. Claims about demographics do not appear to be supported within the main manuscript but are provided in the supplements. Reformatting the paper's structure is required to efficiently and effectively present and support the findings and outcomes of this work.

    1. eLife Assessment

      This fundamental work shows that a history of cocaine self-administration disrupts the orbitofrontal cortex's ability to encode similarities between distinct sensory stimuli that possess identical task information - hidden states. The evidence supporting these conclusions is compelling, with methods and analyses spanning self-administration, a novel 'figure 8' sequential odor task, recordings from 3,881 single units, and sophisticated firing analyses revealing complex orbitofrontal representations of task structure. These results will be of broad interest to psychologists, neuroscientists, and clinicians.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-2<sup>+</sup>; Sequence #2: 3<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-4<sup>+</sup>) --forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trial/positions involving overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4) --indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      - Elegant behavioral design that affords the detection of hidden-state representations.

      - Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      - The number of subjects is small, so idiosyncratic, animal-specific effects cannot be fully ruled out.

      Comments on revisions:

      The authors have thoroughly addressed all of my previous comments. Congratulations on an excellent paper!

    3. Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict that reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single unit and ensemble level suggest that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. The findings are important, and the results are compelling. The introduction and discussion contextualize the experiments in the context of the literature, and explain the novelty and significance of the current findings. Specifically, the observation that cocaine self-administration impairs generalization across task sequences at the single unit level builds on previous observations of aberrant neuronal activity within the OFC in animals with a history of cocaine self-administration. These new data point to a neurophysiological mechanism that could explain why drug-seeking is so context dependent, and hard to ameliorate with therapeutic strategies that take place within a clinical setting.

      The authors clearly acknowledge the major limitations of this work, namely that the sample size is restricted due to the technical challenges of performing in vivo electrophysiology recordings combined with self-administration, and that animals of only one sex were used. Importantly, the data from all rats within each group was remarkably homogeneous, increasing confidence in the conclusions drawn.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-2<sup>+</sup>; Sequence #2: 3<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-4<sup>+</sup>) - forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trial/positions involving overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4) - indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      Elegant behavioral design that affords the detection of hidden-state representations.

      Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      The number of subjects is small - can't fully rule out idiosyncratic, animal-specific effects.

      Comments

      (1) Emergence of sequence-dependent OFC representations across learning.

      A conceptual point that would benefit from further discussion concerns the emergence of sequence-dependent OFC activity at overlapping positions (e.g., position P3, odor 1). This implies knowledge of the broader task structure. Such representations are presumably absent early in learning, before rats have learned the sequence structure. While recordings were conducted only after rats were well trained, it would be informative if the authors could comment on how they envision these representations developing over learning. For example, does sequence differentiation initially emerge as animals learn the overall task structure, followed by progressive compression once animals learn that certain states are functionally equivalent? Clarifying this learning-stage interpretation would strengthen the theoretical framing of the results.

      We agree that the emergence of sequence-dependent OFC activity at overlapping positions (e.g., P3) implies knowledge of the broader task structure and therefore must depend on learning. Although we did not record during early acquisition in the current study, we outline below a learning-stage framework consistent with both prior work and the comparative analyses included here, and we have included it in the discussion.

      We think the development of OFC representations is a multi-stage process. Early in learning, before animals have acquired the sequential structure of the task, OFC activity is likely dominated by local sensory features and immediate reinforcement history, with little differentiation between sequences at overlapping positions. As animals learn that odors are embedded within extended sequences that have utility for predicting future outcomes, OFC representations would begin to differentiate identical sensory cues based on their sequence context, giving rise to sequence-dependent activity at positions such as P3. This stage reflects acquisition of the broader task structure and the recognition that current cues carry information about future states.

      With continued training, however, OFC representations normally undergo a further refinement: positions that differ in sensory identity but are functionally equivalent become compressed, while distinctions that are irrelevant for guiding behavior are suppressed. Evidence for this later stage comes from our over-trained control animals, in which discrimination between overlapping positions is near chance across most trial epochs, and from prior work using the same task in less-trained animals, where sequence-dependent discrimination is more strongly preserved. Thus, sequence differentiation appears to emerge during structure learning but is subsequently down-weighted as animals learn which distinctions are behaviorally irrelevant.

      Within this framework, prior cocaine exposure appears to interfere specifically with this later refinement stage. Cocaine-experienced rats exhibit OFC representations resembling those seen earlier in learning—retaining sequence-dependent discrimination at overlapping and functionally equivalent positions—despite extensive training. This suggests not a failure to acquire task structure per se, but rather an impairment in the ability to collapse across states that share common underlying causes.

      (2) Reference to the 24-odor position task

      The reference to the previously published 24-odor position task is not well integrated into the current manuscript. Given that this task has already been published and is not central to the main analyses presented here, the authors may wish to a) better motivate its relevance to the current study or b) consider removing this supplemental figure entirely to maintain focus.

      Thank you for this suggestion. We have removed this supplemental figure.

      (3) Missing behavioral comparison

      Line 117: the authors state that absolute differences between sequences differ between cocaine and sucrose groups across all three behavioral measures. However, Figure 1 includes only two corresponding comparisons (Fig. 1I-J). Please add the third measure (% correct) to Figure 1, and arrange these panels in an order consistent with Figure 1F-H (% correct, reaction time, poke latency).

      Thank you for this suggestion. We have added the corresponding panel to the figure.

      (4) Description of the TCA component

      Line 220: authors wrote that the first TCA component exhibits low amplitude at positions P1 and P4 and high amplitude at positions P2 and P3. However, Figure 3 appears to show the opposite pattern (higher magnitude at P1 and P4 and lower magnitude at P2 and P3). Please check and clarify this apparent discrepancy. Alternatively, a clearer explanation of how to interpret the temporal dynamics and scaling of this component in the figure would help readers correctly understand the result.

      Thanks for your suggestion. We appreciate this point and agree that clearer guidance on how to interpret the temporal and scaling properties of the tensor components would help readers. In the TCA framework, each component is defined by three separable factors: a neuron factor, a temporal factor, and a trial (position) factor. The temporal factor reflects the shape of the activity pattern within a trial, indicating when during the trial that component is expressed, whereas the trial factor reflects how strongly that temporal pattern is expressed at each position and across trials.

      Importantly, the absolute scaling of these factors is not independently meaningful. Because TCA components are scale-indeterminate, the magnitude of the temporal factor and the trial factor should be interpreted relative to one another within a component, not across components. Thus, a large value in the trial factor does not imply stronger neural activity per se, but rather greater expression of that component’s characteristic temporal pattern at that position or trial.

      Accordingly, when a component shows similar temporal dynamics across groups but differs in its trial factor structure—as observed here—the interpretation is that the same within-trial dynamics are being differentially recruited across task positions, rather than that the timing of neural responses has changed.

      We have added a brief discussion of this point to the corresponding Results section of the manuscript.

      (5) Sucrose control

      Sucrose self-administration is a reasonable control for instrumental experience and reward exposure, but it means that this group also acquired an additional task involving the same reinforcer. This experience may itself influence OFC representations and could contribute to the generalization observed in control animals. A brief discussion of this possibility would help contextualize the interpretation of cocaine-related effects.

      We agree that sucrose self-administration is not a perfect neutral manipulation and that this experience could, in principle, influence OFC representations. In particular, sucrose self-administration involves instrumental responding for the same primary reinforcer used in the odor task, and thus may promote additional learning about reward predictability, action–outcome contingencies, or contextual structure that could facilitate generalization.

      Several considerations, however, suggest that the generalization observed in control animals primarily reflects learning-dependent refinement of task representations rather than a specific consequence of sucrose self-administration per se. First, the amount of sucrose administered during this phase was minimal (50 µl × 60 presses at most per session for 14 sessions) compared with the total sucrose reward obtained during task recording (100 µl × 160 trials per session for several dozen sessions). Second, all rats were extensively trained on the odor sequence task prior to any self-administration, and the key signatures of compression and generalization we report—near-chance discrimination between functionally equivalent positions—are consistent with prior studies using the same task in animals that did not undergo sucrose self-administration. Finally, comparisons to less-trained animals in earlier work show that OFC representations evolve toward greater abstraction with increasing task experience, indicating that generalization is a property of advanced learning rather than a unique outcome of sucrose exposure.
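
      The volume comparison in the first point can be worked through as a short calculation. The sketch takes the quoted figures at face value (50 µl per press, at most 60 presses, 14 sessions; 100 µl per trial, 160 trials per session). The variable names are illustrative, and the total number of task sessions is left out because the text gives it only as "several dozen".

```python
# Upper bound on sucrose delivered during self-administration (figures from the text).
SA_UL_PER_PRESS = 50
SA_MAX_PRESSES = 60
SA_SESSIONS = 14

# Sucrose delivered per recording session of the odor task (figures from the text).
TASK_UL_PER_TRIAL = 100
TASK_TRIALS = 160

# Convert microlitres to millilitres.
sa_ml_per_session = SA_UL_PER_PRESS * SA_MAX_PRESSES / 1000
sa_ml_total = sa_ml_per_session * SA_SESSIONS
task_ml_per_session = TASK_UL_PER_TRIAL * TASK_TRIALS / 1000

# Number of task sessions delivering as much sucrose as all self-administration sessions.
task_sessions_to_match = sa_ml_total / task_ml_per_session

print(sa_ml_per_session, sa_ml_total, task_ml_per_session, task_sessions_to_match)
```

      Under these figures, self-administration delivers at most 3 ml per session and 42 ml in total, whereas a single task session delivers 16 ml, so fewer than three task sessions match the entire self-administration phase.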

      Importantly, even if sucrose self-administration were to enhance generalization in OFC, this would not account for the primary finding that cocaine-experienced rats fail to show these signatures despite identical task training and parallel instrumental experience. Thus, the critical comparison is not between sucrose-trained animals and naive controls, but between two groups matched for self-administration experience, differing only in the pharmacological consequences of the reinforcer. Within this framework, the absence of position-general representations in cocaine-experienced rats reflects a disruption of normal learning-dependent abstraction rather than an artifact of the control condition.

      We have added a brief discussion acknowledging that sucrose self-administration may bias OFC toward abstraction, while emphasizing that cocaine exposure prevents the emergence or maintenance of these representations under otherwise comparable experiential conditions.

      (6) Acknowledge low N

      The number of rats per group is relatively low. Although the effects appear consistent across animals within each group, this sample size does not fully rule out idiosyncratic, animal-specific effects. This limitation should be explicitly acknowledged in the manuscript.

      We acknowledge that the number of animals per group is relatively small and therefore cannot fully rule out animal-specific effects. However, the key neural and behavioral signatures reported here were consistent across individual animals within each group and across multiple levels of analysis, and no outliers were observed. In addition, sample sizes of this scale are common in cocaine self-administration studies due to their technical and logistical constraints. We did not attempt to obscure this limitation and have now explicitly acknowledged it in the manuscript discussion.

      (7) Figure 3E-F: The task positions here are ordered differently (P1, P4, P2, P3) than elsewhere in the paper. Please reorder them to match the rest of the paper.

      Thank you for pointing this out. We agree that the ordering of task positions in Figures 3E–F should be consistent with the rest of the manuscript. We have reordered the positions to match the standard sequence order used elsewhere in the paper (P1, P2, P3, P4) to improve clarity and avoid confusion.

      Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict that reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single unit and ensemble level suggest that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. I have the following comments.

      While well-written, the introduction mainly summarises the experimental design and results, rather than providing a summary of relevant literature that informed the experimental design. More details regarding the published effects of cocaine self-administration on OFC firing, and on tests of behavioral flexibility across species, would ground the paper more thoroughly in the literature and explain the need for the current experiment.

      We appreciate this suggestion and have tried to expand the Introduction to more explicitly situate the study within the existing literature on cocaine-induced changes in OFC function. In particular, prior work has shown that cocaine self-administration alters OFC firing properties and disrupts behavioral flexibility across species, including impairments in reversal learning, outcome devaluation, and sensory preconditioning. We have revised the Introduction to expand this literature review and more clearly articulate how these established findings motivated our focus on OFC representations of hidden task structure and generalization.

      For Fig 1F, it is hard to see the magnitude of the group difference with the graph showing 0-100%- can the y axis be adjusted to make this difference more obvious? It looks like the cocaine-treated animals were more accurate at P3- is that right?

      The concluding section is quite brief. The authors suggest that the failure to generalize across sequences observed in the current study could explain why people who are addicted to cocaine do not use information learned e.g. in classrooms or treatment programs to curtail their drug use. They do not acknowledge the limitations of their study e.g. use of male rats exclusively, or discuss alternative explanations of their data.

      We agree that the current 0–100% scale can make small differences difficult to discern. We will adjust the y-axis to a narrower range to better highlight group differences and will note this in the figure caption. At P3, cocaine-experienced rats were indeed more accurate than controls.

      We appreciate the suggestion to expand the discussion. We have revised the concluding section to acknowledge key limitations, including the use of only male rats, the number of subjects, and to note that alternative explanations—such as differences in motivational state or attention—could also contribute to the observed effects. These revisions provide a more balanced interpretation while retaining the focus on OFC-mediated generalization as a potential mechanism for persistent, context-specific drug-seeking.

      Is it a problem that neuronal encoding of the "positions" i.e. the specific odors was at or near chance throughout in controls? Could they be using a simpler strategy based on the fact that two successive trials are rewarded, then two successive trials are not rewarded, such that the odors are irrelevant?

      We thank the reviewer for this point. While neuronal encoding of individual positions (specific odors) in control animals was comparatively lower, this does not indicate that the rats were using a simpler strategy based solely on reward patterns. First, rats were extensively trained on the odor sequence task prior to recordings, demonstrating accurate discrimination across all positions, and their trial-by-trial behavior reflects sensitivity to specific odors rather than only reward alternation. Second, the task design—with overlapping sequences and positions that differ in reward contingency across sequences—requires tracking odor-specific context to maximize reward; a purely “two rewarded, two non-rewarded” strategy would fail at overlapping positions and would not account for the compression of functionally equivalent positions observed in the OFC. Third, in the less-trained rats shown in Figure 3C, decoding accuracy was higher than in the sucrose group, indicating that these animals still differentiated negative positions. With additional training, decoding patterns suggested improved generalization across positions. Thus, the near-chance neural selectivity in controls reflects representation of latent task states rather than external sensory cues, consistent with the idea that OFC abstracts task-relevant structure and ignores irrelevant sensory differences.

      When looking at the RT and poke latency graphs, it seems the cocaine-experienced rats were faster to respond to rewarded odors, and also faster to poke after P3. Does this mean they were more motivated by the reward?

      At present, the basis of these response-time differences remains unclear, in part because motivation is difficult to define operationally. If motivation is indexed solely by reaction time or poke latency, then the data are consistent with increased response vigor in cocaine-experienced rats. Indeed, RT and poke-latency measures indicate that cocaine-experienced rats responded more quickly on some rewarded trials, including after P3. However, overall task performance was high in both groups, suggesting that these differences cannot be attributed simply to superior learning or engagement. Faster responses may also reflect differences in deliberation or strategy, with cocaine-experienced rats relying more on rapid, stimulus-driven responding and sucrose-trained rats engaging in more careful evaluation. In addition, altered reward sensitivity or persistent effects of cocaine exposure may contribute to these behavioral differences. Thus, the faster responses observed in cocaine-experienced rats likely reflect a combination of heightened reward responsivity and altered encoding of task structure, rather than a straightforward increase in motivation alone.

      Recommendations for the authors:

      The reviewers were very positive about the manuscript and emphasized the rigor and state-of-the-art analyses. Two points that came up were the very small n (6 total and 3 per condition) and the exclusive use of males. Adding more subjects is not recommended. However, more discussion and acknowledgement of this issue is recommended. The main concern is that idiosyncratic differences between individuals (not differences in cocaine history) are responsible for the differences observed in OFC encoding.

      We acknowledge that the sample size (n = 3 per group) and use of only male rats limit generalizability and do not fully rule out idiosyncratic, individual-specific effects. However, the key neural and behavioral signatures we report were consistent across all animals within each group and across multiple analyses (single-unit, ensemble decoding, and TCA). We now explicitly note these limitations in the Discussion, emphasizing that while individual variability cannot be fully excluded, the convergence of results across multiple levels of analysis supports the interpretation that the observed differences reflect effects of prior cocaine exposure rather than idiosyncratic differences.

      Reviewer #2 (Recommendations for the authors):

      In the legend to figure 2, the authors state "Notably, rats could discriminate between the two sequences (S1 vs. S2) based solely on current sensory information at two task epochs ["Odor" at P3 and P4; black bars]. At all other task epochs, indicated by gray bars, the discrimination relied on an internal memory of events". I'm confused by this statement- how does the odor at P3 help to discriminate the sequences? Surely P1 and P4 are the times when the odor sampling indicates which sequence they are in?

      We thank the reviewer for pointing out this source of confusion. The statement in the original figure legend was imprecise, and we have removed the figure and revised the figure legends because the results in the left panel substantially overlapped with those shown in the right panel. In this task, odors at positions P1 and P4 are the only cues that directly signal sequence identity, whereas the odors presented at P2 and P3 are identical across sequences. Accordingly, discrimination observed during the “Odor” epoch at P3 does not reflect sensory differences but instead depends on the animal’s use of internal memory or sequence context to infer sequence identity.

    1. eLife Assessment

      This valuable study re-evaluates a published simulation model on the role of heterozygote advantage in shaping MHC diversity. By modifying key modeling assumptions, the author argues that the original conclusions depend on a narrow and potentially unrealistic parameter range. While the work is in principle solid, the robustness of this claim is viewed differently by the reviewers. The manuscript further proposes an alternative modeling framework in which expansion of the MHC gene family allows homozygotes to outperform heterozygotes, thereby challenging the idea that heterozygote advantage alone can account for high allelic diversity at MHC loci. The topic is highly relevant for eco-immunology and evolutionary genetics, although a clearer delineation of the model's scope would help readers assess its broader implications.

    2. Reviewer #1 (Public review):

      The manuscript "Heterozygote advantage cannot explain MHC diversity, but MHC diversity can explain heterozygote advantage" explores two topics. First, it is claimed that the conclusion recently published by Mattias Siljestam and Claus Rueffler (in the following referred to as [SR] for brevity), that heterozygote advantage explains MHC diversity, does not withstand even a very slight change in ecological parameters. Second, a modified model that allows an expansion of the MHC gene family shows that homozygotes outperform heterozygotes. This is an important topic and could be of potential interest to the readership of eLife if the conclusions are valid and non-trivial.

      The resubmitted manuscript addresses several questions from my previous review. In particular, there is a more detailed description of how the code of Siljestam and Rueffler ([SR]) was used for the simulations and the calculation of the factor 2.7 x 10^43 that is the key to the alleged breakdown of the numerical reasoning presented in [SR].

      Yet I think that important aspects of my critique of the first statement of the manuscript about the flaws of the [SR] model remain unanswered. I guess the discussion becomes rather general about the universality and robustness of various types of models to parameter changes. My point is that none of the models is totally universal. The model in [SR] is not phenomenological as none of the parameters or functional forms were derived empirically. Instead, it is a proof of principle demonstration that inevitably grossly simplifies the actual immune response. The choice of constants and functions used in Eqs. (1-5) is dictated by mathematical convenience and works in a limited range of parameter values. It is shown in [SR] that for 3 pathogens and reasonable "virulence" \nu, the alleles branch. These conclusions are supported by the analytically derived Adaptive Dynamics branching criterion (7), which, contrary to the statement in the cover letter ("It is clear from Fig. 4 of Siljestam and Rueffler that the branching condition is far from sufficient for high MHC diversity."), is perfectly confirmed by the simulation data shown in Fig. 4.

      The mathematical simplicity of the [SR] model generates various artifacts, such as the reduction of the "condition" by an enormous factor of 2.7 x 10^43 mentioned by the Author, and the resulting decrease in the "survival" induced by the addition of a new pathogen. This occurs at the very large value of \nu=20, whose effect is enormous due to the Gaussian form of (1), which, once again, was chosen for mathematical convenience. In reality, a new pathogen cannot reduce the "survival" by such a factor as it would wipe out any resident population. So to compensate for such an artifact, the additional factor c_max was introduced to buffer such an excess. There is no reason to fix c_max once for an arbitrary number of pathogens, because varying c_max basically reflects the observation that a well-adapted individual must have a reasonable survival probability. At the same time, there are many ways in which the numerical simulation may break down when the survival rates become of the order of 10^(-43) instead of one, so it comes as no surprise that the diversification, predicted by the adaptive dynamics, does not readily occur in the scenario with an addition or removal of the 8th pathogen with a very high virulence \nu=20.

      I have doubts that the reported breakdown of the [SR] model with fixed c_max remains observable with less extreme values of m and \nu (say, for \nu=7 and m=3 plus or minus 1 used in Fig. 3 in the manuscript).

      So I still find the claim that "the phenomenon that leads to high diversity in the simulations of Siljestam and Rueffler depends on finely tuned parameter values" is not well substantiated.

    3. Reviewer #2 (Public review):

      Summary:

      This study addresses the population genetic underpinnings of the extraordinary diversity of genes in the MHC, which is widespread among jawed vertebrates. This topic has been widely discussed and studied, and several hypotheses have been suggested to explain this diversity. One of them is based on the idea that heterozygote genotypes have an advantage over homozygotes. While this hypothesis lost support early on, a recent study claimed that there is good support for this idea. The current study highlights an important aspect that allows us to see results presented in the earlier published paper in a different light, strongly changing the conclusions of the earlier study, i.e., there is no support for a heterozygote advantage. This is a very important contribution to the field. Furthermore, this new study presents an alternative hypothesis to explain the maintenance of MHC diversity, which is based on the idea that gene duplications can create diversity without heterozygosity being important. This is an interesting idea, but not entirely new.

      Strengths:

      (1) A careful re-evaluation of a published model, questioning a major assumption made by a previous study.

      (2) A convincing reanalysis of a model that, in the light of the re-analysis, loses all support.

      (3) A convincing suggestion for an alternative hypothesis.

      Weakness:

      (1) The title of the study is catchy, but it is explained only at the very end of the paper.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      Yet I think that important aspects of my critique of the first statement of the manuscript about the flaws of the [SR] model remain unanswered.

      I believe that I have fully addressed the points in the earlier review. The reviewer had doubted that my results were correct, attributing them to “a poor setup of the model” on my part. The reviewer stated that if I were correct about the factor of >10^43 change in c_max, this would “naturally break down all the estimates and conclusions made in Siljestam and Rueffler” (S&R).

      It appears that the reviewer is now convinced that my results represent a faithful analysis of the models on which S&R based their claims. The reviewer now contends that these results, including the factor of >10^43, present no difficulties for the claims of S&R after all. In fact, this enormous factor of >10^43 is now claimed to support the conclusions of S&R by invalidating my conclusions. I respond to these new and very different arguments in what follows.

      As I stated in the first round of review, the issue is not the enormity of this factor per se, but the fact that the compensatory adjustment of c_max conceals the true effects of changes in other parameters. These effects are large; small changes to the parameter values mostly eliminate the diversity that the model is claimed to explain.

      The model in [SR] is not phenomenological as none of the parameters or functional forms were derived empirically. Instead, it is a proof of principle demonstration that inevitably grossly simplifies the actual immune response.

      The hidden sensitivity of the results of S&R to parameter values is sufficient to invalidate them as a proof of principle. The manuscript goes further and explains how the problem "is not specific to the details of the models of Siljestam and Rueffler, but is inherent in the phenomenon invoked to allow high diversity" because "any change that affects condition by as much as the difference between MHC heterozygotes and homozygotes will eliminate high equilibrium diversity". This general principle addresses all of the reviewer's points.

      In reality, a new pathogen cannot reduce the "survival" by such a factor as it would wipe out any resident population. So to compensate for such an artifact, the additional factor c_max was introduced to buffer such an excess. There is no reason to fix c_max once for an arbitrary number of pathogens, because varying c_max basically reflects the observation that a well-adapted individual must have a reasonable survival probability.

      This is not a legitimate reason for making compensatory, diversity-promoting adjustments to c_max when evaluating sensitivity to other parameters. If the number of pathogens or their virulence changes, c_max obviously does not automatically change along with it. If the population or species consequently goes extinct, then it goes extinct. If it persists, it does so with the same value of c_max.

      The possibility of extinction arguably puts a minimum value on c_max, but it does not restrict it to a range of values that conveniently leads to high MHC diversity. In the examples that I analyzed, slightly decreasing the number of pathogens or their virulence, which increases survivability, eliminates diversity. This phenomenon obviously cannot be dismissed on the grounds that survivability would be too low for the species to exist.

      S&R in effect assume that the condition of the most fit homozygote remains fixed, regardless of the number of pathogens, their virulence, and myriad other differences between species. It is this assumption that is without justification.

      At the same time, there are many ways in which the numerical simulation may break down when the survival rates become of the order of 10^(-43) instead of one

      I am not sure what is meant by “the numerical simulation may break down”. Numerical error is not a tenable explanation of the lack of diversity observed in that simulation. The outcome is exactly what is expected from purely theoretical considerations: conditions of all genotypes fall on the steep part of the curve, making the mechanism proposed by S&R largely inoperative, so a pair of alleles forming a fit heterozygote comes to predominate. The numerical simulation is actually superfluous.
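
      One narrow aspect of this exchange can be checked directly: whether values of order 10^(-43) are in themselves a problem for floating-point arithmetic. The sketch below assumes IEEE 754 arithmetic as exposed by NumPy; the numeric type used by the simulation code is not stated in the source, so the double-precision case is an assumption, and the single-precision case is shown only for contrast. It does not address other ways in which a simulation might misbehave.

```python
import math
import numpy as np

tiny = 1e-43  # order of magnitude quoted by the reviewer

# Smallest positive *normal* value in each floating-point format.
min_normal_double = np.finfo(np.float64).tiny   # about 2.2e-308
min_normal_single = np.finfo(np.float32).tiny   # about 1.2e-38

# In double precision, 1e-43 lies far inside the normal range.
representable_double = tiny > min_normal_double

# In single precision it would fall below the normal range and lose precision.
subnormal_in_single = tiny < min_normal_single

# Ratios of two such values keep full relative precision in double precision.
ratio = 3.0e-43 / 1.5e-43

print(representable_double, subnormal_in_single, math.isclose(ratio, 2.0))
```

      Under the double-precision assumption, relative fitness comparisons between genotypes are unaffected by the small absolute scale.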

      Low survival rates are completely irrelevant to the effect of decreasing the number of pathogens or their virulence, which does not lower survival rates, but does eliminate diversity.

      so it comes as no surprise that the diversification, predicted by the adaptive dynamics, does not readily occur in the scenario with an addition or removal of the 8th pathogen with a very high virulence \nu=20.

      Whether or not it is surprising, the lack of diversity is a problem for the claims of S&R, as there is no reason to expect the number of pathogens to have just the right value to produce high diversity. Furthermore, for many combinations of values of the other parameters (e.g., my v=19.5 and 20.5 examples), no number of pathogens leads to high diversity.

      Again, the general principle mentioned above makes the details that the reviewer refers to irrelevant. Nonetheless, some additional remarks are in order:

      (1) This comment ignores the fact that removal of a pathogen, or a slight decrease in “virulence”, eliminates diversity without lowering survival rates.

      (2) Small increases or decreases in v (virulence) eliminate diversity without having such large effects on condition.

      (3) In the example emphasized by the reviewer, mean survival rates are nowhere near as low as 10^(-43). Only homozygotes have such low fitness.

      (4) The adaptive dynamics predict the low diversity seen in the simulations, contrary to what the reviewer seems to suggest. Elimination of diversity is not an artifact of the simulation.

      (5) v=20 was chosen because it is most favorable to the model of S&R in that it yields the highest diversity. Indeed, S&R only observed realistically high diversity with the narrow Gaussians that the reviewer objects to. With lower values of v, diversity is much lower, but even this meager diversity is eliminated by small changes in parameter values (see below). If narrow Gaussians and large effects of pathogens somehow invalidate results, then they invalidate the high-diversity results of S&R.

      I have doubts that the reported breakdown of the [SR] model with fixed c_max remains observable with less extreme values of m and \nu (say, for \nu=7 and m=3 plus or minus 1 used in Fig. 3 in the manuscript).

      These doubts are unwarranted. With the suggested parameter values, for example, increasing or decreasing m by 1 reduces the effective number of alleles to around 1 or 2. This can easily be checked using the simulation code of S&R, as detailed in my initial response and now in a Supplementary Text. Even without this result, the general principle mentioned above tells us that considering other regions of parameter space cannot rescue the conclusions of S&R.
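
      For readers unfamiliar with the summary statistic, the following sketch shows how an effective number of alleles is commonly computed from allele frequencies. The response does not state which definition is used; the sketch assumes the standard one, the inverse of expected homozygosity, 1 / sum(p_i^2). The function name and the frequency vectors are hypothetical and are not simulation output.

```python
import numpy as np

def effective_number_of_alleles(freqs):
    """Inverse of expected homozygosity: 1 / sum(p_i^2)."""
    p = np.asarray(freqs, dtype=float)
    p = p / p.sum()              # normalise in case counts are passed
    return 1.0 / np.sum(p ** 2)

# Hypothetical allele-frequency vectors for illustration.
one_fixed = [1.0]                # a single fixed allele
two_equal = [0.5, 0.5]           # two alleles at equal frequency
hundred_equal = [0.01] * 100     # one hundred equally common alleles

n_one = effective_number_of_alleles(one_fixed)
n_two = effective_number_of_alleles(two_equal)
n_hundred = effective_number_of_alleles(hundred_equal)

print(n_one, n_two, n_hundred)
```

      Under this definition, a value of around 1 or 2 corresponds to a population dominated by one allele or by a pair of alleles, in contrast to the 100 or more equally common alleles needed for a value in the hundreds.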

      So I still find the claim that "the phenomenon that leads to high diversity in the simulations of Siljestam and Rueffler depends on finely tuned parameter values" is not well substantiated.

      What is unsubstantiated is the claim of S&R that “For a large part of the parameter space, more than 100 and up to over 200 alleles can emerge and coexist”. As my manuscript illustrates, this is an illusion created by the adjustment of one parameter to compensate for changes in others.

      The reviewer even acknowledges that “the choice of constants and functions...works in a limited range of parameter values”. Furthermore, the manuscript explains why this problem is inherent to the general phenomenon, not specific to the details of the model or parameter values.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      It appears obvious that with little or no fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc, could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth. Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number. It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [SR]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler. I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of c_max conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct. The factor of 2.7∙10^43 was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values. A simple way to determine this number is to have the simulation code print the value to which c_max is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values. I have in this way confirmed this factor using the simulation code written and used by Siljestam and Rueffler. A procedure for doing so is described in the new Supplementary Text S1. In addition, I now give a theoretical derivation of this factor in Supplementary Text S2.

      This begs the conclusion that the branching remains robust to changes in c_max that span 4 decades as well.

      That shows at most that the results are not extremely sensitive to c_max or K. They are, nonetheless, exquisitely sensitive to m and v. This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in c_max. It is evident from Fig. 4 of Siljestam and Rueffler that the level of diversity is not robust to these very large changes in c_max, which include, as noted above, a change of over 43 orders of magnitude.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/ν, the width of the pathogens' Gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7∙10<sup>43</sup>) pertains to the Gaussian model with v = 20. As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v. This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions. Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.

      ...the branching condition appears to be pretty robust with respect to reasonable changes in parameters.

      It is clear from Fig. 4 of Siljestam and Rueffler that the branching condition is far from sufficient for high MHC diversity.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SD].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable. I have addressed the reasons for this suggestion above. Furthermore, I have confirmed the main conclusion (the extreme sensitivity of the results of Siljestam and Rueffler to parameter values) using the code that they used for their simulations, indicating that my conclusions are not consequences of my having done a “poor setup of the model”. I now describe, in Supplementary Text S1, how anybody can verify my conclusions in this way.

      Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem. However, as I understand it, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c<sub>max</sub>. Rather, they describe the adjustment of c<sub>max</sub> (Appendix 7) as a “helpful” standardization that applies “without loss of generality”. Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7∙10<sup>43</sup>). In this sense there is no loss of generality, but the automatic adjustment of c<sub>max</sub> obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I have expanded the end of the Discussion in the hope of clarifying the point expressed by the title.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to suggest to the author that they provide essential details about their simulations that would justify their claims, and to communicate with Mattias Siljestam and Claus Rueffler whether claims of the lack of robustness could be confirmed.

      The models simulated were modified versions of those of Siljestam and Rueffler. Thus, only the modifications were described in my manuscript. I have added a more detailed description of how c<sub>max</sub> was set in the simulations concerned with sensitivity to parameter values. In addition, the new Supplementary Text S1, which describes confirmation of the lack of robustness using the code of Siljestam and Rueffler, should remove any doubt about this conclusion.

      Reviewer #2 (Recommendations for the authors):

      I have no further recommendations. The manuscript is well written and clear.

      Thank you.

      Reviewer #3 (Recommendations for the authors):

      (1) Since this is a full report and not just a letter to the editor, it would benefit from a bit more introduction of what the MHC actually is and what the current understanding of its evolution is. Currently, it assumes a lot of knowledge about these genes that might not be available to every reader of eLife.

      I have added some more information to the opening paragraph. I would also note that this report was submitted as a “Research Advance”, which may only need “minimal introductory material”.

      (2) Some more recent literature on MHC evolution should be added, e.g., the review by Radwan et al. 2020 TiG, a concrete case of MHC heterozygote advantage by Arora et al. 2020 MolBiolEvol, and a simulation of MHC CNV evolution by Bentkowski et al. 2019 PLOSCompBiol.

      I have cited some additional literature.

      (3) Since much of the criticism hinges on the cmax parameter, its biological meaning or role (or the lack thereof) could be discussed more.

      I am not sure what I can add to what is in the first paragraph of the Discussion.

      (4) I find it difficult to grasp how the v parameter, which is intended to define pathogen virulence, if I understand it correctly, can be used to amend the breadth of peptide presentation. Maybe this could be illustrated better.

      I have attempted to make this clearer. The parameter v actually controls the breadth of peptide detection conferred by an allele, which, if not identical to the breadth of presentation, is certainly affected by it. The basis of the “virulence” interpretation seems to be that narrower detection breadth can, according to the model, only decrease peptide detection probability, which increases the damage done by pathogens.

      (5) Please check sentences in lines 279ff on peptide detection and cost of . There seem to be words missing.

      There was an extraneous word, which I have removed. Thank you for pointing this out.

    1. eLife Assessment

      This valuable study provides evidence that locus coeruleus activity is coordinated with heart rate during sleep, confirming previous work in mice and humans, with a possible role for sleep-dependent memory consolidation. The claims are supported by solid evidence, although the underlying mechanisms and the predictive value of the correlative dataset would benefit from additional controls. This work will be of interest to neuroscientists focusing on sleep, memory, and autonomic functions.

    2. Reviewer #1 (Public review):

      Summary:

      This study examined whether infraslow fluctuations in noradrenaline and in heart rate are coupled and how they are affected by sleep transitions. The authors used the fluorescent NA biosensor GRAB-NE2m in the medial prefrontal cortex of mice to record extracellular NA while also recording EEG and EMG during sleep-wake episodes. They also analyzed previously published human data to reproduce relationships they found between sigma power and RR intervals in mice.

      Strengths:

      This is an impressive study with significant strengths, as it involves a rich set of data that includes not only observations of associations between heart rate and noradrenergic dynamics but also optogenetic manipulation of the locus coeruleus. Human data is presented to show parallels in the association between sigma power during sleep and phasic heart-rate bursts.

      Weaknesses:

      (1) Language could be clearer and more precise. As detailed below, in both the introduction and the discussion, the way the hypotheses and study objectives are described could use some revision to be more precise and accurate.

      1A) In the introduction on p. 4: The overarching question is framed as "could the peripheral autonomous systems be a read-out of the central LC-NE system and thus be a biomarker of memory consolidation and LC dysfunction?" This gives the impression that the LC function would be the main influence on peripheral autonomous systems. There are, of course, many influences on peripheral autonomous systems, so it would be advisable for the authors to be more specific here about what signal(s) in particular would be predicted to be sensitive markers of LC function.

      1B) In the discussion on p. 12: "In this study, we leveraged real-time measurements of mPFC NE levels and HR measurements from EMG recordings in mice to investigate the causal link between the two variables with high temporal resolution in freely moving sleeping mice, with similar inspection in humans." To test the causal link between mPFC NA levels and HR measures, the study would manipulate NA levels just in the mPFC and not elsewhere in the brain. However, in this study, the manipulation occurred in the LC, and so there would be broad cortical changes in NA levels. Thus, it could be that LC activity causes HR changes via a non-PFC pathway.

      (2) Comparisons with the control condition need further development.

      2A) While the authors did include a key YFP control condition, in the main text no direct statistical comparison between the closed-loop optogenetic stimulation (ChR2) condition and the YFP control condition was reported. (It was reported in Supplementary Figure 2c-d.) Instead, in the main text, the authors only reported that the effects of stimulation were significant in the closed-loop condition and not in the control. However, that is not the same as demonstrating that the two conditions significantly differed from each other, and it is the direct test that is important for the conclusions, so it seems important to include this result in the main presentation.

      2B) In addition, the authors should address the issue that the pre-stimulation NE was consistently significantly lower in the YFP condition than in the ChR2 condition (see Supplementary Figure 2c), which is a potential confound.

      2C) Direct comparison of the strengths of correlations shown in Figure 2h vs. Supplementary Figure 2f should be included. Currently, we see relatively weak correlations in both ChR2 and YFP conditions, and it is not clear if the relationships differ in the control. It seems they are still present in the control condition but weaker, which would contradict the apparently broad claim on p. 7 that "No such effects were present in the control condition" (it is not entirely clear whether this claim refers to all effects discussed in the figure or just a subset - this language should be clarified).

      2D) Did the YFP controls vs. ChR2 animals show any differences in the number of NA states that triggered stimulation in the closed-loop system? With ChR2 animals, stimulation changes NA, which could change future triggering. In YFP animals, nothing changes NA (other than natural fluctuations), so the dynamics of stimulation timing could diverge between groups in a way that complicates interpretation. Specifically, if ChR2 stimulation raises NA and prevents future threshold crossings, ChR2 animals may end up receiving fewer subsequent stimulations than YFP animals (or a different temporal clustering). If the number or pattern of stimulation differed in two groups, it would be important to have a yoked control where matched animals get the same stimulation pattern but not triggered by their own NA.

      (3) Some more discussion/explanation of the rationale for the closed-loop approach and how it influences how we should interpret the results could be useful. For instance, currently, it is not clear whether LC stimulation needs to be timed after an NA dip to yield the effects seen.

      (4) The section on heart rate decelerations is hard to follow. In particular, I was not sure how to interpret Figure 3f-j. For Figure 3f, what does the middle line represent? The laser onset or the max RR value after laser onset? What is the baseline that is used to correct the values to obtain amplitudes? If it is the whole period before the maximal RR value or the laser onset, wouldn't baseline values differ significantly across conditions and so potentially account for differences seen between conditions in the reported HR decelerations? Larger HR decelerations may be seen in conditions with higher HR simply as a regression to the mean phenomenon.

      (5) The findings regarding LC suppression could be further clarified.

      5A) Page 8: "observed a response in NE decline" - please be more precise. Did NE decline more or less?

      5B) It would be helpful to also show the correlation between NE and RR in the control (YFP) condition and whether there were any differences between YFP and Arch conditions (Figure 4e).

      5C) This sentence took me multiple readings to understand - it would be helpful to rewrite to make it clearer: "indicating that, while HR generally did not respond strongly to LC suppression, the variability in RR responses was dependent on NE changes to the suppression (Figure 4e)."

      5D) The two colors in Figure 4 are similar and hard to distinguish.

      5E) The correlations shown in Figure 4j seem to be driven by just two of the cases. Are the effects significant when outliers are removed?

      5F) Page 10: Were there any differences in memory performance between the Arch and YFP conditions?

      5G) Page 10: "We found a correlation between RR responses to LC suppression and sigma power, suggesting that a stronger HR reduction response is linked to higher spindle power." It should be noted in the text that the correlation was not specific to sigma (it was also seen for theta and beta, Figure 4i).

      (6) It is not clear which of the sigma power and RR interval findings do/do not exactly line up between the mice and humans. It could be helpful to have a table comparing them. For instance, was the finding in humans that pre-HRB sigma power was positively associated with slowing in heart rate after the HRB also seen in mice? Was there evidence in mice (as seen in the human sample) that sleep-dependent memory improvement was associated with pre-HRB sigma power?

      (7) Page 18: It is not clear if the sex of mice was balanced across controls and optogenetics groups.

    3. Reviewer #2 (Public review):

      Summary:

      The major part of this study reproduces previously published findings in both mice and humans and provides incremental analyses on these findings. In essence, the work reaffirms the presence of coordinated infraslow fluctuations in sigma power and heart rate during NREM sleep. It further confirms previous findings that coordination depends on noradrenaline-releasing neurons in the locus coeruleus. Also supporting previously published work in mice and humans, the authors describe a link between the strength of these infraslow fluctuations and memory consolidation in mice and humans.

      Strengths:

      The authors successfully replicate key previously reported phenomena across both mice and humans. Confirmatory studies and demonstrations of reproducibility are essential for progress in neuroscience. To maximize their value, such studies should clearly acknowledge their confirmatory nature and carefully situate what, in their view, are novel results, going beyond existing literature.

      Weaknesses:

      The authors' interpretation of their data needs to be revised. Many of their claims regarding the mechanistic basis of their findings and the predictive value of their correlative datasets are not supported by the available evidence.

      In the present manuscript, several citations of literature on the work they reproduce lack precision or completeness, which reduces transparency and obscures how the reported findings relate to previously established results.

    4. Author response:

      Response to reviewer 1:

      We thank reviewer 1 for their thoughtful, detailed, and constructive evaluation of our manuscript. We appreciate their recognition of the strengths of the study, particularly the integration of noradrenergic recordings, optogenetic manipulation, and cross-species analyses. We are especially grateful for the reviewer’s careful attention to clarity, experimental interpretation, and control comparisons. The comments have helped us sharpen the framing of our hypotheses, clarify causal claims, improve statistical reporting, and better explain our closed-loop approach and heart rate analyses. We have addressed each point in detail below and believe that the revisions substantially strengthen the manuscript.

      Response to reviewer 2:

      We thank reviewer 2 for their thoughtful comment regarding citation, positioning relative to prior work, and caution in mechanistic interpretation. We have made efforts to cite relevant foundational and related work throughout the manuscript, but we will of course further clarify the relationship between our findings and prior studies in the revision.

      Although prior work has demonstrated infraslow coupling between sigma activity and heart rate and established a role for the locus coeruleus (LC) in coordinating these oscillations, cardiac measures have typically been presented as secondary observations rather than as primary experimental targets. While we of course recognize all the prior efforts conducted, a central goal of the present study was to perform a targeted and highly systematic characterization of norepinephrine-mediated heart-rate dynamics during sleep, integrating infraslow relationships, sleep-wake transitions, and a range of physiological manipulations of LC activity. A major priority of ours was to link infraslow heart-rate fluctuations to the well-known very-low-frequency (VLF) component of heart rate variability (HRV). Within the clinical HRV field, VLF has remained comparatively under-characterized and mechanistically unresolved. Our findings provide a biologically grounded explanation for this component, which we believe may be informative for the broader HRV community.

      Second, a core aim of this work is to provide a translational tool: to determine whether cardiac dynamics alone can reflect the infraslow, memory-consolidating potential of sleep and thus serve as a non-invasive biomarker. Because direct LC recordings are not feasible in humans, HRV, including its VLF component, may offer a clinically accessible proxy of sleep’s memory-restorative capacity. By directly manipulating LC activity and demonstrating corresponding changes in heart-rate dynamics, we strengthen the mechanistic and translational rationale of this biomarker approach. Our findings suggest that heart-rate measures alone may provide an estimate of the infraslow memory-consolidating potential of sleep.

      In revision, we will ensure that the foundational findings underlying this manuscript are highlighted, while communicating our new findings more clearly.

    1. eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in causing the tumor microenvironment in lung cancer. Further experimental explorations of the immune-related changes may still be required. The solid findings of this study provide a foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

    2. Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in causing the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies to BRCA-driven tumors. For example, they found histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      (2) For some methods, more detailed information is needed.

      (3) There are grammar issues in the text that need to be fixed.

      (4) Some text in the figures is not labeled well.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

    4. Author response:

      eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in causing the tumor microenvironment in lung cancer. Further experimental explorations of the immune-related changes may still be required. The solid findings of this study provide a foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

      We would like to express our sincere gratitude for your thoughtful and constructive comments on our manuscript. We will carefully consider each comment from these two reviewers and will revise the manuscript accordingly. Below, we provide a point-by-point response to each comment.

      Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in causing the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies to BRCA-driven tumors. For example, they found histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      Thank you for your insightful comment. We fully agree that the specific situation of epigenetic dysregulation in LUAD needs to be explored. We believe that future investigations utilizing clinical specimens and animal models to map histone acetylation patterns and DNA methylation profiles will be crucial for identifying novel biomarkers and therapeutic targets unique to LUAD.

      (2) For some methods, more detailed information is needed.

      This is a valid point. We agree that additional methodological details are necessary for clarity and reproducibility. We will expand these method details in the revised manuscript.

      (3) There are grammar issues in the text that need to be fixed.

      We apologize for the grammatical errors. In the revised manuscript, we will carefully check the grammar and make corrections.

      (4) Some text in the figures is not labeled well.

      We appreciate the reviewer's comment. We will add labels to the revised version of the figures.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

      Thank you for this excellent suggestion. In the revised manuscript, we will add immunological experiments or validation using in vivo models. In addition, we will elaborate on the limitations of our study in the Discussion section and provide reasonable explanations.

    1. eLife Assessment

      The findings of this study are important since they cover the repurposing of small molecules as metalloprotease and phospholipase inhibitors for early intervention in the treatment of bothropic envenoming in the Neotropics, and thus provide a strong rationale for the progression of these inhibitors into future preclinical and clinical evaluation for snakebite indications across various ecological zones. The strength of the evidence is solid; however, there are some weaknesses, such as a lack of translatability of the in vivo model and insufficient venom characterisation. Thus, the strength of the evidence can be enhanced by the use of a mouse model. The paper remains of interest to ophiologists, biochemists and medicinal chemists.

    2. Reviewer #1 (Public review):

      Summary:

      Small molecule therapeutics for snakebite have received a lot of attention for their potential to close the gap between bite and treatment, where antivenom is not immediately available.

      Strengths:

      There has been a lot of focus on Africa, Asia, and India, but very little work related to neotropical regions. The authors seek to begin filling this gap in the preclinical literature. The authors use well-developed methods for preclinical assessment.

      Weaknesses:

      A clearer and more focused discussion of the limitations of the overall present work would be desirable (e.g. protection vs. rescue, why marimastat over prinomastat for in vivo assays when both have been through clinical trials for other indications; real-world feasibility of nafamostat, which has a half-life of 1-2 minutes compared to camostat, which has a half-life of hours). All of this could be improved in a revision.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.

      Strengths:

      The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.

      Weaknesses:

      Several aspects of the study design and framing reduce the confidence with which readers can translate the findings beyond the specific experimental context presented. The evidence base is strongest in controlled in-vitro settings, while the bridge to real-world effectiveness remains limited, particularly for understanding performance under conditions that better reflect delayed treatment and systemic exposure. As a result, the manuscript is best interpreted as a well-organized comparative screening study with promising signals, rather than a definitive demonstration of a broadly effective, deployable intervention.

    4. Reviewer #3 (Public review):

      In this work, the authors wanted to evaluate repurposed small molecule inhibitors for the treatment of envenomation by snakes of the Bothrops genus; one of the most medically relevant in the Americas. I believe the objectives of the research were clearly achieved, and compelling evidence for the ability of these molecules to neutralize enzymatic and toxic activities of metalloproteinases and phospholipases in all the tested venoms is provided. Furthermore, the work highlights the limited efficacy of the tested serine protease inhibitor, suggesting a need for drug discovery campaigns to address toxicity caused by this protein family. The methods are well designed and performed, and the use of both in vitro and in vivo methodologies makes this a thorough and robust work.

      These results are extremely relevant, since they take us one step further to a potential orally administered snakebite treatment. The existence of such a treatment could improve the outcomes for thousands of snakebite victims worldwide. I have a few comments and questions that I hope will be useful to the authors:

      During the introduction, the authors mention that small-molecule inhibitors can neutralize the localized tissue damage via cytotoxicity of some venoms, and cite PLA2s, SVMPs and/or cytotoxic 3FTxs as the main causing agents of this pathology. I am not aware of any direct effect described by small molecule inhibitors on cytotoxic 3FTxs alone. Has this been observed at all? Or is it more likely that the small molecule inhibitors act on the enzymatic toxins only, preventing synergistic effects with 3FTxs?

      I think it would be relevant to address the effects of non-enzymatic PLA2s, such as myotoxin II, which have been described in detail within Bothrops venoms. I believe there is some evidence of Varespladib also having a neutralizing effect on the myotoxicity caused by these non-enzymatic PLA2s. I suggest adding a comment about the contribution of these toxins in the discussion or in the section where PLA2 activity of the venoms is compared. In my opinion, right now it seems like these were overlooked.

      Regarding Marimastat and the other MP inhibitors, are there any studies showing that they don't have an effect on endogenous MPs? I understand they have been approved for human use before, but is there any indication that they would not have an effect at the doses that would be required to treat envenomation?

      Regarding the quenched fluorescence substrate used for enzymatic activity. Is there a possibility that some of the SVMPs would not act on this substrate, and therefore their activity or neutralization is not observed? Would it be relevant to test other substrates, such as gelatin, collagen, or even specific clotting factors?

      Finally, could the authors comment or provide some bibliography regarding the translatability of the chicken embryo model in the context of envenomation?

    5. Author response:

      eLife Assessment

      The findings of this study are important since they cover the repurposing of small molecules as snake venom metalloprotease and phospholipase inhibitors for early intervention in the treatment of bothropic envenoming in the Neotropics, and thus provide a strong rationale for the progression of these inhibitors into future preclinical and clinical evaluation for snakebite indications across various ecological zones. The strength of the evidence is solid; however, there are some weaknesses, such as a lack of translatability of the in vivo model and insufficient venom characterisation. Thus, the strength of the evidence can be enhanced by the use of a mouse model. The paper remains of interest to ophiologists, biochemists and medicinal chemists.

      We thank the editors and reviewers for their assessment of this manuscript, and for the positive words highlighting the value of undertaking evaluation of small molecule drugs for snakebite in the neotropics. We completely agree that the next steps for this work will be to evaluate the preclinical efficacy of the identified drugs in mouse models. The comment around insufficient venom characterisation seems somewhat misplaced – the objective of this project was not to characterise the venoms used, but to evaluate the in vitro inhibition of venom toxin family activities and identify the potential utility of specific repurposed drugs as therapeutics for snakebite in the Neotropics.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Small molecule therapeutics for snakebite have received a lot of attention for their potential to close the gap between bite and treatment, where antivenom is not immediately available.

      Strengths:

      There has been a lot of focus on Africa, Asia, and India, but very little work related to neotropical regions. The authors seek to begin filling this gap in the preclinical literature. The authors use well-developed methods for preclinical assessment.

      Weaknesses:

      A clearer and more focused discussion of the limitations of the overall present work would be desirable (e.g. protection vs. rescue, why marimastat over prinomastat for in vivo assays when both have been through clinical trials for other indications; real-world feasibility of nafamostat, which has a half-life of 1-2 minutes compared to camostat, which has a half-life of hours). All of this could be improved in a revision.

      We thank the reviewer for their shared opinion of the potential value of small molecules as snakebite envenoming therapeutics and their insight on the gap in focus in the neotropics, which this manuscript aims to address. Our work in this manuscript included the standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. In our revised manuscript, we will make these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models), which, following the in vitro characterisations presented in this paper, are the logical next step for evaluating small molecule drugs for inhibiting neotropical snake venoms.

      Although both marimastat and prinomastat are repurposed drugs that have undergone clinical evaluation for other indications, marimastat has been more extensively characterised preclinically than prinomastat for snakebite, and will soon enter Phase II clinical trial evaluation for this indication (https://www.ddw-online.com/ophirex-to-produce-snake-venom-inhibitor-for-lstm-study-40669-202602/). Marimastat also has a longer half-life in humans of 8-10 hours (Millar et al. 1998), compared to prinomastat (2-5h, Hande et al. 2004). We will more clearly highlight the rationale for selecting marimastat in the revised manuscript.

      Although we appreciate the reviewer’s point regarding the short half-life of nafamostat (which is typically given by continuous iv infusion due to its short half-life), in the manuscript we have already stated (Lines 434 to 448) that we do not recommend the progression of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate due to its low efficacy and off-target effects. We highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.

      Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.

      Strengths:

      The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.

      Weaknesses:

      Several aspects of the study design and framing reduce the confidence with which readers can translate the findings beyond the specific experimental context presented. The evidence base is strongest in controlled in-vitro settings, while the bridge to real-world effectiveness remains limited, particularly for understanding performance under conditions that better reflect delayed treatment and systemic exposure. As a result, the manuscript is best interpreted as a well-organized comparative screening study with promising signals, rather than a definitive demonstration of a broadly effective, deployable intervention.

      We appreciate the reviewer’s opinion on the thorough and logical workflow we present in this manuscript and the value this pipeline provides to the field for future and parallel work. We agree with the reviewer that this provides a well-organized comparative screening study applicable to different snake species or therapeutics. In relation to the comment that this manuscript is not a ‘definitive demonstration of a broadly effective, deployable intervention’, we agree and are happy to clarify that while the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation.

      Reviewer #3 (Public review):

      In this work, the authors wanted to evaluate repurposed small molecule inhibitors for the treatment of envenomation by snakes of the Bothrops genus; one of the most medically relevant in the Americas. I believe the objectives of the research were clearly achieved, and compelling evidence for the ability of these molecules to neutralize enzymatic and toxic activities of metalloproteinases and phospholipases in all the tested venoms is provided. Furthermore, the work highlights the limited efficacy of the tested serine protease inhibitor, suggesting a need for drug discovery campaigns to address toxicity caused by this protein family. The methods are well designed and performed, and the use of both in vitro and in vivo methodologies makes this a thorough and robust work.

      These results are extremely relevant, since they take us one step further to a potential orally administered snakebite treatment. The existence of such a treatment could improve the outcomes for thousands of snakebite victims worldwide. I have a few comments and questions that I hope will be useful to the authors:

      We thank the reviewer for their high regard for the purpose and execution of this work. Their questions are helpful for improving the manuscript and raise useful discussion points for the field.

      During the introduction, the authors mention that small-molecule inhibitors can neutralize the localized tissue damage via cytotoxicity of some venoms, and cite PLA2s, SVMPs and/or cytotoxic 3FTxs as the main causing agents of this pathology. I am not aware of any direct effect described by small molecule inhibitors on cytotoxic 3FTxs alone. Has this been observed at all? Or is it more likely that the small molecule inhibitors act on the enzymatic toxins only, preventing synergistic effects with 3FTxs?

      We apologise for this error on our behalf. While inhibitory molecules have been described for cytotoxic 3FTxs, these are not small molecules as alluded to in the previous version of the manuscript. We have amended this text in our revision.

      I think it would be relevant to address the effects of non-enzymatic PLA2s, such as myotoxin II, which have been described in detail within Bothrops venoms. I believe there is some evidence of Varespladib also having a neutralizing effect on the myotoxicity caused by these non-enzymatic PLA2s. I suggest adding a comment about the contribution of these toxins in the discussion or in the section where PLA2 activity of the venoms is compared. In my opinion, right now it seems like these were overlooked.

      We thank the reviewer for highlighting this point. We agree that this is highly relevant and would benefit from discussion in the revised manuscript given the nature of our assays and the non-enzymatic mechanism of action of certain Bothrops PLA2s.

      Regarding Marimastat and the other MP inhibitors, are there any studies showing that they don't have an effect on endogenous MPs? I understand they have been approved for human use before, but is there any indication that they would not have an effect at the doses that would be required to treat envenomation?

      Most matrix metalloproteinase inhibitors will act on endogenous MPs to at least some extent (variable potency on different MMPs). Marimastat has demonstrated activity against endogenous metalloproteinases, including MMP1, which was hypothesised to cause severe joint pain when used chronically (i.e. frequent dosing over many weeks) for indications such as cancer, though this effect was reversible within 8 weeks of cessation of drug administration (Wojtowicz-Praga, 1998). Thus, long-term use of matrix metalloproteinase inhibitors can cause safety concerns. However, the anticipated duration of dosing for snakebite, which is an acute life-threatening condition, is a few days. It is therefore unlikely that prior safety concerns observed following chronic dosing in cancer studies would apply to its potential use as a snakebite field therapy.

      Regarding the quenched fluorescence substrate used for enzymatic activity. Is there a possibility that some of the SVMPs would not act on this substrate, and therefore their activity or neutralization is not observed? Would it be relevant to test other substrates, such as gelatin, collagen, or even specific clotting factors?

      It has been observed that certain SVMPs (specifically several PI SVMPs) are not active against this ES010 substrate in vitro. The substrate used in the in vitro SVMP assay is reported by the manufacturer as a substrate for a wide range of MMPs which target the extracellular matrix components mentioned by the reviewer, i.e. collagenases and gelatinases as well as matrilysins, stromelysins and elastase. This in vitro assay and the coagulation assays are complementary in covering the main targets of SVMPs (ECM and clotting cascade), prior to haemorrhagic assessment in the egg model. Thus, we are confident that activity for the broad range of SVMP isoforms will be captured through the screening pipeline we have developed.

      Finally, could the authors comment or provide some bibliography regarding the translatability of the chicken embryo model in the context of envenomation?

      Our current model is based on an earlier egg embryo model (Sells et al. 1997, Sells et al. 1998 and Sells et al. 2000) which described good correlations (p<0.01) with the standard WHO murine preclinical envenoming model. These studies have assessed correlations for minimal haemorrhagic doses (MHDs), LD50s and ED50s in both models for a selection of viper venoms. As chicken embryos at day 6 of development have incomplete neural arcs, the model is not well suited for assessing neurotoxic effects, but can be effectively used for addressing venom-induced haemorrhage and lethality and for testing therapeutics. In addition, a more recent study (Yusuf et al. 2023) reported almost identical LD50s for the venom of Bitis arietans between the two in vivo approaches. The model is also being pursued as a preclinical testing model by an antivenom manufacturer with the focus of reducing the use of rodents in batch release testing (Verity et al. 2021). We will provide further clarification on the rationale for using the egg model, including the supportive references outlined above, in the revised manuscript.

    1. eLife Assessment

      This useful study supplements previous publications on willed attention by addressing a frontoparietal network that supports internal goal generation. The evidence is solid in analyzing two datasets collected at independent sites, using the same willed-attention paradigm and combining fMRI and EEG. This work will interest cognitive psychologists and neuroscience researchers.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses a fundamental question in cognitive neuroscience regarding how the brain transitions from a reactive state of following external instructions to a proactive state of self-directed agency. The authors utilize an ambitious multimodal design by combining the spatial precision of fMRI with the temporal resolution of EEG across two independent datasets from the University of Florida and UC Davis. By applying multivariate pattern analysis, the work demonstrates that while both instructed and willed attention engage the Dorsal Attention Network, willed choices uniquely recruit a frontoparietal decision network including the dACC and anterior insula. Furthermore, the study shows that pre-cue alpha oscillations can predict subsequent spontaneous choices. This provides a neural link between pre-existing brain states and intentional action, representing a significant technical effort to characterize the neural scaffolding of internal goal generation.

      Strengths:

      The primary strengths of this work include the integration of fMRI and EEG, which allows the authors to bridge the gap between slow metabolic signals and fast oscillatory brain states. The use of two independent cohorts is a commendable effort to ensure the reproducibility of the willed attention effect, which is often a concern in small-sample neuroimaging studies. Additionally, the move beyond univariate activation toward information-based mapping demonstrates that the identified networks actually contain specific information about the direction of attention.

      Weaknesses:

      However, several critical weaknesses must be addressed to support the fundamental claims made in the manuscript. There are significant behavioral differences in performance between the two sites, specifically regarding the UC Davis cohort exhibiting slower reaction times and lower accuracy compared to the UF group. These discrepancies suggest potential differences in subject populations or experimental environments that are not currently accounted for in the neural models. The fMRI analysis lacks temporal precision because the use of beta series regression collapses the complex BOLD response into a single estimate per trial. This loss of temporal information obscures the evolution of the decision process and makes it difficult to distinguish whether the identified patterns represent a truly spontaneous choice or a slow-building, pre-planned strategy.

      Furthermore, the EEG decoding approach utilized the entire topography of electrodes rather than a biologically motivated posterior region of interest. Given that alpha-mediated spatial attention is traditionally localized to parieto-occipital sensors, using the full electrode set risks the inclusion of non-neural artifacts such as microsaccades or muscle activity, which can contaminate multivariate classifiers. The introduction of the neural efficiency metric also requires further validation, as the current ratio is mathematically sensitive to small denominators in the BOLD contrast.
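
      The small-denominator concern can be illustrated with a minimal sketch. It assumes, purely for illustration, that the efficiency metric is decoding accuracy divided by BOLD contrast; the function names and numbers are hypothetical and are not taken from the manuscript.

```python
def efficiency_ratio(decoding_accuracy, bold_contrast):
    """Hypothetical metric (an assumption): decoding accuracy / BOLD contrast."""
    return decoding_accuracy / bold_contrast


def ratio_shift(decoding_accuracy, bold_contrast, noise):
    """Absolute change in the ratio when the denominator moves by `noise`."""
    perturbed = efficiency_ratio(decoding_accuracy, bold_contrast + noise)
    baseline = efficiency_ratio(decoding_accuracy, bold_contrast)
    return abs(perturbed - baseline)


# Same decoding accuracy and same measurement noise in both cases;
# only the size of the BOLD contrast (the denominator) differs.
shift_large_denominator = ratio_shift(0.60, 0.50, 0.02)
shift_small_denominator = ratio_shift(0.60, 0.05, 0.02)

print(f"shift with BOLD contrast 0.50: {shift_large_denominator:.3f}")
print(f"shift with BOLD contrast 0.05: {shift_small_denominator:.3f}")
```

      Under this assumed definition, an identical perturbation of the BOLD contrast moves the ratio far more when the contrast is close to zero, which is why a difference-based or regression-based alternative may be more stable.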

      Crucially, the manuscript does not address the physiological implications of recruiting additional frontoparietal networks when behavioral performance remains identical across conditions. The activation of the anterior insula and dACC is frequently associated with increased autonomic arousal and effort. If the willed condition requires more extensive neural scaffolding to reach the same behavioral output as the instructed condition, it raises the question of whether this internal decision process is accompanied by changes in arousal levels. The authors should consider whether the lack of a behavioral tax is due to a compensatory increase in arousal, which could be reflected in the EEG data or pupil diameter if recorded, and potentially also in the amplitude of BOLD activity, which is being masked by the neural efficiency metric. Without an account of how the brain balances this increased computational demand without impacting behavioral performance, the functional significance of the willed attention network remains partially obscured.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript combines fMRI and EEG investigations performed at two research sites to examine 'willed' or volitional visuospatial attention, as contrasted with more standard cued (or 'instructed') visuospatial attention. The primary findings are: 1) willed attention (vs. instructed attention) drives additional cortical circuitry across a broad fronto-parietal network; 2) the direction of willed attention, but not instructed attention, can be decoded from the pre-cue EEG data and from MVPA analysis of the trial-level fMRI data; and 3) the subjects with high EEG decoding also exhibited high neural efficiency (i.e., high decoding with low BOLD signal change) in the fMRI data. The methods and data analysis are generally sound, and these results appear solid. On the negative side, it is not made clear how the present findings extend our understanding beyond prior published work from one of the senior authors. There are also three significant concerns regarding interpretation of the findings. One has to do with the causal interpretation of the pre-cue alpha EEG signal determining the direction of willed attention. The second concern is the degree to which the present research paradigm adequately examines 'willed attention.' The third is that the MVPA analysis is not sufficiently described, and permutation testing needs to be done to validate these findings. Otherwise, this manuscript appears methodologically sound, but questions about interpretation may mute the potential impact.

      Strengths:

      The focus on willed attention attempts to move beyond some of the many limitations of standard laboratory investigations of attention.

      The shared paradigm across two modalities and two research sites demonstrates solid reproducibility, even though a few minor differences are observed across sites.

      Weaknesses:

      (1) There are concerns about this experimental paradigm carrying the banner of Willed Attention, because the application of 'Will' appears quite modest. Yes, extra brain activity is exhibited for this condition vs. its control, but do the cognitive processes isolated adequately stand in for 'Willed Attention'? Willed attention, as operationally defined here, appears to involve a simple decision process prior to the shifting of spatial attention. The cue is internally generated, but after that the rest of the attentional processes appear identical to standard externally cued visuospatial attention experiments. This self-generated cue process likely involves some sort of memory/history of the recently selected cues and then some random-ish selection between A and B. This appears very similar to asking the subject to guess whether a fair coin flip will be heads or tails on each trial. A mental 'coin flip' feels like a very weak version of 'will.' As a potential remedy, it would be helpful to discuss what other phenomena might fall within 'willed attention' and what some future studies might choose to focus on, along with some potential pitfalls (e.g., the reasons why the current study avoided more robust exemplars of will).

      (2) The manuscript is lacking a description of the decision processes used during the willed attention paradigm and is lacking evidence as to WHEN subjects made their willed decision. Both of these points are of major concern:

      (a) The authors state: "For willed attention, participants were explicitly told to avoid relying on any stereotypical strategies of generating decisions, such as always attending the same/opposite side they attended during the previous trial, as well as to avoid randomizing or equalizing their decisions to choose left or right across trials; prior studies found that decisions to explicitly randomize decisions might invoke additional working memory related processes (Spence & Frith, 1999)." Subjects were instructed NOT to apply a simple heuristic and NOT to randomize or try to equalize their decisions, but exactly HOW the subjects made their decisions is not at all clear. What options does that leave? How does this strategy avoid the working memory-related processes mentioned in the Spence & Frith, 1999 citation? The brain regions that comprise the network of interest (aka Frontoparietal Decision Network) are activated by a very broad range of visual cognitive tasks, including many working memory paradigms. The Anterior Insula and dACC nodes of the Salience Network often simply reflect task difficulty. Obviously, making a choice is more cognitively demanding than not making a choice. The present experiments do not distinguish functional roles between different regions of the Frontoparietal Decision Network. On the whole, the study does very little to isolate the cognitive processes or neural bases of willed attention beyond calling out the set of 'Usual Suspects' for visual cognition.

      (b) The finding that pre-cue EEG signals predicted the post-cue decision is intriguing. It could mean that the seemingly irrelevant and transient state of the brain causally and unconsciously biased the subject to one direction or the other. Alternatively, it could mean that the subjects utilized the pre-cue period to make their decision and hold it in case it was needed (i.e., that it was a choice trial). While 2-8 seconds of ITI variability makes sense for fMRI decoding, it is a long time for a subject to idly wait, so they might fill that time preparing for the next trial. There appears to have been a substantial amount of individual difference in the pre-cue alpha decoding, which could reflect individual differences in cognitive strategy, specifically in the use of the pre-cue period to make their decision. More efficient decision makers might have pre-decided, which might account for the neural efficiency. The experiments lack any measurement of WHEN participants made their decision. For that reason, I would ask that the authors temper their claims about the significance of the alpha decoding and its possible causality.

      (3) Did individual subjects exhibit a choice bias of location for the willed trials? If not, doesn't that raise concerns that subjects were trying to equalize their trials? If they do exhibit location biases, how does that impact the decoding? A simple decoder could learn to always just guess the biased direction for a subject and would perform > 50%. Consider the example in which an individual subject chooses 'Left' 55% of the time. A classifier that simply learns to choose 'Left' on every trial will be correct on 55% of trials. The training data would likely be sufficient to learn the direction of choice bias in each individual subject. So the classifiers could perform significantly above 50% without learning anything beyond the tendency of each subject. That is to say, 50% is not truly chance in this data set. It doesn't appear that permutation testing has been performed to empirically determine chance for an individual's data. Permutation methods, scrambling the labels 1000 or 10000 times to establish a true baseline, would be preferred over simply comparing to 50% and would address concerns about individual subject biases.

      (4) The novel contributions of this work beyond the two prior Bengson et al papers from Dr. Mangun's lab appear quite modest. The discussion would be enhanced by specifically stating how the present work advances understanding beyond the prior Bengson studies.
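
      The choice-bias concern in point (3) can be made concrete with a minimal sketch. The data are synthetic (a hypothetical subject choosing 'Left' on 55 of 100 trials) and the function names are illustrative. As a simplifying assumption, labels are shuffled against fixed predictions; a full analysis would retrain the classifier on each shuffled label set.

```python
import random


def accuracy(labels, predictions):
    """Fraction of trials where the prediction matches the label."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def majority_baseline(labels):
    """Accuracy achieved by always guessing the most frequent label."""
    return max(labels.count(c) for c in set(labels)) / len(labels)


def permutation_p_value(labels, predictions, n_permutations=1000, seed=0):
    """Share of label shuffles whose accuracy is at least the observed one."""
    rng = random.Random(seed)
    observed = accuracy(labels, predictions)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, predictions) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic subject with a 55% 'Left' bias and a decoder that always says 'L'.
labels = ["L"] * 55 + ["R"] * 45
always_left = ["L"] * len(labels)

observed_accuracy = accuracy(labels, always_left)
baseline = majority_baseline(labels)
p_value = permutation_p_value(labels, always_left)

print(observed_accuracy, baseline, p_value)
```

      In this sketch the constant decoder scores above 50%, yet every shuffle reproduces the same accuracy, so the permutation test correctly assigns it no evidence of decoding. Comparing against the empirical null rather than a nominal 50% is what removes the bias-driven advantage.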

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript analyzes two independent datasets collected at different sites. Using the same willed-attention paradigm (instructional vs. choice cues) and combining fMRI and EEG analyses, the authors investigate how attentional direction is selected when no external instruction is provided. Their main claims are that the dorsal attention network is engaged by both cue types, whereas the choice cue additionally involves a frontoparietal decision network. Moreover, left-versus-right attentional decisions can be decoded in this decision network only on choice trials, and multichannel pre-stimulus alpha patterns predict the subsequent attentional choice. Finally, individuals with more predictive alpha patterns show greater neural efficiency in the decision network, i.e., higher decoding with lower BOLD activation.

      The question is worthwhile and the two-site design is a genuine strength. At the same time, several central inferences rely on decoding analyses for which the statistical testing and cross-validation structure are not described in enough detail to assess robustness. In addition, using a ratio-based neural-efficiency measure makes the interpretation more fragile than it needs to be. With a focused revision that tightens inference around MVPA and clarifies a few methodological points, I think the paper could become substantially more convincing.

      Strengths:

      The work extends previous willed attention studies by attempting to link pre-stimulus alpha pattern predictability to post-cue frontoparietal representations, and by testing reproducibility across two datasets. The conceptual advance beyond previous studies, e.g., Bengson et al. (2015), however, depends on how solid the decoding-based evidence is and whether alternative explanations are convincingly excluded. At present, the strength of support is limited mainly by incomplete reporting and/or controls for MVPA significance testing, as well as potential inflation of decoding estimates if folds are not independent of run structure. Concerns about statistical assessment of decoding accuracy are well documented in the literature (Combrisson & Jerbi, 2015).

      Weaknesses:

      (1) The manuscript describes the decoding pipeline for both fMRI and EEG MVPA. However, it does not clearly specify how "significantly above chance" is determined for the fMRI ROI decoding, nor how multiple comparisons across ROIs are handled, even though p-values are reported. The same issue applies to the time-resolved EEG analysis across many time points. For each decoding analysis, please specify the inferential test (e.g., permutation test within participant, group-level test on subject accuracies, binomial test, etc.) and report effect sizes with confidence intervals (e.g., Combrisson & Jerbi, 2015). Further, for EEG decoding over time, it would be preferable to control family-wise error, e.g., cluster-based permutation, rather than thresholding pointwise p-values. A standard approach here is the nonparametric cluster framework (e.g., Maris & Oostenveld, 2007).
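
      As an illustration of why the inferential test matters, the sketch below computes the smallest number of correct trials that is significantly above chance under a one-sided binomial test, in the spirit of Combrisson & Jerbi (2015). The function names and the Bonferroni correction across ROIs are assumptions made for the example, not the procedure used in the manuscript.

```python
from math import comb

def binomial_tail(n_trials, k, p_chance=0.5):
    """P(X >= k) for X ~ Binomial(n_trials, p_chance)."""
    return sum(
        comb(n_trials, i) * p_chance**i * (1 - p_chance) ** (n_trials - i)
        for i in range(k, n_trials + 1)
    )

def min_significant_correct(n_trials, alpha=0.05, p_chance=0.5):
    """Smallest count of correct trials with one-sided p <= alpha.

    Returns None when even a perfect score is not significant."""
    for k in range(n_trials + 1):
        if binomial_tail(n_trials, k, p_chance) <= alpha:
            return k
    return None

n_trials = 100   # assumed number of test trials
n_rois = 6       # assumed number of ROIs tested (Bonferroni correction)
for alpha in (0.05, 0.05 / n_rois):
    k = min_significant_correct(n_trials, alpha)
    print(f"alpha={alpha:.4f}: need >= {k}/{n_trials} correct")
```

      The threshold moves well above 50% for realistic trial counts and rises further once several ROIs or time points are tested, which is why the test and the correction both need to be reported.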

      (2) The cross-validation approach used here is appreciated and appropriate in principle. However, random 10-fold splits across trials can inflate accuracy if training and test folds share run-specific noise, scanner drift, or autocorrelated structure. The manuscript should indicate whether folds were blocked by run or randomized across the entire session. In addition, please report the number of trials per condition after artifact rejection and after removing short ITIs for the long prestimulus epochs (−2500 ms to 0 ms) for each dataset in the section of EEG preprocessing. Similarly, please report how often participants chose left vs. right on choice trials, and whether balanced folds (or an equivalent balancing procedure) were used if needed.
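
      A run-blocked scheme of the kind asked about here can be sketched as follows. This is a hypothetical helper, assuming each trial carries a run identifier and a left/right label; it only builds the folds and reports class balance, and any classifier could be trained on the resulting indices.

```python
from collections import Counter

def leave_one_run_out(run_ids):
    """Yield (train_indices, test_indices), holding out one run per fold so
    that no run contributes trials to both training and test sets."""
    for held_out in sorted(set(run_ids)):
        test = [i for i, r in enumerate(run_ids) if r == held_out]
        train = [i for i, r in enumerate(run_ids) if r != held_out]
        yield train, test

def class_counts(labels, indices):
    """Number of trials per label among the given indices."""
    return Counter(labels[i] for i in indices)

# Assumed toy session: three runs with four choice trials each.
run_ids = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
labels = ["L", "R", "L", "L", "R", "R", "L", "R", "L", "L", "R", "L"]

for train, test in leave_one_run_out(run_ids):
    print(dict(class_counts(labels, train)), dict(class_counts(labels, test)))
```

      Printing the per-fold counts makes any left/right imbalance visible, which is the information needed to decide whether a balancing procedure is required.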

      (3) Moreover, ROI definition is not sufficiently specified and independence should be clarified. The ROIs are defined based on peaks from the choice-instructed univariate contrast (Table 2) and then used for MVPA. First, are these ROIs defined as spheres around peaks or using anatomical masks? What radius or voxel count was used? This needs to be explicit. Second, I am concerned about circularity risk. Although choice-vs-instructed selection is not identical to left-vs-right decoding, ROI selection from the same dataset can still bias descriptive estimates and encourages overinterpretation if not carefully justified (Kriegeskorte et al., 2009). At minimum, the authors should explain why their selection criterion is independent of the decoded contrast under the null, and ideally provide a robustness check using either anatomical ROIs or independently defined ROIs, e.g., from prior literature or an atlas.

      (4) Using an index of neural efficiency is conceptually interesting. However, if the denominator, computed as the activation difference between choice and instructional conditions, is near zero or noisy, the ratio can become unstable. I would rather see a multivariate model that treats activation and decoding as separate dependent measures, or a latent-variable approach, than a single ratio.
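
      The instability is easy to see numerically. The sketch below assumes, for illustration only, that the index is decoding accuracy divided by the choice-minus-instructed activation difference; the values are invented and the function name is hypothetical.

```python
def efficiency_ratio(decoding_accuracy, activation_diff):
    """Assumed form of the index: decoding divided by activation difference."""
    return decoding_accuracy / activation_diff

decoding_accuracy = 0.60  # held fixed across the hypothetical participants
for activation_diff in (0.50, 0.10, 0.02, -0.02):
    ratio = efficiency_ratio(decoding_accuracy, activation_diff)
    print(f"activation difference {activation_diff:+.2f} -> ratio {ratio:+.1f}")
```

      With decoding held constant, a change of only 0.04 in the denominator (from 0.02 to -0.02) flips the index from a large positive to a large negative value, so small measurement noise near zero can dominate between-subject differences.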

    1. eLife Assessment

      This study presents a valuable examination of two measurements of physical activity (self-report and objective) in relation to widely studied structural MRI measures of the brain (hippocampal volume and BrainAGE) and cognitive function (Trail Making Test). Cross-sectional and longitudinal data were analyzed using established and validated methodology. The results convincingly suggest that brain health is more likely a cause of physical activity than an outcome of it, although limitations to the data could mask evidence of benefits to brain health. This work will be of interest to neurologists and epidemiologists studying the etiology of cognitive decline, to clinicians interested in advising patients on strategies for preserving brain health in aging, and to members of the lay public.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the relationship between physical activity (PA) and both structural (MRI) and cognitive brain health in the LIFE-Adult Study, with a total baseline recruitment of 2576. Hippocampal volume, an MRI-derived BrainAGE marker, and scores from the Trail Making Test were used as outcomes, with the majority of participants measured at baseline and subsets also measured in a follow-up session. The key findings were a lack of direct association between PA and outcomes, but longitudinal evidence for a higher BrainAGE at baseline leading to lower physical capacity at follow-up. This supports a reverse-causation hypothesis in contrast to the prevailing understanding of the positive effects of physical activity on brain health.

      Strengths:

      The LIFE-Adult Study is a rich and carefully acquired dataset, with multiple follow-up time points. The statistical analyses were conducted carefully with appropriate control for confounds and multiple testing. The study design enables an important assessment of reverse causality. The authors are scrupulous in their consideration of a number of factors that could potentially bias their results, performing an age-stratified analysis, and emphasising discrepancies in PA measurements (specifically, age-reporting bias) across the dataset and other limitations.

      Weaknesses:

      This is an observational study with inconsistent measures of physical activity. Previous studies have used physical activity interventions, and might be more strongly weighted when considering evidence for these effects (specific confounders involved in interventions notwithstanding).

      The model identifying potential reverse causality is relatively limited: it seems possible, even likely, that BrainAGE could reflect more general health status, which would expand the potential range of factors underlying this observation.

      The important quantitative actigraphy subset is small (n=227), as are the longitudinal subsets. Along with the discrepancy of physical activity/capacity at baseline and follow-up, and other complexities of the dataset, it is difficult to make firm conclusions. The authors point out that the actigraphy subset was quite inactive.

    3. Reviewer #2 (Public review):

      Summary:

      This population-based cohort study found no evidence that physical activity, whether self-reported or objectively measured, positively influenced brain structure (hippocampal volume or BrainAGE) or cognitive function (Trail Making Test scores). Notably, longitudinal analyses suggested the opposite temporal relationship: a higher BrainAGE at baseline predicted lower physical capacity at follow-up, more in line with reverse causation than with a neuroprotective effect of physical activity.

      Strengths:

      The study's statistical approach is thorough and well-documented, and the inclusion of two measurements of physical activity (self-report questionnaire and objective accelerometer data) is a strength. The longitudinal aspect also represents a strength.

      Weaknesses:

      Several aspects of the measurement timing warrant consideration. Physical activity was assessed over 7-day periods, creating a potential mismatch with (commonly less dynamic) brain outcomes examined (hippocampal volume, BrainAGE), which may reflect cumulative exposures over longer timescales. Additionally, the asynchronous measurement protocol (cognitive testing preceding accelerometry, and the MRI occurring weeks after baseline visits) may introduce time lags that attenuate associations. The observed null associations may be influenced by timing misalignment rather than reflecting the absence of consistent effects of physical activity on brain health and cognition.

      Other measurement characteristics also warrant consideration when interpreting the null findings. Physical activity was assessed using short-form self-report questionnaires and averaged accelerometer MET/day values, both of which have limited reliability. Additionally, the modest accelerometer subsample size and low/insufficient variation in activity levels observed in this cohort increase the likelihood of missing effects. These factors collectively raise the possibility that true physical activity-brain health associations may have been obscured.

      The study's conclusions regarding brain health, structure, and cognitive functioning are broad despite the scope of the selection of outcomes examined. The analyses focus on hippocampal volume, BrainAGE (a global aging metric), and Trail Making Test performance (processing speed and executive function), while omitting other important neuroimaging markers such as cortical thickness, functional connectivity, or white matter microstructure. The null findings presented here cannot exclude positive effects of physical activity on broader constructs of brain health or cognitive functioning.

      While the authors appropriately note the use of different physical activity instruments across time points (IPAQ at baseline, VSAQ at follow-up) in the limitations section, the discussion should more explicitly address the interpretive challenges this creates. The observed association between higher baseline brain age gap and lower follow-up physical activity may reflect: (1) a true temporal relationship, (2) an artifact of switching from behavior-focused (IPAQ) to capacity-focused (VSAQ) measurement, or (3) some combination of both. This ambiguity substantially limits causal inference.

    1. eLife Assessment

      This study provides valuable contributions to establish canonical Dhh signaling as a primary mediator in the differentiation of Leydig cells and their steroidogenic capacity. Together, the experimental design using their established stem Leydig cell line alongside relevant genetically mutated models, both derived using the relevant Nile tilapia animal system, provided largely convincing evidence to support their conclusions. The work could benefit from a more rigorous dissection of current literature on this pathway that might better inform their conclusions. The work will be of broad interest to developmental biologists interested in differentiation of steroidogenic or hormone producing cells.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-month-old tilapia. The approach is innovative and provides a means to investigate DHH and each downstream component from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1 to promote an increase in Sf1 expression leading to the onset of 11-ketotestosterone synthesis heralding the differentiation of Leydig cells in the developing male tilapia.

      Strengths of the methods and results:

      - The use of Nile tilapia is well justified: it is an important aquaculture species, it shares the genetic pathway for sex determination with mammalian species, and its molecular differentiation pathways are highly conserved.
      - The approach is rigorous and incorporates a novel clonal stem Leydig cell (TSL) model, developed by the authors, that is relatively faithful in following endogenous developmental steps and can produce the appropriate steroid.
      - Tilapia are relatively amenable to CRISPR/Cas9 targeting and, with their accelerated developmental time frame, provide an excellent model system to interrogate specific signaling pathways.
      - The stepwise analysis from dhh-gli-sf1 is thoughtful and well done.

      Weaknesses of the methods and results:

      - Line 162: the authors need to establish and verify that the PKH26-labeled TSL cells were unaffected by the dhh-/- environment. No data are presented to support the claim that they were unaffected.
      - The rescued phenotype caused by the addition of ptch2-/- to the dhh-/- model is compelling. To further define potential ptch1 contributions, it would be helpful to examine the expression level of ptch1 in the context of the ptch2-/- and ptch2-/-;dhh-/- mutant animals. Any compensatory increase in ptch1 in either case, without obvious phenotype changes, would support the dominant role for ptch2.
      - Activity of individual gli factors needs additional reconciliation. The expression profiles for both alternative gli factors should be quantified in each knockout cell line to establish redundancy and/or compensation.
      - Figure 5E: An important control is missing that includes evaluation of HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1.

      Achieved Aims:

      The authors set out to test the hypothesis that the canonical Dhh signaling pathway for Leydig cell differentiation and steroidogenic activity is mediated via ptch2 and gli1 regulation of sf1. The results are strong, but additional steps are needed to verify that redundancy/compensation is not contributing to the outcomes.

      This work is important for better understanding the nuanced commonalities and differences in developmental pathways across species. Specific to Leydig cell differentiation and steroidogenesis, their work with tilapia supports conservation of the canonical Dhh pathway; however, there appear to be some differences in downstream mediators compared to mouse. Specifically, they conclude that ptch2/gli1 stimulates sf1 and steroidogenesis in tilapia, whereas gli1 is dispensable in mouse. Instead, Gli3 has recently been shown to play an important role in mouse to stimulate Sf1 and support the hedgehog pathway.

    3. Author response:

      General Statements

      We thank the reviewers for their thoughtful and constructive comments on our manuscript. We have thoroughly considered all points raised and have made extensive revisions to address them. These revisions have significantly strengthened the manuscript.

      In summary, the key revisions and clarifications include:

      (1) Developmental Time-Course: To address the need for earlier phenotypic analysis, we have performed new immunofluorescence experiments at 30 days after hatching (dah). These new data (Fig. S7) pinpoint the onset of the Leydig cell differentiation defect in dhh-/- mutants, establishing ~30 dah as the critical window for Dhh action.

      (2) Role of Ptch1 and Ptch2: We have qualified our conclusions regarding receptor specificity throughout the text to accurately reflect our findings and the limitation posed by the early lethality of ptch1 mutants. The in vivo genetic evidence for Ptch2 (the rescue of dhh-/- by ptch2-/-) is emphasized, while we now explicitly state that a role for Ptch1 cannot be ruled out without future conditional knockout models.

      (3) Mechanism between Gli1 and Sf1: In direct response to the reviewers' request for stronger evidence, we have performed a new cold probe competition assay. This experiment provides dose-dependent, biochemical evidence for the specificity of Gli1 binding to the sf1 promoter (New Fig. 5E). Furthermore, we have revised the text throughout the manuscript to use more precise language (e.g., "Gli1 activates sf1 expression") and removed overstated claims of "direct" regulation.

      (4) Methodological Rigor and Controls: We have added crucial negative controls for all RNA-FISH experiments using sense probes (New Fig. S9), provided detailed quantification methods for immunofluorescence, clarified the number of biological replicates for transcriptomic analyses, and corrected statistical tests as recommended.

      (5) Clarity and Presentation: We have revised the text for clarity, expanded the description of the TSL cell line's validation in the Introduction, added missing details to figure legends and methods, and incorporated suggested key references.

      We believe that our detailed responses and the significant new data and textual revisions have fully addressed the reviewers' concerns and have substantially improved the quality and impact of our manuscript.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-month-old tilapia. The approach is innovative and provides a means to investigate DHH and each downstream component from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1 to promote an increase in Sf1 expression leading to the onset of 11-ketotestosterone synthesis heralding the differentiation of Leydig cells in the developing male tilapia.

      Major comments:

      (1) Are the key conclusions convincing?

      Most results as reported are convincing; however, some conclusions are premature as additional experiments are required to satisfy their claims. For example, the phenotype of the dhh-/- testis is convincing in that Cyp11c1 cells are missing and the addition of ptch2-/- rescues the phenotype, indicating a direct path. The link from gli to sf1, however, requires additional study to validate the direct relationship (see item 3 below).

      We thank the reviewer for the positive assessment that our principal findings are convincing. Regarding the connection between Gli1 and Sf1, we agree that additional validation was important. We have now performed new experiments and revised our text. As detailed in our response to item 3 below, we have incorporated a cold probe competition assay (new Fig. 5E) which provides dose-dependent evidence for the specificity of Gli1 binding to the sf1 promoter. Furthermore, we have toned down our conclusions in the manuscript.

      (2) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Major: Most significant premature claim is the statement that gli1 directly controls sf1 activity. Additional experiments are required to make this claim (see next statement).

      We agree with the reviewer that the claim of "direct" control was premature. We have therefore revised the manuscript accordingly. All statements claiming "direct" regulation of sf1 by Gli1 have been removed or replaced with more accurate descriptions, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1." These changes, coupled with the new functional data from the cold probe competition experiment (Fig. 5E) described in our response to item 3, now provide a robust and appropriately qualified account of our findings.

      Minor: As addressed in the discussion section, the ptch1 animals fail to survive limiting the ability to validate both ptch1 and ptch2 roles. Thus, the conclusion that only ptch2 is required should be qualified.

      We thank the reviewer for this rigorous comment. We fully acknowledge the limitation imposed by the early lethality of ptch1 mutants, which precludes a definitive in vivo assessment of its potential role in postnatal testis development. In direct response to this point, we have revised the text throughout the manuscript to more accurately reflect the strength of our conclusions. Specifically, in the Results section, we now state that “This differential receptor requirement implies that Ptch2 likely acts as the functional receptor for transducing Dhh signals in TSL cells” (lines 174–176). Furthermore, we have strengthened the Discussion by explicitly stating: “Therefore, while our findings strongly nominate Ptch2 as the principal receptor for Dhh in SLCs, a definitive exclusion of a role for Ptch1 will require future studies employing Leydig cell–specific conditional knockout models” (lines 265–268). We believe these revisions provide an appropriately qualified interpretation of our data while maintaining the compelling narrative of Ptch2's primary role.

      Major: There are a couple of key references missing, however; please consider including:

      - Kothandapani A, Lewis SR, Noel JL, Zacharski A, Krellwitz K, Baines A, Winske S, Vezina CM, Kaftanovskaya EM, Agoulnik AI, Merton EM, Cohn MJ, Jorgensen JS. PLoS Genet. 2020 Jun 4;16(6):e1008810. doi: 10.1371/journal.pgen.1008810. eCollection 2020 Jun. PMID: 32497091

      - Park SY, Tong M, Jameson JL. Endocrinology. 2007 Aug;148(8):3704-10. doi: 10.1210/en.2006-1731. Epub 2007 May 10. PMID: 17495005

      We have included the key references: Kothandapani A, et al. (2020). PLoS Genet. and Park SY, et al. (2007). Endocrinology.

      (3) Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Additional experiments are suggested to strengthen the direct connection between gli1 and sf1:

      Major: Figure 5F shows evidence for increased sf1-luc activity upon co-transfection of OnGli1 in TSL cells. These data would be strengthened with evaluation of the same sf1 promoter that has each/both putative GLI binding sites mutated.

      We thank the reviewer for this insightful suggestion. To further strengthen the evidence for the functional connection between Gli1 and the sf1 promoter, we have performed a new cold probe competition experiment. Given the potential presence of other unpredicted Gli-binding motifs within the 5-kb sf1 promoter region and the practical constraints, we employed an alternative, robust biochemical approach. This assay used a wild-type oligonucleotide containing the canonical Gli-binding motif (GACCACCCA) as a specific competitor. As shown in the new Fig. 5E, this cold probe caused a significant, dose-dependent reduction in Gli1-induced sf1-luc activity, while a mutated control probe (TTAATTAAA) had no effect. This result provides strong evidence that Gli1-mediated transactivation of the sf1 promoter is dependent on its specific binding to this consensus motif.

      Furthermore, in response to the reviewer's comment, we have revised the manuscript text to use more precise language, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1," toning down any overstated claims of direct regulation. Together with the existing data (the original luciferase assay, the new competition experiment, and key loss-of-function/gain-of-function genetic evidence from SLC transplantation), we believe our study now provides a compelling and multi-faceted case for Gli1 being the key regulator of sf1 within this pathway. We are confident that these revisions have satisfactorily addressed the point raised.

      Major: All 8xGli-luciferase assays should include evaluation of the mutant 8xGli-luciferase plasmid as a negative control.

      We thank the reviewer for highlighting the importance of reporter assay controls. In our study, we included the empty vector pGL4.23, which lacks any Gli-binding sites, as the fundamental negative control. As shown in Fig. 4C, this vector showed minimal background activity that was unresponsive to Dhh, confirming that the strong luciferase induction in the 8xGli-reporter is entirely dependent on functional Gli-binding sites. While a mutated 8xGli construct is one valid approach, we think that the use of an empty vector is functionally equivalent and equally rigorous for establishing specificity. We are confident that our current data unambiguously demonstrate Gli-dependent activation. For clarity, we have explicitly stated in the figure legend and methods that pGL4.23 served as the negative control.

      Minor: Figure 5D experiment that includes TSL-gli1(also 2,3) +/- OnDhh; please examine whether the absence of Gli affects expression of sf1 in each condition. In other words, provide a loss-of-function of Gli connection to regulation of sf1.

      We measured the mRNA expression levels of sf1 in TSL-WT, TSL-gli1-/-, TSL-gli2-/-, and TSL-gli3-/- cells using qRT-PCR. The results are presented in the new Supplementary Figure S8A. They show that the loss of gli1 leads to a significant reduction in the expression of sf1. In contrast, the knockout of gli2 or gli3 had no significant effect on sf1 expression levels.

      (4) Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Given the expertise, it is not anticipated that the suggested experiments would be a significant burden to this group.

      We appreciate the reviewer's considerations. Now, we have performed the additional key experiments, which have been incorporated into the revised manuscript. We believe these new data have fully addressed the points raised.

      (5) Are the data and the methods presented in such a way that they can be reproduced?

      Most methods are adequately described or referenced to previous detailed description. There were, however, some methods that could benefit from additional details:

      Major: IF quantification data: please provide details on how the number of positive cells were quantified and presented, for example, how many cells from how many sections for each genotype were included for the analysis?

      We have added relevant information in the "Materials and Methods" section in lines 369-373: “For each biological replicate (n = 5-6 fish per genotype), three non-serial, non-adjacent testis sections were analyzed. From each section, three representative fields of view were captured to ensure non-overlapping sampling. All positive cells number of Vasa, Sycp3 and Cyp11c1 was quantified by Image J Pro 1.51 software using default parameters.”

      Major: FISH: No controls are present, for example, scrambled RNA probes. Further, please clarify or address the significant presence of message in the nucleus.

      As suggested, we have now included negative control experiments using sense RNA probes for all genes (ptch1, ptch2, gli1, gli2, gli3). These controls showed no specific signal, confirming the specificity of our antisense probe hybridization. These data are now presented in the new Supplementary Figure S9.

      Major: TSL cells: TSL-onDhh, -onSf1: provide evidence for increase in expression

      We measured the mRNA expression levels of dhh in TSL-WT and TSL-OnDhh, and sf1 in TSL-WT and TSL-OnSf1 using qRT-PCR. The results are presented in the new Supplementary Figure S8B. The results show that overexpression of Dhh and Sf1 significantly increased the mRNA expression levels of dhh and sf1, respectively.

      Major: TSL + SAG cells and other treatments in general: how long were they treated before transplantation?

      We have added relevant information in the "Materials and Methods" section in lines 398-399: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before transplantation.”

      Major: Transcriptome analyses: how many replicates were used for each cell line? Please clarify-the results presented in Fig 5E: how was this plot generated, it is interpreted that all three cell lines were combined and compared to the WT line. It is not clear how this was achieved.

      We have added relevant information in the "Materials and Methods" section in lines 445-447: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before collection. For each genotype, cells from three independent culture wells were pooled.”

      We have also added relevant information in the "Results" section in lines 198-202: “…we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions.”

      (6) Are the experiments adequately replicated and statistical analysis adequate?

      Most are adequate and appropriate, some questions remain:

      - Transcriptomes-how many replicates (see above)?

      - IF quantification-how were cells identified/how many sections (see above)?

      Minor: Statistics: methods indicate that a Student's t-test was used, but ANOVAs are also used, which is appropriate. There are data presented that should be reevaluated via an ANOVA: Figure 4D, 4N-R; Figure 5G (no stats indicated in figure legend).

      We sincerely thank the reviewer for highlighting the inappropriate use of statistical tests in our original submission. We have re-analyzed all data using the ANOVA-based methods, as specifically suggested. We confirm that these changes do not alter the overall interpretation of our results but provide a more robust and statistically sound foundation for our conclusions. We changed “Differences were determined by two-tailed independent Student's t-test” to “Statistical significance was determined by one-way ANOVA followed by Tukey's test (C, Q-U, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (D) (*, P < 0.05; **, P < 0.01; NS, no significant difference).”

      In lines 719-721 we added “Statistical significance was determined by one-way ANOVA followed by Tukey's test (E, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (B, H) (*, P < 0.05; **, P < 0.01; NS, no significant difference).” in line 745-747.

      Reviewer #1 (Significance):

      The data presented in this manuscript provides important context towards the connection between the DHH pathway, Sf1, and steroidogenesis.

      The audience would likely include developmental biologists, including those related to differentiation of any hormone producing cell type and especially those focused on steroidogenesis onset. Clinical interests will be related to sex determination and differentiation, especially related to male sex phenotype differentiation. Basic scientists will be especially interested.

      Expertise: mouse fetal testis differentiation and maturation, steroidogenesis, hedgehog, sf1. Good fit except for the animal model, but they are surprisingly similar.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this work, Zhao et al., investigated the role of Dhh signaling pathway in the proliferation and differentiation of leydig lineage cells in the testes of Nile tilapia, an economic important farmed fish. By generating dhh mutants, the authors showed that loss of Dhh in tilapia recapitulated mammalian phenotypes, characterized by testicular hypoplasia and androgen insufficiency. A previous established TSL line was used to rescue the deficits in dhh-/- testes, which demonstrated that Dhh regulates the differentiation of SLCs rather than their survival. By generating mutant TSL lines, the authors aimed to identify the downstream players under Dhh in tilapia. Based on the data, the authors propose that a dhh-ptch2-gli1-sf1 axis exists in leydig cell lineage development.

      How Dhh secreted from Sertoli cells affects the Leydig cells remains elusive. While previous studies have revealed the paracrine role of Sertoli cell-secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model. Unfortunately, this work is not well performed, and the conclusions are not well supported by the current data. To reach logical conclusions, more meaningful experiments should be performed, and more convincing data should be provided.

      Strength:

      The authors used genetic mutants, TSL lines, and cell transplantation techniques to address the questions. The manuscript is technically sound, and overall is well-written.

      Limitations:

      Experimental design should be optimized, and more convincing data should be provided to reach a solid conclusion.

      (1) The SLCs (stem Leydig cells) used in this work. The SLC line was established from 3-month-old immature XY tilapia. The authors claimed that this line is an SLC line only because it expresses a few Leydig markers such as pdgfra and nestin. However, in my opinion, the identity of the cell line is not clear. It is suggested to perform more experiments, including flow cytometry assay or single cell RNA sequencing analysis, to further characterize this line and to demonstrate that these cells are genuine SLCs equivalent to the SLCs in 3-month testes of tilapia. According to the previous publication (2020), the information about the line was not well presented.

      We thank the reviewer for this comment regarding the characterization of the TSL cell line. The identity of TSL as a stem Leydig cell line was rigorously established in our previous publication (Huang et al., 2020), which provided comprehensive molecular, in vitro, and in vivo functional evidence that meets the definitive criteria for an SLC. This includes its stable expression of established SLC markers (pdgfrα, nestin, coup-tfii), its capacity to differentiate into steroidogenic cells producing 11-KT in vitro, and most critically, its ability to colonize the testicular interstitium, differentiate into Leydig cells, and restore androgen production upon transplantation in vivo.

      In direct response to the reviewer's point, we have revised the Introduction of our manuscript to provide a more detailed and clear description of the TSL line's origin and validation (lines 95-105) as “Furthermore, a stem Leydig cell line (TSL) has been established from the testis of a 3-month-old Nile tilapia. TSL expresses platelet-derived growth factor receptor α (pdgfrα), nestin, and chicken ovalbumin upstream promoter transcription factor II (coup-flla), which are usually considered as SLC-related markers in several other species. Notably, this cell line exhibits the capacity to differentiate into 11-ketotestosterone (11-KT)-producing Leydig cells both in vitro and in vivo. When cultured in a defined induction medium, TSL cells differentiate into a steroidogenic phenotype, expressing key steroidogenic genes including star1, star2, and cyp11c1, and producing 11-KT; upon transplantation into recipient testes, TSL cells successfully colonize the interstitial compartment, activate the expression of steroidogenic genes, and restore 11-KT production”, ensuring that readers can fully appreciate its well-founded identity as a SLC model without needing to consult the original publication. We are confident that the existing body of evidence solidly supports all conclusions drawn from its use in this study.

      (2) How loss of dhh affects testicular and Leydig cell lineage development is not clearly investigated. In the current manuscript, the characterization of the dhh mutant was insufficient and lacked in-depth investigation. The authors primarily looked at testes at 90 dph, when the Leydig cell lineage was well developed. In my opinion, this time was too late. To investigate the earlier events that are affected by loss of dhh, I suggest performing experiments at earlier time points, in particular around the initiation stages of sex differentiation and Leydig cell specification/maturation.

      We thank the reviewer for this insightful comment. We agree that a thorough developmental analysis is crucial. In response to this point, we have now performed an in-depth investigation at earlier stages to precisely define the phenotype onset.

      Our revised manuscript includes new data from a developmental time-course analysis. While our initial characterization included 5, 10, and 20 dah, we now identified 30 dah as the critical window for Leydig cell differentiation onset, which was also supported by prior work (Zheng et al.). Our new immunofluorescence data at 30 dah now clearly show that Cyp11c1-positive cells are present in wild-type testes but are entirely absent in dhh<sup>-/-</sup> mutants (Fig. S7). This finding pinpoints the initial failure of SLC differentiation.

      We have integrated this key finding into the Discussion (lines 234-239) as “To define the onset of Leydig cell differentiation, we performed a developmental time-course analysis. This revealed that Cyp11c1-positive steroidogenic cells first appear in wild-type testes at 30 dah, while being conspicuously absent in dhh<sup>-/-</sup> mutants at this same stage (Fig. S7). This clear temporal pattern establishes ~30 dah as the developmental window when SLCs initiate their differentiation program in the Nile tilapia.”

      Concurrently, our analysis of the 90 dah timepoint remains vital, as it represents a mature stage with robust spermatogenesis and a stabilized somatic niche. This allows for a comprehensive assessment of the ultimate functional consequences of the early differentiation block, including its impact on germ cell support and overall testicular architecture.

      Thus, our study now provides a complete developmental perspective: the 30 dah timepoint identifies the initiation of the Dhh-dependent defect, while the 90-dah analysis reveals the mature, functional outcomes within the intact testicular niche.

      (3) The authors claimed that there was a ptch2-gli1-sf1 axis. The conclusion was drawn largely based on data generated from the in vitro cultured TSL line. More data from genetic mutant tilapia are required to support the conclusion.

      We thank the reviewer for the insightful comments regarding the need for robust in vivo validation. In fact, our conclusion of a Dhh-Ptch2-Gli1-Sf1 axis is supported by an integrated experimental strategy, combining key in vivo evidence with targeted in vitro analyses to build a coherent model.

      (1) Evidence for Ptch2 as the key receptor: The role of Ptch2 is supported by a pivotal in vivo genetic experiment. The observation that the dhh<sup>-/-</sup> testicular phenotype is fully rescued in dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants provides compelling genetic evidence that Ptch2 is the essential receptor for Dhh in vivo (Fig. 4E-U). We acknowledge that the early embryonic lethality of global ptch1 mutation precludes its functional analysis in postnatal testis development. Therefore, while our data strongly nominate Ptch2 as the principal receptor, we have qualified our conclusions in the revised manuscript to reflect that a role for Ptch1 cannot be definitively excluded without Leydig cell-specific conditional knockout models.

      (2) Evidence for Gli1 and its regulation of Sf1: The role of Gli1 as the key transcriptional effector was efficiently identified using our well-characterized TSL system, a valid approach for dissecting this highly conserved signaling cascade. The functional connection between Gli1 and Sf1 is supported by multiple lines of evidence: transcriptomic profiling, promoter analysis, luciferase reporter assays (including a new cold probe competition experiment), and most importantly, in vivo functional validation via SLC transplantation. The latter demonstrated that Sf1 is both necessary and sufficient for SLC differentiation within the testicular niche (Fig. 5).

      In direct response to the reviewer's points, we have thoroughly revised the manuscript text to ensure all claims are accurately stated, particularly regarding the receptor specificity and the nature of the Gli1-Sf1 regulatory relationship. We believe our study provides a solid foundation for the proposed signaling axis.

      Overall, better experimental design should be planned, including the rescue experiments. Some key information was missed. For instance, the identity of the stem Leydig cells was not clearly presented.

      We have explained it in point #1.

      Figures:

      Figure 1: The authors described the phenotypes at 90 dph. Loss of dhh led to severe phenotypes in testicular formation, as evidenced by defective formation of Vasa, a germline stem cell marker; loss of expression of cyp11c1, a Leydig cell marker; and loss of sycp3, a marker of meiosis of spermatogonia.

      However, in my opinion, 90 dph was too late. To investigate the role of dhh in the Leydig cell lineage, the authors are advised to focus on earlier developmental stages, when sex differentiation and maturation of Leydig cells occur. This work is actually a developmental biology study that investigates how dhh loss in Sertoli cells affects the development of Leydig cells. Careful characterization of the earliest testicular phenotypes of the dhh mutant is very important.

      We have explained it in point #2.

      Figure 2: Please clarify the logic for performing rescue experiments using 11-KT. Provided the critical role of 11-KT in the testis development and spermatogenesis, it was not unexpected that 11-KT treatment can rescue most of the cell types in testes. If dhh is absolutely required for LC lineage development maturation, adding 11-KT at 30 dph will not have an effect. Why not perform rescue experiments using Dhh protein?

      We thank the reviewer for this insightful comment, which allows us to clarify the logical progression of our experimental design, a process central to genetic discovery.

      When we first characterized the dhh<sup>-/-</sup> mutant, we observed a complex suite of phenotypes: testicular hypoplasia, arrested germ cell development, a profound deficiency of Leydig cells, and drastically low androgen levels. A primary challenge was to distinguish which defects were direct consequences of losing Dhh signaling and which were secondary effects of the overall testicular failure.

      We therefore employed a classic genetic strategy: phenotypic dissection through targeted rescue. The 11-KT rescue experiment was designed to test a foundational hypothesis: Are the severe testicular defects in dhh<sup>-/-</sup> mutants primarily a consequence of the systemic androgen deficiency? The results provided a pivotal and clear answer: while 11-KT treatment partially rescued germ cell development and testicular structure, it completely failed to restore the population of Cyp11c1-positive Leydig cells. This critical finding allowed us to dissociate the phenotypes, demonstrating that the Leydig cell defect is a primary, cell-autonomous consequence of Dhh loss, not a secondary effect of low androgen.

      This conclusion logically propelled the next phase of our research: to shift focus from systemic hormone action to the local, niche role of Dhh in regulating the Leydig lineage directly. This led directly to the TSL transplantation experiments and the mechanistic dissection of the Ptch2-Gli1-Sf1 axis within SLCs.

      Regarding the use of Dhh protein, we agree it is a complementary approach. However, producing biologically active, recombinant Hedgehog ligand is challenging due to its essential dual lipid modification, which is required for solubility and activity. Our transplantation experiments with TSL-OnDhh cells (Fig. 3) functionally demonstrate that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, thereby directly addressing the core question without the need for recombinant protein.

      Figure 3. The authors showed that in dhh-/- testes, TSL engrafted equivalently but failed to express Cyp11c1. This result was strange, which raised a question about the identity of the TSLs, as I have mentioned above. The authors claimed that the TSLs are stem Leydig cells, which I doubt. Additional data should be provided to support the statement.

      In the testicular environment, the transplanted TSLs should be able to colonize and differentiate into more mature Leydig cells. Only a small portion of the PKH26-labeled TSLs became Cyp11c1 positive after transplantation. Can the authors comment on this observation?

      To address "Mutation of dhh blocks SLC differentiation", the authors should first carefully examine the TSL lineage development using the dhh mutant. Then, investigate how loss of dhh disrupts the crosstalk between Sertoli cells and Leydig cells. Why bother transplanting TSLs? Please clarify. Why not perform rescue experiments using Dhh protein at appropriate developmental stages?

      We thank the reviewer for these comments, which allow us to clarify the rationale and interpretation of our key experiments.

      (1) We have provided comprehensive evidence establishing the TSL line as a SLC line (Response to Point #1). The observation that WT TSL cells engraft but fail to differentiate in the dhh<sup>-/-</sup> testicular environment is not strange; it is, in fact, the core and most crucial finding of this experiment. It provides direct functional evidence that the dhh<sup>-/-</sup> niche lacks the essential signals required to initiate SLC differentiation, consistent with the severe deficiency of endogenous Cyp11c1<sup>+</sup> cells in these mutants (Fig. 1I-J', N).

      (2) The reviewer's concern about "only a small portion" of cells differentiating is based on a misunderstanding. Our quantitative data (Fig. 3F) show that approximately 78% of the transplanted PKH26+ TSL cells successfully differentiated into Cyp11c1<sup>+</sup> cells in WT hosts. This high efficiency robustly demonstrates the differentiation potential of TSL cells and the permissiveness of the WT niche. The near-zero differentiation rate in the dhh<sup>-/-</sup> host (Fig. 3F) starkly highlights the specific and severe defect in the mutant microenvironment.

      (3) The TSL transplantation experiment was the most direct strategy to test why Cyp11c1<sup>+</sup> cells are absent in dhh<sup>-/-</sup> testes. It allowed us to distinguish between a failure in SLC differentiation and other possibilities (e.g., cell death). The finding that functional SLCs cannot differentiate in the mutant niche logically directed our subsequent focus onto the cell-intrinsic molecular mechanism (the Ptch2-Gli1-Sf1 axis) within the Leydig lineage. While Sertoli-Leydig crosstalk is an important area, it was beyond the scope of this study aimed at defining the intrinsic differentiation pathway.

      (4) Regarding Dhh protein rescue, generating bioactive, lipid-modified recombinant Hh protein is technically challenging. Our transplantation of TSL-OnDhh cells (Fig. 3) functionally demonstrates that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, effectively addressing this question without the need for recombinant protein.

      Figure S3. “To assess whether dhh mutation affects androgen-producing cells outside Leydig cells, 11-KT levels were analyzed during early testicular development before SLCs differentiation. IF analyses revealed that no Cyp11c1 positive cells were present in the testes of XY WT fish at 5, 10, and 20 dah, indicating that SLCs had not yet differentiated at these stages (Fig. S3A-C). Tissue fluid 11-KT levels showed no significant differences between WT and dhh-/- XY fish at 5, 10, and 20 dah (Fig. S3D)”. These observations suggested that loss of dhh does not affect the specification of SLCs, but affects their differentiation into mature LCs. The differentiation of Cyp11c1-positive cells should be later than 20 dah. So when is the earliest time point for formation of Cyp11c1-positive cells, and how does loss of dhh affect this? These are important questions to answer.

      We agree with the reviewer's interpretation that our data suggest dhh loss affects SLC differentiation rather than initial specification. In direct response to the need for earlier timepoints, we have now performed and included an analysis at 30 dah, which we identified as the critical window for Leydig cell differentiation onset. Our new data (Fig. S7) show that Cyp11c1+ cells are present in WT testes but are entirely absent in dhh<sup>-/-</sup> mutants at this stage. This precisely pinpoints the initiation of the phenotypic divergence and establishes ~30 dah as the developmental window when Dhh signaling is required to drive SLC differentiation. Our study therefore now provides a complete developmental perspective, from the initial failure at 30 dah to the mature functional outcomes at 90 dah.

      Figure 4. The authors generated ptch1/2 mutant TSL lines and performed a luciferase assay; based on the results, the authors concluded that Ptch2, but not Ptch1, is specifically required for transducing Dhh signals in TSLs. The conclusion was based only on the luciferase assay using TSLs. Whether this is the case in testes at the animal level is not clear. Clearly, more genetic experiments, using ptch mutants, should be performed to substantiate this.

      The authors stated “Ptch2 acts as the obligate receptor for Dhh signaling during testis development”. If ptch2 is required for the TSL lineage, why did ptch2-/- testes exhibit no significant differences in testicular histology, Leydig cell (Cyp11c1+) populations, and serum 11-KT levels? This contradictory statement needs to be addressed.

      We thank the reviewer for these critical comments, which allow us to clarify the logic underlying our conclusions regarding Ptch2.

      (1) In Vivo Genetic Evidence for Ptch2: Our conclusion that Ptch2 is the primary receptor for Dhh is not based solely on the TSL luciferase assays. It is definitively supported by a key in vivo genetic experiment: the complete phenotypic rescue in the dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants (Fig. 4F-R). In genetic terms, the loss of the receptor (ptch2) suppressing the phenotype caused by the loss of the ligand (dhh) is classic evidence for a ligand-receptor relationship within a linear pathway. This in vivo evidence strongly substantiates Ptch2's role at the animal level. The early embryonic lethality of ptch1 mutants precludes a similar in vivo test for Ptch1 in postnatal testis development.

      (2) Addressing the Apparent Contradiction of the ptch2<sup>-/-</sup> Phenotype: The reviewer raises an excellent point, which stems from the fundamental biology of the Hh pathway as shown in Author response image 1. Ptch receptors are inhibitory. In the absence of ligand, Ptch suppresses pathway activity.

      Author response image 1.

      The canonical Hh signaling pathway. In the dhh<sup>-/-</sup> mutant, the pathway is suppressed due to unopposed Ptch activity, leading to a failure in SLC differentiation. In the ptch2<sup>-/-</sup> mutant, this key inhibitory brake is removed, leading to constitutive activation of the pathway. The fact that ptch2<sup>-/-</sup> testes are normal indicates that this level of pathway activation is not detrimental and, crucially, is sufficient to support wild-type levels of Leydig cell development and steroidogenesis. This lack of a phenotype in the receptor mutant, contrasted with the severe ligand mutant phenotype, is a common and expected observation in signaling pathways where the receptor acts as a tonic inhibitor.

      In summary, the normal development of ptch2<sup>-/-</sup> testes is not contradictory but is entirely consistent with its role as the inhibitory receptor for Dhh. The severe phenotype in dhh<sup>-/-</sup> mutants and its specific rescue by removing ptch2 provide compelling genetic evidence for their functional relationship. We have revised the text throughout the manuscript to ensure these conclusions are accurately stated.
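
      The epistasis argument above can be written as a toy Boolean model. This is only an illustrative sketch, not an analysis from the manuscript: it assumes that Ptch2 is the sole relevant inhibitory receptor in these cells and that pathway output is all-or-none. The function name `pathway_active` and the genotype table are hypothetical.

```python
def pathway_active(dhh: bool, ptch2: bool) -> bool:
    """Toy model: Ptch2 blocks signaling unless Dhh is present to relieve it.

    Assumption (for illustration only): Ptch2 is the only inhibitory receptor,
    so removing it leaves the pathway constitutively on.
    """
    ptch2_inhibiting = ptch2 and not dhh  # brake engaged only without ligand
    return not ptch2_inhibiting


# (dhh present, ptch2 present) for each genotype
genotypes = {
    "WT": (True, True),
    "dhh-/-": (False, True),
    "ptch2-/-": (True, False),
    "dhh-/-;ptch2-/-": (False, False),
}

for name, (dhh, ptch2) in genotypes.items():
    state = "on" if pathway_active(dhh, ptch2) else "off"
    print(f"{name}: pathway {state}")
```

      Under these assumptions, only the dhh single mutant has the pathway switched off, which matches the pattern described: a severe ligand-mutant phenotype, no phenotype in the receptor mutant, and rescue in the double mutant.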

      Figure 5. The authors generated gli1/2/3 mutant TSL lines and performed a luciferase assay; based on the results, the authors concluded that Gli1, but not Gli2/3, was specifically required for transducing Dhh signals in TSL cells. The conclusion is drawn based only on the luciferase assay using TSLs. Whether this is the case in testes at the animal level is not clear. Clearly, more genetic experiments should be performed to substantiate this, using the gli mutant fish.

      To identify Gli1-dependent targets in SLCs, the authors compared transcriptomes of TSLWT, Dhh-overexpressing (TSL-OnDhh), Gli1-overexpressing (TSL-OnGli1), and SAG-treated (TSL+ SAG) TSL cells. While these experiments can be used to identify dhh target genes, it is better to use gli mutant cell lines. Since the authors have generated gli1/2/3 mutants, why not use these mutant fish to identify/confirm the Gli targets?

      We thank the reviewer for these comments.

      (1) We acknowledge that the identification of Gli1 as the key transcriptional effector is primarily based on our in vitro evidence using the TSL cell line. We have revised the manuscript accordingly to ensure this is stated precisely, avoiding overstatement.

      (2) Concerning the transcriptomic analysis, the reviewer suggests using gli mutant cell lines. While this is a valid approach, our strategy of profiling pathway activation (via Dhh/Gli1 overexpression or SAG treatment) was deliberately chosen to provide a high signal-to-noise ratio for identifying genes that are upregulated during the differentiation process. Analyzing loss-of-function mutants under basal conditions can be confounded by potential compensatory mechanisms among the Gli family members, potentially masking the specific transcriptional signature of pathway activation we sought to capture.

      We also note that we have generated gli1/2/3 mutant TSL cell lines for the functional luciferase assays, but we have not generated the corresponding gli mutant fish lines, which would represent a substantial new line of investigation.

      Reviewer #2 (Significance):

      While previous studies have revealed the paracrine role of Sertoli cell secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      The authors investigate the Dhh signaling pathway in Leydig cell differentiation in the tilapia model. They generated multiple mutant lines in different hedgehog pathway components and utilized a Leydig stem cell line to interrogate Leydig cell differentiation. Through this analysis, the authors demonstrate that Dhh regulates Leydig differentiation rather than survival. They also found that Ptch2 is the specific receptor that mediates signaling to promote Leydig differentiation and that Gli1 is the primary Gli involved. Furthermore, they show that a known regulator of Leydig cell development and function, SF1, is a downstream transcriptional target. Overall, the study identifies previously unknown information as to how Dhh signaling regulates Leydig cell development, which is necessary for testosterone production by the testis.

      Major Comments

      (1) In the RNAseq analysis, it is not clear exactly how the 33 "up-regulated" genes were identified. What was the methodology for identification of these genes? Some of the genes were down-regulated or not different in the OnGli condition and some in the OnDhh condition were not differentially expressed, as shown in Fig S8B. Therefore, it is unclear why all 33 genes are classified as upregulated "across all three conditions".

      We have clarified this methodology in the Materials and Methods section in lines 452-454: “Differentially expressed genes (DEGs) were identified for each condition (TSL-OnDhh, TSL-OnGli1, TSL+SAG) compared to TSL-WT controls using edgeR (threshold: FDR < 0.05, |log2(foldchange)| ≥ 1.5).” We also added relevant information in the Results section in lines 198-202: “we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions (Fig. 5C, S6A).”
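
      The distinction the reviewer raises can be illustrated with a short sketch. It is not the authors' edgeR pipeline: the gene names, fold changes, and FDR values below are invented, and the helper names `degs` and `upregulated` are hypothetical. It shows that an |log2FC| threshold admits genes changing in either direction, so a set of genes "upregulated across all three conditions" additionally requires a positive sign in every condition before intersecting.

```python
# Hypothetical edgeR-style results versus TSL-WT: gene -> (log2 fold change, FDR).
# All names and numbers are made up for illustration.
results = {
    "TSL-OnDhh": {"geneA": (2.1, 0.001), "geneB": (-1.8, 0.010), "geneC": (1.7, 0.020)},
    "TSL-OnGli1": {"geneA": (3.0, 0.002), "geneB": (-2.2, 0.004), "geneC": (0.9, 0.030)},
    "TSL+SAG": {"geneA": (1.6, 0.010), "geneB": (1.9, 0.020), "geneC": (2.4, 0.001)},
}

FDR_MAX = 0.05   # threshold quoted in the Methods text
LFC_MIN = 1.5    # threshold quoted in the Methods text


def degs(table):
    """Genes passing FDR < 0.05 and |log2FC| >= 1.5, in either direction."""
    return {g for g, (lfc, fdr) in table.items() if fdr < FDR_MAX and abs(lfc) >= LFC_MIN}


def upregulated(table):
    """Genes passing FDR < 0.05 and log2FC >= 1.5 (positive direction only)."""
    return {g for g, (lfc, fdr) in table.items() if fdr < FDR_MAX and lfc >= LFC_MIN}


# Intersect per-condition sets to get genes shared by all three conditions.
shared_degs = set.intersection(*(degs(t) for t in results.values()))
shared_up = set.intersection(*(upregulated(t) for t in results.values()))

print("DEG in all conditions:", sorted(shared_degs))
print("Upregulated in all conditions:", sorted(shared_up))
```

      In this made-up example, geneB is differentially expressed in all three conditions but changes direction between them, so it belongs to the shared DEG set without being consistently upregulated.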

      We have also updated Fig. S8B to include a clear value and to better visualize the FPKM value levels of these 33 genes across the conditions.

      (2) In figure 4A (and possibly B), it appears that ptch RNA is in the nucleus of the cell. Why would the RNA be primarily in the nucleus? Is the RNA detection accurate? Were controls done? The methods state that sense probes were made but not how they compared to the antisense probes. This comment can also be applied to the gli FISH, particularly gli3 (Figure 5).

      This is an excellent observation. We speculate that the apparent nuclear signal may be due to strong transcriptional activity in the nucleus. To confirm the specificity of our FISH experiment, we performed FISH with sense RNA probes as negative controls for all genes (ptch1, ptch2, gli1, gli2, gli3), and no specific signals were observed (see New Fig. S9).

      Minor comments

      (1) In the introduction, please include information as to when tilapia reach sexual maturity

      We have added this information to the Introduction in line 91-92: early sexual maturity (approximately 3 months after hatching for males and 6 months after hatching for females).

      (2) When first mentioning experiments that use the PKH26 dye, please give a brief description of the dye in the text of the results. This is described in the methods but it would be helpful to have some information about what PKH26 is in the results to more easily understand the figure and experimental design.

      We have added a brief description in the Results section in line 151-152: “To dissect Leydig cell lineage impairment in dhh<sup>-/-</sup> testes, we transplanted the TSL labeled with PKH26 (a fluorescent red hydrophobic membrane dye that enables tracking of transplanted cells) into WT and dhh<sup>-/-</sup> testes (Fig. 3A).”

      (3) In the statistical analysis section of the methods, the authors state that two-tailed t-tests were performed however in the figure legends it states that ANOVA was done for some of the statistical analysis. Please clarify this.

      We have updated the Statistical Analyses section in Methods to clarify in line 472-476: “A two-tailed independent Student’s t-test was used to determine the differences between the two groups. One-way ANOVA, followed by Tukey multiple comparison, was used to determine the significance of differences in more than two groups. P < 0.05 was used as a threshold for statistically significant differences.”
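
      The relationship between the two tests named above can be sketched with the standard library alone. The measurements are invented, and the functions `student_t` and `anova_f` are illustrative helpers that compute test statistics only; p-values and Tukey's post hoc comparisons would require a statistics package and are not shown. For exactly two groups, the one-way ANOVA F statistic equals the square of the pooled-variance Student's t statistic, which is why the t-test is kept for two-group panels and ANOVA is used once a third group is added.

```python
from statistics import mean


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for three groups (arbitrary units).
wt = [10.0, 12.0, 11.0]
mutant = [4.0, 5.0, 6.0]
rescue = [9.0, 11.0, 10.0]

t_two = student_t(wt, mutant)
f_two = anova_f(wt, mutant)          # equals t_two ** 2 for two groups
f_three = anova_f(wt, mutant, rescue)

print(f"t = {t_two:.3f}, t^2 = {t_two ** 2:.3f}, F (2 groups) = {f_two:.3f}")
print(f"F (3 groups) = {f_three:.3f}")
```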

      (4) Figures - in figures that have charts with the Y-axis labeled as "relative positive cells", or similar, please explain what exactly is meant by "relative". What is it relative to?

      We have revised all relevant Y-axis labels and figure legends to explicitly state the quantification method. For example, we now use: "Vasa<sup>+</sup> / DAPI<sup>+</sup> (%), Sycp3<sup>+</sup> / DAPI<sup>+</sup> (%) or Cyp11c1<sup>+</sup> / DAPI<sup>+</sup> (%)".

      (5) Figure 1: please point out the testes in panels A and B

      We have indicated the position of the testes with arrows in Figures 1A and B.

      (6) In figure 4, it would be helpful for the WT images from S7 to be moved to fig 4.

      We have moved representative WT images from Fig. S7 into Fig. 4 for easier comparison with the mutant phenotypes.

      (7) Figure 4E: Are the yellow bars comparable to each other? Is there any significance to the increased luciferase with 8xGli in ptch2-/- as compared to the other genotypes?

      We thank the reviewer for this astute observation. Yes, the yellow bars are directly comparable, and the elevated basal luciferase activity of the 8xGli reporter in the ptch2<sup>-/-</sup> TSL cells is indeed significant and expected. Ptch2 normally inhibits the pathway in the absence of ligand, and the genetic ablation of ptch2 removes this inhibition, leading to ligand-independent, constitutive activation of the downstream signaling cascade. The observed increase in basal reporter activity in the ptch2<sup>-/-</sup> cells is a classic manifestation of this mechanism.

      The primary objective of this experiment was to test the cells' responsiveness to Dhh stimulation across genotypes. The key finding is that while wild-type and ptch1<sup>-/-</sup> cells showed a significant response to Dhh, the ptch2<sup>-/-</sup> cells, which already exhibited high basal activity, were completely unresponsive. This combination of constitutive activation and ligand insensitivity in the ptch2<sup>-/-</sup> genotype provides particularly strong genetic evidence that Ptch2 is the essential receptor mediating Dhh signal transduction in this system.

      (8) Figure 5G: please include exactly what each construct name stands for in the figure legend

      We have expanded the legend for Fig. 5G to define each construct.

      (9) Figure S8B: please include what the values in the table are (eg are these the significance values?)

      We have updated the caption for Figure S8B (now Figure S6B): “The FPKM value for each gene in each sample is indicated within the squares. The color gradient from blue to red reflects low to high expression levels per row (gene).”

      Reviewer #3 (Significance):

      Strengths and limitations:

      The genetics of the tilapia system and the availability of the tilapia Leydig stem cell lines were particular strengths of this study. The study utilizes fish genetics to genetically interrogate the Dhh signaling pathway in Leydig cell development through generation and analysis of mutant lines. The tilapia Leydig stem cell line was an integral part of this study as it allowed for genetic and chemical manipulation of Dhh signaling in undifferentiated Leydig cells and, through transplantation into testes, allowed for analysis of how Leydig cell differentiation was affected.

      Advance:

      The study makes significant advances as to how Dhh signaling instructs Leydig cell differentiation, including identification of the Ptch receptor and Gli transcription factor that function downstream of Dhh in this process. Furthermore, they identify a direct link between Dhh signaling and Sf1 expression, which is known to be important for Leydig cell function.

      Audience:

      This study will be of particular interest to reproductive biologists, endocrinologists, and developmental biologists. The study may also be of interest to researchers and physicians investigating cancers that are promoted by androgens produced by Leydig cells of the testis.

    1. eLife Assessment

      This paper presents a computational method to infer from data a key feature of affinity maturation: the relationship between the affinity of B-cell receptors and their fitness. The approach, which is based on a simple population dynamics model but inferred using AI-powered Simulation-Based Inference, is novel and valuable. It exploits recently published data on replay experiments of affinity maturation. While the method is well-argued and the validation solid, the potential impact of the study is hindered by its complex presentation, which makes it hard to assess its claims reliably.

    2. Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference. The potential impact of this work is hindered by the poor organization of the manuscript. In crucial sections, the writing style and notations are unclear and difficult to follow.

      Strengths:

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to intractable likelihood, which are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      Weaknesses:

      The text is challenging to follow. The descriptions of the model and the inference procedure are fragmented and repetitive. In the introduction and the methods section, the same information is often provided multiple times, at different levels of detail. This organization sometimes requires the reader to move back and forth between subsections (there are multiple non-specific references to "above" and "below" in the text).

      The choice of some parameter values in simulations appears arbitrary and would benefit from more extensive justification. It remains unclear how the "significant uncertainty" associated with these parameters affects the results of inference. In addition, the performance of the inference scheme on simulated data is difficult to evaluate, as the reported distributions of loss function values are not very informative.

      Finally, the discussion of the similarities and differences with an alternative approach to this inference problem, presented in Dewitt et al. (2025), is incomplete.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B-cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      (1) The authors leverage the unique data they have generated for a separate project to provide novel insights into a fundamental question.

      (2) The paper is clearly written, with accessible methods and a straightforward discussion of the limits of this model.

      (3) Code and data are publicly available and well-documented.

      Weaknesses (minor):

      (1) Lines 444-446: I think that "affinity ceiling" and "fitness ceiling" should be considered independent concepts. The former, as the authors ably explain, is a physical limitation. This wouldn't necessarily correspond to a fitness ceiling, though, as Figure 7 shows. Conversely, the model developed here would allow for a fitness ceiling even if the physical limit doesn't exist.

      (2) Lines 566-569: I would like to see this caveat fleshed out more and perhaps mentioned earlier in the paper. While relative affinity is far more important, it is not at all clear to me that absolute affinity can be totally ignored in modeling GC behavior.

      (3) One other limitation that is worth mentioning, though beyond the scope of the current work to fully address: the evolution of the repertoire is also strongly shaped by competition from circulating antibodies. (Eg: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3600904/, http://www.sciencedirect.com/science/article/pii/S1931312820303978). This is irrelevant for the replay experiment modeled here, but still an important factor in general repertoires.

    4. eLife Assessment

      This paper presents a computational method to infer from data a key feature of affinity maturation: the relationship between the affinity of B-cell receptors and their fitness. The approach, which is based on a simple population dynamics model but inferred using AI-powered Simulation-Based Inference, is novel and valuable. It exploits recently published data on replay experiments of affinity maturation. The method is well argued and presented, and the validation is compelling.

    5. Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference.

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to intractable likelihood, which are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      The revised methods section is easier to follow and better explains the approach. Inference results on simulated data are compelling and the real-data findings are compared with alternative approaches, clarifying the relationship to previous work.

    6. Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      • The authors leverage the unique data they have generated for a separate project to provide novel insights to a fundamental question.
      • The paper is clearly written, with accessible methods and straightforward discussion of the limits of this model.
      • Code and data are publicly available and well-documented.

      Weaknesses:

      • No substantial weaknesses noted.
    7. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference. The potential impact of this work is hindered by the poor organization of the manuscript. In crucial sections, the writing style and notations are unclear and difficult to follow.

      We thank the reviewer for their kind words, and have endeavored to address all of their concerns as to the structure and style of the manuscript.

      Strengths:

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to intractable likelihood, which are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      Weaknesses:

      The text is challenging to follow. The descriptions of the model and the inference procedure are fragmented and repetitive. In the introduction and the methods section, the same information is often provided multiple times, at different levels of detail.

      Thank you for pointing this out. We have rearranged the methods in order to make the presentation more linear, and to reduce duplication with the introduction.

      Specifically, we moved the affinity definition to the start, removed the redundant bullet point list, and moved the parameter value table to the end.

      This organization sometimes requires the reader to move back and forth between subsections (there are multiple non-specific references to "above" and "below" in the text).

      This is a great point. We have either removed all references to "above" or "below" or replaced them with more specific citations.

      The choice of some parameter values in simulations appears arbitrary and would benefit from more extensive justification. It remains unclear how the "significant uncertainty" associated with these parameters affects the results of inference.

      We have clarified where various parameter values come from:

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities to those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      In addition, the performance of the inference scheme on simulated data is difficult to evaluate, as the reported distributions of loss function values are not very informative.

      We thought of two different interpretations for this comment, so have worked to address both.

      First, the comment could have been that the distribution of loss functions on the training sample does not appear to be informative of performance on data-like samples. This is true, and in our revision we have emphasized the distinction between the two types of simulation sample: those for training, where each simulated GC has different (sampled) parameter values; vs the "data mimic" samples where all GCs have identical parameters. Since the former have different values for each GC, we can only plot many inferred curves together on the latter. We also would like to emphasize that the inference problem for one GC will have much more uncertainty than will that for an ensemble of GCs (as in the full replay experiment).

      “After building and training our neural network, we evaluate its performance on subsets of the training sample. While this evaluation provides an important baseline and sanity check, it is important to note that the training sample differs dramatically from real data (and the “data mimic” simulation sample that mimics real data). While real data consists of 119 GCs with identical parameters and thus response functions, we need the GCs in our training sample to span the space of all plausible parameter values. This means that while we must evaluate performance on individual GCs in the training and testing samples, in real data (and data mimic simulation) we combine results from 119 curves into a central (medoid) curve. Inference on the training sample will thus appear vastly noisier than on real data and data mimic simulation, and also cannot be plotted with all true and inferred curves together.”
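
      The medoid step mentioned in the quoted passage can be pictured with a minimal sketch. It assumes that each inferred response function has been evaluated on a shared affinity grid and that curves are compared by mean absolute difference; the function name and the choice of distance are illustrative assumptions, not the manuscript's implementation.

```python
import numpy as np

def medoid_index(curves):
    """Return the index of the medoid among curves sampled on a shared grid.

    curves: array-like of shape (n_curves, n_points).
    The medoid is the curve whose summed distance to all other curves is smallest.
    The distance used here (mean absolute difference) is an assumption.
    """
    curves = np.asarray(curves, dtype=float)
    # Pairwise distances, shape (n_curves, n_curves)
    dists = np.abs(curves[:, None, :] - curves[None, :, :]).mean(axis=2)
    return int(np.argmin(dists.sum(axis=1)))
```

      Unlike a pointwise mean, the medoid is always one of the actually inferred curves, so it keeps a valid response-function shape.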

      A second interpretation was that the reviewer did not have an intuitive sense of what a loss function value of, say, 1.0 actually means. To address this second interpretation, we have also added a supplement to Figure 2 with several example true and inferred response functions from the training sample, with representative loss values spanning 0.17 to 2.18. We have also added the following clarification to the caption of Figure 1-figure supplement 2:

      “The loss value is thus the fraction of the area under the true curve represented by the area between the true and inferred curves.”
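
      As a rough illustration of this definition, the sketch below computes the ratio of the area between two curves to the area under the true curve. The four-parameter sigmoid form, its parameter names, and the affinity bounds are assumptions chosen for the example and may differ from the parameterization used in the manuscript.

```python
import numpy as np

def sigmoid(x, x_scale, x_shift, y_scale, y_shift):
    """Generic four-parameter sigmoid (hypothetical parameterization)."""
    return y_scale / (1.0 + np.exp(-x_scale * (x - x_shift))) + y_shift

def trapezoid(y, x):
    """Trapezoid-rule integral of y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def curve_loss(true_params, inferred_params, x_min=-2.5, x_max=3.0, n=2001):
    """Area between the true and inferred curves, divided by the area under the true curve.

    The integration bounds x_min and x_max are illustrative assumptions.
    """
    x = np.linspace(x_min, x_max, n)
    y_true = sigmoid(x, *true_params)
    y_inf = sigmoid(x, *inferred_params)
    return trapezoid(np.abs(y_true - y_inf), x) / trapezoid(y_true, x)
```

      Under this definition a loss of 0 means the curves coincide on the chosen interval, and a loss of 1.0 means the area separating them equals the whole area under the true curve, as happens when the inferred curve is everywhere twice the true one.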

      Finally, the discussion of the similarities and differences with an alternative approach to this inference problem, presented in Dewitt et al. (2025), is incomplete.

      We have expanded this section of the manuscript, and added a new plot directly comparing the methods.

      “In order to compare more directly to DeWitt et al. 2025, we remade their Fig. S6D, truncating to values at which affinities are actually observed in the bulk data, and using only three of the seven timepoints (11, 20, and 70; Figure 8, left). We then simulated 25 GCs with central data mimic parameters out to 70 days. For each such GC, we found the time point with mean affinity over living cells closest to each of three specific “target” affinity values (0.1, 1.0, 2.0) corresponding to the mean affinity of the bulk data at timepoints 11, 20, and 70. We then plot the effective birth rates of all living cells vs relative affinity (subtracting mean affinity) at the resulting GC-specific timepoints for all 25 GCs together (Figure 8, right). Note that because each GC evolves at very different and time-dependent rates, we could not simply use the timepoints from the bulk data, since each GC slice from our simulation would then have very different mean affinity. The mean over GCs of these GC-specific chosen times is 10.9, 24.5, 44.4 (compared to the original bulk data time points 11, 20, 70). It is important to note that while the first two target affinities (0.1 and 1.0) are within the affinity ranges encountered in the extracted GC data, the third value (2.0) is far beyond them, and thus represents extrapolation to an affinity regime informed more by our underlying model than by the real data on which we fit it.”
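
      The per-GC time point selection described in this passage amounts to a nearest-value lookup. A minimal sketch follows, assuming that the mean affinity over living cells has already been recorded at a set of simulated time points; the function name and the input layout are hypothetical.

```python
import numpy as np

def closest_timepoints(times, mean_affinities, targets):
    """For each target, return the time whose mean affinity is closest to it.

    times, mean_affinities: equal-length sequences for a single simulated GC.
    targets: target mean affinities, e.g. (0.1, 1.0, 2.0) as in the quoted text.
    """
    times = np.asarray(times, dtype=float)
    means = np.asarray(mean_affinities, dtype=float)
    return [float(times[np.argmin(np.abs(means - target))]) for target in targets]
```

      Applying this to each simulated GC separately gives GC-specific times, which is why the chosen times averaged over GCs need not coincide with the bulk-data time points.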

      Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B-cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      (1) The authors leverage the unique data they have generated for a separate project to provide novel insights into a fundamental question. (2) The paper is clearly written, with accessible methods and a straightforward discussion of the limits of this model. (3) Code and data are publicly available and well documented.

      Weaknesses (minor):

      (1) Lines 444-446: I think that "affinity ceiling" and "fitness ceiling" should be considered independent concepts. The former, as the authors ably explain, is a physical limitation. This wouldn't necessarily correspond to a fitness ceiling, though, as Figure 7 shows. Conversely, the model developed here would allow for a fitness ceiling even if the physical limit doesn't exist.

      Right, whoops, good point. We've rearranged the discussion to separate the concepts, for instance:

      “While affinity and fitness ceilings are separate concepts, they are closely related. An affinity ceiling is a limit to affinity for a given antigen: there are no mutations that can improve affinity beyond this level. This would result in a truncated response function, undefined beyond the affinity ceiling. A fitness ceiling, on the other hand, is an upper asymptote on the response function. Such a ceiling would result in a limit on affinity for a germinal center reaction, since once cells are well into the upper asymptote of fitness they are no longer subject to selective pressure.”

      (2) Lines 566-569: I would like to see this caveat fleshed out more and perhaps mentioned earlier in the paper. While relative affinity is far more important, it is not at all clear to me that absolute affinity can be totally ignored in modeling GC behavior.

      This is a great point, we've added a mention of this where we introduce the replay experiment in the Methods:

      “It is important to note that this is a much lower level than typical BCR repertoires, which average roughly 5-10% nucleotide shm.”

      And expanded on the explanation in the Discussion:

      “Some aspects of behavior in the low-shm/early times regime of the extracted GC data are also potentially different to those at the higher shm levels and longer times found in typical repertoires. This is especially relevant to affinity or fitness ceilings, to which we likely have little sensitivity with the current data.”

      (3) One other limitation that is worth mentioning, though beyond the scope of the current work to fully address: the evolution of the repertoire is also strongly shaped by competition from circulating antibodies. (Eg: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3600904/, http://www.sciencedirect.com/science/article/pii/S1931312820303978). This is irrelevant for the replay experiment modeled here, but still an important factor in general repertoires.

      Yes good point, we've added these citations in a new paragraph on between-lineage competition:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because our time required is primarily determined by the total population size, since at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”
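
      The scaling argument in this passage can be made concrete with a small sketch in the style of Gillespie's first-reaction method: every (cell, event type) pair draws an exponential waiting time and the shortest one fires. This is an illustrative toy with constant rates and hypothetical function names, not the authors' simulation code; in particular it omits affinity-dependent birth rates, mutation, and the carrying capacity constraint.

```python
import random

def next_event(cells, rng):
    """Scan every cell and every event type; return (wait, cell_index, event)."""
    best = (float("inf"), None, None)
    for i, rates in enumerate(cells):
        for event, rate in rates.items():
            if rate <= 0:
                continue  # an event with zero rate can never fire
            wait = rng.expovariate(rate)
            if wait < best[0]:
                best = (wait, i, event)
    return best

def simulate(n_initial, birth, death, t_end, seed=0, max_cells=1000):
    """Toy birth-death process; returns (final population, pairs examined)."""
    rng = random.Random(seed)
    cells = [{"birth": birth, "death": death} for _ in range(n_initial)]
    t, examined = 0.0, 0
    while cells and len(cells) < max_cells:
        examined += sum(len(c) for c in cells)  # work done in this step
        wait, i, event = next_event(cells, rng)
        if t + wait > t_end:
            break
        t += wait
        if event == "birth":
            cells.append(dict(cells[i]))  # daughter inherits the parent's rates
        else:
            cells.pop(i)
    return len(cells), examined
```

      The work per step grows with the number of living cells times the number of event types, so simulating many competing GCs at once multiplies the cost of every single step.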

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors are encouraged to follow the suggestions of manuscript re-organization by Reviewer 1, in order to improve readability. We would also like to suggest improving the discussion of the traveling wave model to explain it in a more self-contained way. In passing, please clarify what is meant by 'steady-state' in that model. A superficial understanding would suggest that the only steady state in that model would be a homogeneous population of antibodies with maximum affinity/fitness.

      These are great suggestions. We have substantially rearranged the text according to Reviewer 1's suggestions, especially the Methods, and expanded on and rearranged the traveling wave discussion. We've also clarified throughout that the traveling wave model is assuming steady state with respect to population. In the public response to reviewer 1 above we describe these changes in more detail.

      Reviewer #1 (Recommendations for the authors):

      I suggest that the organization of the paper be reconsidered. The current methods section is long and at times repetitive, making it impossible to parse in a single reading. Moving some technical details from the main text to an appendix could improve readability. Despite the length of the methods section, many important points, such as justification of choices in model specification or values of parameters, are treated only briefly.

      We have rearranged the methods section, particularly the discussion of our model, and have more clearly justified choices of parameter values as described in the public response.

      Discussion of similarities and differences with reference to Dewitt et al. 2025 should be revised, as it's currently unclear whether the method presented here has any advantages.

      We have expanded this comparison, and emphasized the main disadvantage of the traveling wave approach: there is no way of knowing whether by abstracting away so much biological detail it misses important effects. We have also emphasized that the two approaches use different types of data (time series vs endpoint) which are typically not simultaneously available:

      “The clear advantage of the traveling wave model is its simplicity: if its high level view is accurate enough to effectively model the relevant GC dynamics, it is far more tractable. But reproducing low-level biological detail, and making high-dimensional real data comparisons (e.g. Figure 5) to iteratively improve model fidelity, are also useful, providing direct evidence that we are correctly modeling the underlying biological processes. The two approaches also utilize different types of data: we use a single time point, and thus must reconstruct evolutionary history; whereas the traveling wave requires a series of timepoints. The availability of both types of data is a unique feature of the replay experiment, and provides us with the opportunity to directly compare the approaches.”

      The results obtained from the same data should be directly compared (can the response function be directly compared to the result in Figure S6D in Dewitt et al., 2025? If yes, it should be re-plotted here and compared/superimposed with Figures 6 and 7). The text mentions the results differ, but it remains ambiguous whether the differences are significant and what their implications are.

      We've added a new Figure 8, comparing a modified version of the traveling wave Fig S6D to a new plot derived from our results using the data mimic parameters. While the two plots represent fundamentally different quantities, they do put the results of the two methods on an approximately equal footing and we see nice concordance between them in regions with significant data (they disagree substantially for larger negative affinities). We have also added emphasis to the point that the traveling wave model uses an entirely separate dataset to what we use here.

      Other comments:

      (1) l. 80: "[in] around 10 days"?

      Text rearranged so this phrase no longer appears.

      (2) l. 96: "an intrinsic rate [given by?] the response function above".

      Text rearranged so this phrase no longer appears.

      (3) Figure 1: The “specific model” part could be expanded and improved to help make sense of model parameters and the order of different processes in the population model. Example values of parameters can be plotted rather than loosely described (e.g., y_h+y_c, the upper asymptotes, can be plotted in place of the “yscale determines upper asymptotes” label).

      Great suggestion, we've changed the labels.

      (4) The cartoons in the other parts are somewhat cryptic or illegible due to small sizes.

      We have added text in the caption linking to the figures that are, in the figure, intended to be in schematic form only.

      “Plots from elsewhere in the manuscript are rendered in schematic form: those in “infer on data” refer to Figure 4-figure supplement 1, and those in “simulate with inferred parameters” to Figure 5.”

      (5) L. 137: It's not helpful to give numerical values before the definition of affinity. (and these numbers are repeated later).

      Good point, we've moved the affinity definition to the previous section, and removed the duplicate range information.

      (6) Table 1: A number of notations are unclear, such as “#seqs/GC” or “mutability multiplier”. The double notation for crucial parameters doesn't help. At the moment the table is introduced, the columns make little sense to the reader, and it's not well specified what dictates the choice or changes of parameter values or ranges.

      We've moved the table further down until after the parameters have been introduced, and clarified the indicated names.

      (7) l. 147: Choices of model are not justified and appear arbitrary (e.g., why death events happen at one of two rates).

      We have clarified the reasoning behind having two death rates.

      (8) l.151: “happened on the edges of developing phylogenetic tree” - ambiguous: do they accumulate at cell divisions? What is a “developing tree”?

      We have removed this ambiguous phrasing.

      (9) l.161: This paragraph is particularly dense.

      We have rearranged this section of the methods, and split up this paragraph.

      (10) l. 164: All the different response functions for different event types? Or only the one for birth, as stated before?

      Yes. This has been clarified.

      (11) l.167: Does the statement in the bracket refer to a unit?

      This has been clarified.

      (12) l. 169: Discussion of the implementation seems too detailed.

      Hopefully the rearranged description is clearer, but we worry that removing the details of event selection would leave some readers confused.

      (13) l. 186: Why describe the methods that, in the end, were not used? Similarly, a mention of “variety of response functions” seems out of place if only one choice is used throughout the paper. eq. (2): that's m^{-1} from eq. (1). Having the two equations using the same notation is confusing.

      We've moved the mention of alternatives to the Discussion, where it is an important source of uncontrolled systematic uncertainty, and removed the extra equation.

      (14) l. 206: Unclear what “thus” refers to.

      Removed.

      (15) l.211: What does “neglecting y_h” mean?

      This has been clarified.

      (16) l. 242: Unclear what “this” refers to.

      Clarified.

      (17) l. 261: What does “model independence” refer to in this context?

      From the sigmoid model. Clarified.

      (18) l. 306: What values for which parameters? References?

      We have clarified and updated this statement - it was out of date, corresponding to the analysis before we started fitting non-sigmoid parameters.

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities to those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      (19) l. 326: "is interpreted as having" or "corresponds to"?

      Changed.

      (20) l. 340: Not sure what "encompassing" means in this context.

      Clarified.

      (21) l. 341: "We do this..." -- I think this sentence is not grammatical.

      Fixed.

      (22) l. 348: "on simulation" -- "from simulated data"?

      Indeed.

      (23) l. 351: "top rows", the figures only have one row.

      Fixed.

      (24) Figure 2: It's difficult to tell from the loss function itself whether inference on simulated data works well. Why not report the simulated and inferred response functions? The equivalent plots in Figure 5 would also be informative. Has inference been tested for different "sigmoid parameters" values?

      This is an important point that was not clear, thanks for bringing it up. We have expanded on and emphasized the differences between these samples and the reasoning behind their different evaluation choices. Briefly, we can't display true vs inferred response functions on the training samples since the curves for each GC are different -- the plot would be entirely filled in with very different response function shapes. This is why we do actual performance evaluation on the "data mimic" samples, where all GCs have the same parameters. Summary stats (like Fig 5) for the training sample are in Fig 5 Supplement 2.

      (25) l. 354: Unclear what "this" refers to.

      Removed.

      (26) l. 355: We assume the parameters are the same?

      Yes, we assume all data GCs have the same parameters. We have added emphasis of this point.

      (27) Figure 4: Is "lambda" the fitness? Should be typeset as \lambda_i?

      Our convention is to add the subscript when evaluating fitness on individual cells, but to omit it, as here, when plotting the response function as a whole.

      (28) l. 412: "[a] carrying capacity constraint".

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) In 2 places, you state that observed affinity ranged from -37 to 3, but I assume that the lower bound should be -3.7.

      The -37 was actually correct, but we had mistakenly missed updating it when we switched to the latest (current) version of the affinity model. We have updated the values, although these don't really have any effect on the model since we only infer within bounds in which we have a lot of points:

      “Affinity is 0 for the initial unmutated sequence, and ranges from -12.2 to 3.5 in observed sequences, with a mean (median) of -0.3 (0.3).”

      (2) I had to look up the Voznica paper to understand the tree encoding: it would be nice to spend another sentence or two on it here for those who aren't familiar.

      Great point, we have added the following:

      “We encode each tree with an approach similar to Lambert et al. (2023) and Thompson et al. (2024), most closely following the compact bijective ladderized vector (CBLV) approach from Voznica et al. (2022). The CBLV method first ladderizes the tree by rotating each subtree such that, roughly speaking, longer branches end up toward the left. This does not modify the tree, but rather allows iteration over nodes in a defined, repeatable way, called inorder iteration. To generate the matrix, we traverse the ladderized tree in order, calculating a distance to associate with each node. For internal nodes, this is the distance to root, whereas for leaf nodes it is the distance to the most-recently-visited internal node (Voznica et al., 2022, Fig. 2). Distances corresponding to leaf nodes are arranged in the first row of the matrix, while those from internal nodes form the second row.”
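
      The encoding described in this passage can be sketched as follows. This is a simplified reading, not the manuscript's or Voznica et al.'s code: it assumes a strictly binary tree, ladderizes by placing the child with the longer path to its furthest leaf on the left, measures the first leaf from the root (no internal node has been visited yet at that point), and omits the padding and rescaling that a fixed-size network input would need. The `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    length: float = 0.0  # branch length to the parent
    children: List["Node"] = field(default_factory=list)  # empty for a leaf, two for an internal node

def height(node):
    """Longest distance from this node down to any leaf below it."""
    if not node.children:
        return 0.0
    return max(c.length + height(c) for c in node.children)

def ladderize(node):
    """Reorder children in place so the deeper subtree sits on the left (assumed rule)."""
    for child in node.children:
        ladderize(child)
    node.children.sort(key=lambda c: c.length + height(c), reverse=True)

def encode(root):
    """Return [leaf_row, internal_row] from an inorder walk of the ladderized tree."""
    ladderize(root)
    leaf_row, internal_row = [], []
    last_internal_depth = 0.0  # root depth until an internal node has been visited

    def visit(node, depth):
        nonlocal last_internal_depth
        if not node.children:
            # Leaf: distance to the most recently visited internal node
            leaf_row.append(depth - last_internal_depth)
            return
        left, right = node.children  # raises if the tree is not binary
        visit(left, depth + left.length)
        internal_row.append(depth)  # internal node: distance to the root
        last_internal_depth = depth
        visit(right, depth + right.length)

    visit(root, 0.0)
    return [leaf_row, internal_row]
```

      Because a leaf has no left subtree, the node visited immediately before it in an inorder walk is always one of its ancestors, so the leaf entry reduces to a difference of two root distances.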

      (3) On line 351, you refer to the "top rows of Figure 2 and Figure 3," but each only has one row in the current version. I think it should now be "left panel."

      Fixed.

      (4) How many vertical dashed lines are in the left panel of the bottom row of Figure 7? I think it's more than one, but can't tell if it is two or three...

      Nice catch! There were actually three. We've shortened them and added a white outline to clarify overlapping lines.

      (5) Would the model be applicable to GCs with multiple naive founders of different affinities? Or would more/different parameters be needed to account for that?

      The model would be applicable, but since the time required for our simulation scales roughly with the total simulated population size, we could probably only handle competition among at most a couple of GCs. Some sort of "migration strength" parameter would be required for competition among GCs (or within one GC if we don't want to assume it's well-mixed), but that doesn't seem a terrible impediment. We've added the following:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because our time required is primarily determined by the total population size, since at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”
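
      The scaling argument above can be made concrete with a sketch of a single simulation step. This is an illustration under assumptions rather than the authors' code: it uses a first-reaction style step in which every cell draws an exponential waiting time for every event type and the earliest one wins, and the event names (`birth`, `death`, `mutation`) and their rates are hypothetical.

```python
import random


def next_event(cells, rng):
    """Return (waiting_time, cell_index, event_type) for the earliest event.

    `cells` is a list of dicts mapping a (hypothetical) event type to its rate.
    Every cell and every event type is visited, so one step costs on the order
    of (population size) x (number of event types) operations.
    """
    best = None
    for index, rates in enumerate(cells):
        for event, rate in rates.items():
            if rate <= 0.0:
                continue  # an event with zero rate never fires
            wait = rng.expovariate(rate)
            if best is None or wait < best[0]:
                best = (wait, index, event)
    return best


rng = random.Random(0)
population = [{"birth": 1.0, "death": 0.5, "mutation": 0.2} for _ in range(1000)]
wait, index, event = next_event(population, rng)
```

      Because the inner loops run over the whole population at every step, doubling the number of simulated cells (for example by adding a second GC) roughly doubles the cost of each step in this sketch.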

    1. eLife Assessment

      This valuable manuscript provides solid evidence regarding the role of alpha oscillations in sensory gain control. The authors use an attention-cuing task in an initial EEG study followed by a separate MEG replication study to demonstrate that whilst (occipital) alpha oscillations are increased when anticipating an auditory target, so is visual responsiveness as assessed with frequency tagging. The findings offer a re-interpretation of the inhibitory role of the alpha rhythm, supporting that alpha oscillations contribute to interareal communication.

    2. Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results imply that alpha modulation does not solely regulate 'gain control' in early visual areas (also referred to as alpha inhibition hypothesis), but rather orchestrates signal transmission to later stages of the processing stream.

      Comments on the first revision:

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated."

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- (i.e., visual target presented alone) has a lower accuracy and longer reaction time than visual+ (i.e., visual target presented with distractor). In fact, Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2 %; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory +: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms ; SD = 58.52; p < .001; t(25) = 5.23)."

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of MEG experiments are matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.

      Comments on the latest version:

      I am satisfied with the authors' response and do not have any additional comments.

    3. Author response:

      The following is the authors’ response to the current reviews.

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated." 

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- (i.e., visual target presented alone) has a lower accuracy and longer reaction time than visual+ (i.e., visual target presented with distractor). In fact, Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2 %; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory +: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms ; SD = 58.52; p < .001; t(25) = 5.23)." 

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of MEG experiments are matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.

      We apologize for mixing up the visual and auditory distractor cost in our rebuttal. The reviewer is right in that our two statements contradict each other.

      To clarify: In the EEG experiment, we see a significant distractor cost in accuracy for auditory distractors (see SUPPL Fig. 1A). We also see faster reaction times with auditory distractors, which may reflect intersensory facilitation. As we used the same distractors in both experiments, it can be assumed that they were distracting in both.

      In our follow-up MEG-experiment, as the reviewer stated, performance in block 2 was higher than in block 1, even though there were distractors present. In this experiment, distractor cost and learning effects are difficult to disentangle. It is possible that participants improved over time for the visual discrimination task in Block 1, as performance at the beginning was quite low. To illustrate this, we divided the trials of each condition into bins of 10 and plotted the mean accuracy in these bins over time (see Author response image 1). Here it can be seen that in Block 2, there is a more or less stable performance over time with a variation < 10 %. In Block 1, both for visual as well as auditory trials, an improvement over time can be seen. This is especially strong for visual trials, which span a difference of > 20%. Note that the mean performance for the 80-90 trial bin was higher than any mean performance observed in Block 2. 
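
      The binning described above can be illustrated with a short sketch. It assumes that each trial is coded as 1 (correct) or 0 (incorrect) in chronological order and that an incomplete final bin is dropped; the function name and the example data are hypothetical and are not taken from the study.

```python
def binned_accuracy(correct, bin_size=10):
    """Mean accuracy in consecutive, non-overlapping bins of `bin_size` trials.

    `correct` is a chronological sequence of 1 (correct) / 0 (incorrect).
    Trials left over after the last complete bin are ignored (an assumption).
    """
    n_bins = len(correct) // bin_size
    return [
        sum(correct[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


# Made-up example: performance improves from the first bin to the second.
trials = [0] * 5 + [1] * 5 + [1] * 10 + [1, 0, 1]
accuracy_over_time = binned_accuracy(trials)
```

      Plotting such per-bin means separately for each block and condition gives a learning curve of the kind referred to above.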

      Additionally, the same paradigm has been applied in previous investigations, which also found distractor costs for the auditory stimuli used here, in both blocked and non-blocked designs. See:

      Mazaheri, A., van Schouwenburg, M. R., Dimitrijevic, A., Denys, D., Cools, R., & Jensen, O. (2014). Region-specific modulations in oscillatory alpha activity serve to facilitate processing in the visual and auditory modalities. NeuroImage, 87, 356–362. https://doi.org/10.1016/j.neuroimage.2013.10.052

      Van Diepen, R., & Mazaheri, A. (2017). Cross-sensory modulation of alpha oscillatory activity: suppression, idling and default resource allocation. European Journal of Neuroscience, 45(11), 1431–1438. https://doi.org/10.1111/ejn.13570

      Author response image 1.

      Accuracy development over time in the MEG experiment. During Block 1, a performance increase over time can be observed for visual as well as for auditory stimuli. During Block 2, performance is stable over time. Data are presented as mean ± SEM. N = 27 (one participant was excluded from this analysis, as their trial count in at least one condition was below 90 trials).


      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results disprove the alpha inhibition hypothesis, and instead imply that alpha "regulates downstream information transfer." However, as I detail below, I do not think the presented data irrefutably disproves the alpha inhibition hypothesis. Moreover, the evidence for the alternative hypothesis of alpha as an orchestrator for downstream signal transmission is weak. Their data serves to refute only the most extreme and physiologically implausible version of the alpha inhibition hypothesis, which assumes that alpha completely disengages the entire brain area, inhibiting all neuronal activity.

      We thank the reviewer for taking the time to provide additional feedback and suggestions, and we have improved our manuscript accordingly.

      (1) The authors assign specific meanings to specific frequencies (8-12 Hz alpha, 4 Hz intermodulation frequency, 36 Hz visual tagging activity, 40 Hz auditory tagging activity), but the results show that spectral power increases at all of these frequencies towards the end of the cue-to-target interval. This result is consistent with a broadband increase, which could simply be due to additional attention required when anticipating an auditory target (since behavioral performance was lower with auditory targets, we can say auditory discrimination was more difficult). To rule this out, the authors will need to show a power spectral density curve with specific increases around each frequency band of interest. In addition, it would be more convincing if there was a bump in the alpha band, and distinct bumps for the 4 vs 36 vs 40 Hz bands.

      This is an interesting point with several aspects, which we will address separately.

      Broadband Increase vs. Frequency-Specific Effects:

      The suggestion that the observed spectral power increases may reflect a broadband effect rather than frequency-specific tagging is important. However, Supplementary Figure 11 shows no difference between expecting an auditory or visual target at 44 Hz. This demonstrates that (1) there is no uniform increase across all frequencies, and (2) the separation between our stimulation frequencies was sufficient to allow differentiation using our method.

      Task Difficulty and Performance Differences:

      The reviewer suggests that the observed effects may be due to differences in task difficulty, citing lower performance when anticipating auditory targets in the EEG study. This issue was explicitly addressed in our follow-up MEG study, where stimulus difficulty was calibrated. In the second block—used for analysis—accuracy between auditory and visual targets was matched (see Fig. 7B). The replication of our findings under these controlled conditions directly rules out task difficulty as the sole explanation. This point is clearly presented in the manuscript.

      Power Spectrum Analysis:

      The reviewer’s suggestion that our analysis lacks evidence of frequency-specific effects is addressed directly in the manuscript. While we initially used the Hilbert method to track the time course of power fluctuations, we also included spectral analyses to confirm distinct peaks at the stimulation frequencies. Specifically, when averaging over the alpha cluster, we observed a significant difference at 10 Hz between auditory and visual target expectation, with no significant differences at 36 or 40 Hz in that cluster. Conversely, in the sensor cluster showing significant 36 Hz activity, alpha power did not differ, but both 36 Hz and 40 Hz tagging frequencies showed significant effects. These findings clearly demonstrate frequency-specific modulation and are already presented in the manuscript.
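
      The distinction between narrowband tagging responses and a broadband increase can be illustrated with a toy spectral calculation. The sketch below is not the analysis pipeline of the study: the sampling rate, window length, and amplitudes are hypothetical, the signal is synthetic and noise-free, and power is obtained by projecting onto a single frequency with a direct discrete Fourier sum.

```python
import cmath
import math


def power_at(signal, fs, freq):
    """Power of `signal` at one frequency, via a direct DFT projection."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq * k / fs)
        for k, x in enumerate(signal)
    )
    return (abs(acc) / n) ** 2


fs = 1000  # hypothetical sampling rate in Hz
n = 2000   # 2 s window, so 36, 40 and 44 Hz each complete whole cycles

# Synthetic signal with a 36 Hz and a weaker 40 Hz component, nothing at 44 Hz.
signal = [
    math.sin(2 * math.pi * 36 * k / fs) + 0.5 * math.sin(2 * math.pi * 40 * k / fs)
    for k in range(n)
]

p36 = power_at(signal, fs, 36)  # approximately 0.25
p40 = power_at(signal, fs, 40)  # approximately 0.0625
p44 = power_at(signal, fs, 44)  # approximately 0
```

      In this toy case the two tagged frequencies are recovered separately and the untagged neighbour at 44 Hz stays near zero, whereas a broadband change would shift all three values together.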

      (2) For visual target discrimination, behavioral performance with and without the distractor is not statistically different. Moreover, the reaction time is faster with distractor. Is there any evidence that the added auditory signal was actually distracting?

      We appreciate the reviewer’s observation regarding the lack of a statistically significant difference in behavioral performance for visual target discrimination with and without the auditory distractor. While this was indeed the case in our EEG experiment, we believe the absence of an accuracy effect may be attributable to a ceiling effect, as overall visual performance approached 100%. This high baseline likely masked any subtle influence of the distractor.

      To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated.

      Regarding the faster reaction times observed in the presence of the auditory distractor, this phenomenon is consistent with prior findings on intersensory facilitation. Auditory stimuli, which are processed more rapidly than visual stimuli, can enhance response speed to visual targets—even when the auditory input is non-informative or nominally distracting (Nickerson, 1973; Diederich & Colonius, 2008; Salagovic & Leonard, 2021). Thus, while the auditory signal may facilitate motor responses, it can simultaneously impair perceptual accuracy, depending on task demands and baseline performance levels.

      Taken together, our data suggest that the auditory signal does exert a distracting influence, particularly under conditions where visual performance is not at ceiling. The dual effect—facilitated reaction time but reduced accuracy—highlights the complexity of multisensory interactions and underscores the importance of considering both behavioral and neurophysiological measures.

      (3) It is possible that alpha does suppress task-irrelevant stimuli, but only when it is distracting. In other words, perhaps alpha only suppresses distractors that are presented simultaneously with the target. Since the authors did not test this, they cannot irrefutably reject the alpha inhibition hypothesis.

      The reviewer’s claim that we did not test whether alpha suppresses distractors presented simultaneously with the target is incorrect. As stated in the manuscript and supported by our data (see point 2), auditory distractors were indeed presented concurrently with visual targets, and they were demonstrably distracting. Therefore, the scenario the reviewer suggests was not only tested—it forms a core part of our design.

      Furthermore, it was never our intention to irrefutably reject the alpha inhibition hypothesis. Rather, our aim was to revise and expand it. If our phrasing implied otherwise, we have now clarified this in the manuscript. Specifically, we propose that alpha oscillations:

      (a) Exhibit cyclic inhibitory and excitatory dynamics;

      (b) Regulate processing by modulating transfer pathways, which can result in either inhibition or facilitation depending on the network context.

      In our study, we did not observe suppression of distractor transfer, likely due to the engagement of a supramodal system that enhances both auditory and visual excitability. This interpretation is supported by prior findings (e.g., Jacoby et al., 2012), which show increased visual SSEPs under auditory task load, and by Zhigalov et al. (2020), who found no trial-by-trial correlation between alpha power and visual tagging in early visual areas, despite a general association with attention.

      Recent evidence (Clausner et al., 2024; Yang et al., 2024) further supports the notion that alpha oscillations serve multiple functional roles depending on the network involved. These roles include intra- and inter-cortical signal transmission, distractor inhibition, and enhancement of downstream processing (Scheeringa et al., 2012; Bastos et al., 2015; Zumer et al., 2014). We believe the most plausible account is that alpha oscillations support both functions, depending on context.

      To reflect this more clearly, we have updated Figure 1 to present a broader signal-transfer framework for alpha oscillations, beyond the specific scenario tested in this study.

      We have now revised Figure 1 and several sentences in the introduction and discussion, to clarify this argument.

      L35-37: Previous research gave rise to the prominent alpha inhibition hypothesis, which suggests that oscillatory activity in the alpha range (~10 Hz) plays a mechanistic role in selective attention through functional inhibition of irrelevant cortical areas (see Fig. 1; Foxe et al., 1998; Jensen & Mazaheri, 2010; Klimesch et al., 2007).

      L60-65: In contrast, we propose that functional and inhibitory effects of alpha modulation, such as distractor inhibition, are exhibited through blocking or facilitating signal transmission to higher order areas (Peylo et al., 2021; Yang et al., 2023; Zhigalov & Jensen, 2020; Zumer et al., 2014), gating feedforward or feedback communication between sensory areas (see Fig. 1; Bauer et al., 2020; Haegens et al., 2015; Uemura et al., 2021).

      L482-485: This suggests that responsiveness of the visual stream was not inhibited when attention was directed to auditory processing and was not inhibited by occipital alpha activity, which directly contradicts the proposed mechanism behind the alpha inhibition hypothesis.

      L517-519: Top-down cued changes in alpha power have now been widely viewed to play a functional role in directing attention: the processing of irrelevant information is attenuated by increasing alpha power in areas involved with processing this information (Foxe, Simpson, & Ahlfors, 1998; Hanslmayr et al., 2007; Jensen & Mazaheri, 2010).

      L566-569: As such, it is conceivable that alpha oscillations can in some cases inhibit local transmission, while in other cases, depending on network location, connectivity and demand, alpha oscillation can facilitate signal transmission. This mechanism allows to increase transmission of relevant information and to block transmission of distractors.

      (4) In the abstract and Figure 1, the authors claim an alternative function for alpha oscillations; that alpha "orchestrates signal transmission to later stages of the processing stream." In support, the authors cite their result showing that increased alpha activity originating from early visual cortex is related to enhanced visual processing in higher visual areas and association areas. This does not constitute a strong support for the alternative hypothesis. The correlation between posterior alpha power and frequency-tagged activity was not specific in any way; Fig. 10 shows that the correlation appeared on both 1) anticipating-auditory and anticipating-visual trials, 2) the visual tagged frequency and the auditory tagged activity, and 3) was not specific to the visual processing stream. Thus, the data is more parsimonious with a correlation than a causal relationship between posterior alpha and visual processing.

      Again, the reviewer raises important points, which we want to address.

      The correlation between posterior alpha power and frequency-tagged activity was not specific, as it is present both when auditory and visual targets are expected:

      If there is a connection between posterior alpha activity and higher-order visual information transfer, then it can be expected that this relationship holds across conditions and that higher alpha activity is accompanied by higher frequency-tagged activity, both over trials and over conditions. However, it is possible that when alpha activity is lower, such as when expecting a visual target, the signal-to-noise ratio is affected, which may make it harder to detect a correlation in the data when using non-invasive measurements.

      The connection between alpha activity and frequency-tagged activity appears for auditory as well as visual stimuli, and the correlation is not specific to the visual processing stream:

      While we do see differences between conditions (e.g. in the EEG-analysis, mostly 36 Hz correlated with alpha activity and only in one condition 40 Hz showed a correlation as well), it is true that in our MEG analysis, we found correlations both between alpha activity and 36 Hz as well as alpha activity and 40 Hz.  

      We acknowledge that when analysing frequency-tagged activity on a trial-by-trial basis, where removal of non-timelocked activity through averaging (which we did when we tested for condition differences in Fig. 4 and 9) is not possible, there is uncertainty in the data. Baseline correction can alleviate this issue, but it cannot rule out non-specific effects. We therefore decided to repeat the analysis with power calculated via a fast Fourier transform instead of the Hilbert power, in favour of a higher and stricter frequency resolution; because we averaged over a time period, the time domain was not relevant for this analysis. In this more conservative analysis, only 36 Hz tagged activity when expecting an auditory target correlated with early visual alpha activity.

      Additionally, we added correlation analyses between alpha activity and frequency-tagged activity within early visual areas, using the sensor cluster which showed significant condition differences in alpha activity. Here, no correlations between frequency-tagged activity and alpha activity could be found (apart from a small correlation with 40 Hz which could not be confirmed by a median split; see SUPPL Fig. 14 C). The absence of a significant correlation between early visual alpha and frequency-tagged activity has previously been described by others (Zhigalov & Jensen, 2020), and a Bayes factor below 1 also indicated that the alternative hypothesis is unlikely.
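
      For readers unfamiliar with the median-split check mentioned above, a minimal sketch follows. It is an illustration only: the function name and the numbers are made up, trials whose alpha power equals the median are assigned to the low group (an assumption), and no significance test is included.

```python
from statistics import mean, median


def median_split_difference(alpha_power, tagged_power):
    """Mean tagged power in high-alpha trials minus that in low-alpha trials.

    Trials are split at the median of `alpha_power`; trials equal to the
    median go to the low group. Both groups are assumed to be non-empty.
    """
    cut = median(alpha_power)
    low = [t for a, t in zip(alpha_power, tagged_power) if a <= cut]
    high = [t for a, t in zip(alpha_power, tagged_power) if a > cut]
    return mean(high) - mean(low)


# Made-up single-trial values: tagged power is larger on high-alpha trials.
alpha = [1.0, 2.0, 3.0, 4.0]
tagged = [10.0, 10.0, 20.0, 20.0]
difference = median_split_difference(alpha, tagged)
```

      A trial-by-trial correlation that reflects a genuine relationship would be expected to reappear as a non-zero difference between the two halves, which is the sense in which a median split can confirm or fail to confirm it.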

      Nonetheless, a correlation with the auditory signal is possible and could be explained in different ways. For example, it could be that very early auditory feedback in early visual cortex (see for example Brang et al., 2022) is transmitted alongside visual information to higher-order areas. Several studies have shown that alpha activity and visual as well as auditory processing are closely linked (Bauer et al., 2020; Popov et al., 2023). Inference on whether or how this link could play out in the case of this manuscript extends beyond the scope of this study.

      To summarize, we believe the fact that 36 Hz activity within early visual areas does not correlate with alpha activity on a trial-by-trial basis, but that 36 Hz activity in other areas does, provides strong evidence that alpha activity affects down-stream signal processing.

      We mention this analysis now in our discussion:

      L533-536: Our data provides evidence in favour of this view, as we can show that early sensory alpha activity does not covary over trials with SSEP magnitude in early visual areas, but covaries instead over trials with SSEP magnitude in higher order sensory areas (see also SUPPL. Fig. 14).

      Reviewer #1 (Recommendations for the authors):

      The evidence for the alternative hypothesis, that alpha in early sensory areas orchestrates downstream signal transmission, is not strong enough to be described up front in the abstract and Figure 1. I would leave it in the Discussion section, but advise against mentioning it in the abstract and Figure 1.

      We appreciate the reviewer’s concern regarding the inclusion of the alternative hypothesis—that alpha activity in early sensory areas orchestrates downstream signal transmission—in the abstract and Figure 1. While we agree that this interpretation is still developing, recent studies (Keitel et al., 2025; Clausner et al., 2024; Yang et al., 2024) provide growing support for this framework.

      In response, we have revised the introduction, discussion, and Figure 1 to clarify that our intention is not to outright dismiss the alpha inhibition hypothesis, but to refine and expand it in light of new data. This revision does not invalidate the prior literature on alpha timing and inhibition; rather, it proposes an updated mechanism that may better account for observed effects.

      We have, however, retained Figure 1, as it visually contextualizes the broader theoretical landscape, while at the same time adding further analyses to strengthen our empirical support for this emerging view.

      References:

      Bastos, A. M., Litvak, V., Moran, R., Bosman, C. A., Fries, P., & Friston, K. J. (2015). A DCM study of spectral asymmetries in feedforward and feedback connections between visual areas V1 and V4 in the monkey. NeuroImage, 108, 460–475. https://doi.org/10.1016/j.neuroimage.2014.12.081

      Bauer, A. R., Debener, S., & Nobre, A. C. (2020). Synchronisation of Neural Oscillations and Cross-modal Influences. Trends in cognitive sciences, 24(6), 481–495. https://doi.org/10.1016/j.tics.2020.03.003

      Brang, D., Plass, J., Sherman, A., Stacey, W. C., Wasade, V. S., Grabowecky, M., Ahn, E., Towle, V. L., Tao, J. X., Wu, S., Issa, N. P., & Suzuki, S. (2022). Visual cortex responds to sound onset and offset during passive listening. Journal of neurophysiology, 127(6), 1547–1563. https://doi.org/10.1152/jn.00164.2021

      Clausner, T., Marques, J., Scheeringa, R., & Bonnefond, M. (2024). Feature specific neuronal oscillations in cortical layers. bioRxiv, 2024.07.31.605816. https://doi.org/10.1101/2024.07.31.605816

      Diederich, A., & Colonius, H. (2008). When a high-intensity "distractor" is better then a low-intensity one: modeling the effect of an auditory or tactile nontarget stimulus on visual saccadic reaction time. Brain research, 1242, 219–230. https://doi.org/10.1016/j.brainres.2008.05.081

      Haegens, S., Nácher, V., Luna, R., Romo, R., & Jensen, O. (2011). α-Oscillations in the monkey sensorimotor network influence discrimination performance by rhythmical inhibition of neuronal spiking. Proceedings of the National Academy of Sciences of the United States of America, 108(48), 19377–19382. https://doi.org/10.1073/pnas.1117190108

      Jacoby, O., Hall, S. E., & Mattingley, J. B. (2012). A crossmodal crossover: opposite effects of visual and auditory perceptual load on steady-state evoked potentials to irrelevant visual stimuli. NeuroImage, 61(4), 1050–1058. https://doi.org/10.1016/j.neuroimage.2012.03.040

      Keitel, A., Keitel, C., Alavash, M., Bakardjian, K., Benwell, C. S. Y., Bouton, S., Busch, N. A., Criscuolo, A., Doelling, K. B., Dugue, L., Grabot, L., Gross, J., Hanslmayr, S., Klatt, L.-I., Kluger, D. S., Learmonth, G., London, R. E., Lubinus, C., Martin, A. E., … Kotz, S. A. (2025). Brain rhythms in cognition – controversies and future directions. ArXiv. https://doi.org/10.48550/arXiv.2507.15639

      Nickerson, R. S. (1973). Intersensory facilitation of reaction time: energy summation or preparation enhancement? Psychological review, 80(6), 489–509. https://doi.org/10.1037/h0035437

      Popov, T., Gips, B., Weisz, N., & Jensen, O. (2023). Brain areas associated with visual spatial attention display topographic organization during auditory spatial attention. Cerebral cortex (New York, N.Y. : 1991), 33(7), 3478–3489. https://doi.org/10.1093/cercor/bhac285

      Salagovic, C. A., & Leonard, C. J. (2021). A nonspatial sound modulates processing of visual distractors in a flanker task. Attention, perception & psychophysics, 83(2), 800–809. https://doi.org/10.3758/s13414-020-02161-5

      Scheeringa, R., Petersson, K. M., Kleinschmidt, A., Jensen, O., & Bastiaansen, M. C. (2012). EEG α power modulation of fMRI resting-state connectivity. Brain connectivity, 2(5), 254–264. https://doi.org/10.1089/brain.2012.0088

      Spaak, E., Bonnefond, M., Maier, A., Leopold, D. A., & Jensen, O. (2012). Layer-specific entrainment of γ-band neural activity by the α rhythm in monkey visual cortex. Current biology : CB, 22(24), 2313–2318. https://doi.org/10.1016/j.cub.2012.10.020

      Yang, X., Fiebelkorn, I. C., Jensen, O., Knight, R. T., & Kastner, S. (2024). Differential neural mechanisms underlie cortical gating of visual spatial attention mediated by alpha-band oscillations. Proceedings of the National Academy of Sciences of the United States of America, 121(45), e2313304121. https://doi.org/10.1073/pnas.2313304121

      Zhigalov, A., & Jensen, O. (2020). Alpha oscillations do not implement gain control in early visual cortex but rather gating in parieto-occipital regions. Human brain mapping, 41(18), 5176–5186. https://doi.org/10.1002/hbm.25183

      Zumer, J. M., Scheeringa, R., Schoffelen, J. M., Norris, D. G., & Jensen, O. (2014). Occipital alpha activity during stimulus processing gates the information flow to object-selective cortex. PLoS biology, 12(10), e1001965. https://doi.org/10.1371/journal.pbio.1001965

    1. eLife assessment

      The ability to estimate the force of infection for Plasmodium falciparum from other more directly measurable epidemiological quantities is a useful contribution to malaria epidemiology. The authors propose a method to accomplish this using genetic data from the var genes of the Pf genome and novel applications of existing methods from queueing theory. While the simulations are sophisticated, the real-world application of the method is incomplete in its analysis and would benefit from clearer articulation of the assumptions being made. Given the lack of clarity in the methods and presentation of results, it is difficult to fully assess the performance of their proposed estimation procedure.

    2. Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, the entomological inoculation rate (EIR).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by the authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and the accompanying appendix are lacking several important details. Equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g. A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. the proportion of simulations where the truth is captured in the 95% CI, or something like this) is important. In many cases it seems that the estimator is always underestimating the true FOI, and the FOI distribution may not even contain the true value (e.g. Figure 10, Figure 1 under the mid-IRS panel). But it's not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times is varied widely; however, it's not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them. This is an essential component of the method and it's impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length? Elsewhere the authors refer to these quantities as maximum likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood and of how fine the grid of values they tested is for their mean and variance, since this could influence the overall quality of the estimation procedure.

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection durations and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3), it seems that this would not translate to 4-year-olds being immune-naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population.

      b. The evaluation of the capacity parameter c seems to be quite important, and it is set at 30. However, the authors only describe trying values of 25 and 30 and claim that this does not impact FOI inference; it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why increasing the carrying capacity will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context.

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics.

      (3) The mathematical approach is simple and elegant, and thus easy to understand.

      Weaknesses:

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates.

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals.

    4. Reviewer #3 (Public Review):

      Summary:

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI.

      Strengths:

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics.

      Weaknesses:

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (which I would consider true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153) and I feel it would be appropriate to differentiate this in the discussion.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone).

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulations and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, the entomological inoculation rate (EIR).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by the authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and the accompanying appendix are lacking several important details. Equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g. A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We plan to clarify the method, including all the relevant variables, in our revised manuscript. The reviewer is correct in pointing out that there are more sections and equations in Choi et al., including the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation for the queue-length distribution. Since only the latter was directly utilized in our work, the first version of our manuscript included material on that section only. We agree with the reviewer that readers would benefit from additional information on the derivation of the exact expression for the steady-state queue-length distribution. Therefore, we will summarize the derivation of this expression in our revised manuscript. Regarding the assumptions of the method we applied, especially those for going from the exact expression to the two-moment approximation, we did describe these in the Materials and Methods of our manuscript. We recognize from this comment that the writing and organization of this information may not have been sufficiently clear. We had separated the information on this method into two parts, with the descriptive summary placed in the Materials and Methods and the equations or mathematical formulas placed in the Appendix. This can make it difficult for readers to connect the two parts and remember what was introduced earlier in the Materials and Methods when reading the equations and mathematical details in the Appendix. For our revised manuscript, we plan to cover both parts in the Materials and Methods, and to provide more of the technical details in one place, which will be easier to understand and follow.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      We thank the reviewer for this suggestion. We will add a diagram illustrating the connection between the queueing procedure and malaria transmission.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. the proportion of simulations where the truth is captured in the 95% CI, or something like this) is important. In many cases it seems that the estimator is always underestimating the true FOI, and the FOI distribution may not even contain the true value (e.g. Figure 10, Figure 1 under the mid-IRS panel). But it's not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      There appears to be some confusion on what we display in some key figures. We will clarify this further both here and in the revised text. In Figures 1, 2, and 10-14, we displayed the bootstrapped distributions including the 95% CIs. These figures do not show the distribution of the mean FOI taken over multiple simulations. We estimated mean FOI on an annual basis per host in the following sense. Both of our proposed methods require either a steady-state queue length distribution, or moments of this distribution for FOI inference. However, we only have one realization or observation for each individual host, and we do not have access to either the time-series observation of a single individual’s MOI or many realizations of a single individual’s MOI at the same sampling time. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we do have a queue length distribution at the population level for both the simulation output and the empirical data, which can be obtained by simply aggregating MOI estimates across all sampled individuals. We use this population-level queue length distribution to represent and approximate the steady-state queue length distribution at the individual level. Such representation or approximation does not consider explicitly any individual heterogeneity due to biology or transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation output is obtained from dividing the total FOI of all hosts per year by the total number of all hosts. Therefore, our estimator, combined with the demographic information on population size, is for the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year.
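      The pooling step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function names, the example MOI values, and the infection and host counts are all hypothetical. It shows how one MOI estimate per host is aggregated into a population-level distribution over 0..c, and how a per-host annual FOI follows from dividing a population total by the number of hosts.

```python
from collections import Counter


def empirical_queue_length_distribution(moi_estimates, capacity):
    """Pool one MOI estimate per host into a distribution over 0..capacity."""
    if not moi_estimates:
        raise ValueError("need at least one MOI estimate")
    if any(m < 0 or m > capacity for m in moi_estimates):
        raise ValueError("MOI estimates must lie between 0 and capacity")
    counts = Counter(moi_estimates)
    n_hosts = len(moi_estimates)
    # Probability mass for every possible queue length, including unobserved ones.
    return [counts.get(k, 0) / n_hosts for k in range(capacity + 1)]


def foi_per_host_per_year(total_infections_per_year, n_hosts):
    """Population total of new infections in a year divided by the number of hosts."""
    return total_infections_per_year / n_hosts


# Hypothetical MOI estimates for eight sampled hosts (illustration only).
moi_estimates = [1, 2, 2, 4, 5, 5, 5, 7]
pmf = empirical_queue_length_distribution(moi_estimates, capacity=30)
```

      The resulting list plays the role of the population-level queue length distribution that stands in for the individual-level steady state; any heterogeneity between hosts is averaged out by construction.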

      We evaluated the impact of individual heterogeneity on FOI inference by introducing individual heterogeneity into the simulations. With a considerable amount of transmission heterogeneity across individuals (namely 2/3 of the population receiving more than 90% of all bites, whereas the remaining 1/3 receives the rest of the bites), our two methods perform similarly to how they perform in the homogeneous transmission scenarios.

      Concerning the second point, we will add a quantitative assessment of the ability of the estimator to recover the truth across simulations and include this information in the legend of each figure. In particular, we will provide the proportion of simulations where the truth is captured by the entire bootstrap distribution, in addition to some measure of relative deviation, such as the relative difference between the true FOI value and the median of the bootstrap distribution for the estimate. This assessment will be a valuable addition, but please note that the comparisons we have provided in a graphical way do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” is here relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
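      The two summary measures mentioned above could be computed along the following lines. This is a minimal sketch, assuming each simulation supplies a true FOI and a list of bootstrap estimates; the function names and the numbers in the example are hypothetical and are not taken from the manuscript.

```python
from statistics import median


def truth_captured(true_foi, bootstrap_estimates):
    """True if the true FOI lies within the range of the bootstrap distribution."""
    return min(bootstrap_estimates) <= true_foi <= max(bootstrap_estimates)


def relative_deviation(true_foi, bootstrap_estimates):
    """Signed relative difference between the bootstrap median and the truth."""
    return (median(bootstrap_estimates) - true_foi) / true_foi


def assess(simulations):
    """simulations: list of (true_foi, bootstrap_estimates) pairs.

    Returns the proportion of simulations in which the truth is captured,
    and the list of relative deviations (negative means underestimation).
    """
    captured = [truth_captured(t, b) for t, b in simulations]
    deviations = [relative_deviation(t, b) for t, b in simulations]
    return sum(captured) / len(captured), deviations


# Two hypothetical simulations with the same true FOI (illustration only).
simulations = [
    (5.0, [4.0, 4.5, 5.5]),  # truth inside the bootstrap range
    (5.0, [3.0, 3.5, 4.0]),  # truth above the bootstrap range
]
coverage, deviations = assess(simulations)
```

      Reporting the signed deviation rather than its absolute value keeps any systematic underestimation visible, which is the concern raised in the comment.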

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times is varied widely; however, it's not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them. This is an essential component of the method and it's impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length? Elsewhere the authors refer to these quantities as maximum likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood and of how fine the grid of values they tested is for their mean and variance, since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. We will specify the ranges for the choice of mean and variance parameters for inter-arrival times, as well as the grid of values tested, in the corresponding figure caption or in a separate supplementary table. We maximized the likelihood of observing the set of individual MOI estimates in a sampled population given steady-state queue length distributions (with these distributions based on the two-moment approximation method for different combinations of the mean and variance of inter-arrival times). We will add a section to either the Materials and Methods or the Appendix in our revised manuscript including an explicit formulation of the likelihood.

      We will add example figures on the shape of the likelihood to the Appendix. We will also test how choices of the grid of values influence the overall quality of the estimation procedure. Specifically, we will further refine the grid of values to include more points and examine whether the results of FOI inference are consistent and robust against each other.
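      To make the structure of such a grid-based likelihood concrete, the sketch below evaluates the log-likelihood of a set of MOI estimates over a grid and compares a coarse grid with a finer one. It is not the two-moment approximation of Choi et al.: as a stand-in, it assumes Poisson arrivals into a loss system with capacity c, for which the steady-state number in the system is a Poisson distribution truncated at c with offered load equal to FOI times mean infection duration. Under that assumption the inter-arrival variance is fixed by the mean, so the grid is one-dimensional. The function names, the MOI values, and the mean duration are hypothetical.

```python
import math


def truncated_poisson_log_pmf(offered_load, capacity):
    """Log steady-state probabilities for k = 0..capacity.

    Stand-in model (an assumption of this sketch): Poisson arrivals into a
    loss system, giving a Poisson distribution truncated at `capacity`.
    """
    log_w = [k * math.log(offered_load) - math.lgamma(k + 1)
             for k in range(capacity + 1)]
    peak = max(log_w)
    # Log of the normalizing constant, computed stably.
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in log_w))
    return [x - log_norm for x in log_w]


def log_likelihood(moi_estimates, foi, mean_duration, capacity):
    """Sum of log P(MOI = m) over hosts, treating hosts as independent draws."""
    log_pmf = truncated_poisson_log_pmf(foi * mean_duration, capacity)
    return sum(log_pmf[m] for m in moi_estimates)


def grid_mle(moi_estimates, foi_grid, mean_duration, capacity):
    """Return the best grid point and the full (foi, log-likelihood) profile."""
    profile = [(foi, log_likelihood(moi_estimates, foi, mean_duration, capacity))
               for foi in foi_grid]
    best_foi, _ = max(profile, key=lambda pair: pair[1])
    return best_foi, profile


# Hypothetical inputs, for illustration only.
moi_estimates = [3, 4, 4, 5, 5, 5, 6, 7]
mean_duration = 1.0  # years; placeholder, not the value used in the manuscript
capacity = 30

coarse_grid = [0.5 * i for i in range(2, 21)]   # FOI from 1.0 to 10.0, step 0.5
fine_grid = [0.125 * i for i in range(8, 81)]   # same range, step 0.125

coarse_best, _ = grid_mle(moi_estimates, coarse_grid, mean_duration, capacity)
fine_best, fine_profile = grid_mle(moi_estimates, fine_grid, mean_duration, capacity)
```

      Plotting `fine_profile` would give the kind of likelihood-shape figure requested by the reviewer, and the gap between `coarse_best` and `fine_best` gives a direct reading of how much the grid resolution alone moves the estimate.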

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection durations and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3), it seems that this would not translate to 4-year-olds being immune-naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population.

      The reviewer is indeed correct about the difficulty of empirically measuring the duration of infection for 1-5-year-olds, and that of further testing whether these 1-5-year-olds exhibit the same distribution for duration of infection as naïve adults co-infected with syphilis. We will nevertheless continue to use the described method for duration of infection, while better acknowledging and discussing the limitations this aspect of the method introduces. We note that the infection duration from the historical clinical data we have relied on is used in the malaria modeling community as one of the credible sources for this parameter of untreated natural infections in malaria-naïve individuals in malaria-endemic settings of Africa (e.g. in the agent-based model OpenMalaria, see 1).

      It is important to emphasize that the proposed methods apply to the MOI estimates for naïve or close to naïve patients. They are not suitable for FOI inference for the school-aged children and the adult populations of high-transmission endemic regions, since individuals in these age classes have been infected many times and their duration of infection is significantly shortened by their immunity. To reduce the degree of misspecification in infection duration and take full advantage of our proposed methods, we will emphasize in the revision the need for future data collection and sampling efforts to prioritize the subpopulation that has received either no infection or a minimal number of infections in the past, and whose immune profile is close to that of naïve adults, for example, infants. This emphasis is aligned with the top priority of all intervention efforts in the short term, which is to monitor and protect the most vulnerable individuals from severe clinical symptoms and death.

      Also, force of infection for naïve hosts is a key basic parameter for epidemiological models of a complex infectious disease such as falciparum malaria, whether for agent-based formulations or equation-based ones. This is because force of infection for non-naïve hosts is typically a function of their immune status and the force of infection of naïve hosts. Thus, knowing the force of infection of naïve hosts can help parameterize and validate these models by reducing degrees of freedom.

      b. The evaluation of the capacity parameter c seems to be quite important, and it is set at 30. However, the authors only describe trying values of 25 and 30 and claim that this does not impact FOI inference; it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why increasing the carrying capacity will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation.

      Thank you for this question. We will investigate more values of the parameter c systematically, including substantially higher ones. We note however that this quantity is the carrying capacity of the queuing system, or the maximum number of blood-stage strains that an individual human host can be co-infected with. We do have empirical evidence for the value of the latter being around 20 (2). This observed value provides a lower bound for parameter c. To account for potential under-sampling of strains, we thus tried values of 25 and 30 in the first version of our manuscript.

      In general, this parameter influences the steady-state queue length distribution based on the two-moment approximation, more specifically, the tail of this distribution when the flow of customers/infections is high. Smaller values of parameter c put a lower cap on the maximum value possible for the queue length distribution. The system is then more easily “overflowed”, in which case customers (or infections) often find upon their arrival that there is no space available in the queuing system/individual host. These customers (or infections) will not increment the queue length. The parameter c therefore has only a small impact on the part of the grid resulting in low flows of customers/infections, for which the system is unlikely to be overflowed. The empirical MOI distribution centers around 4 or 5, with most values well below 10 and only a small fraction of higher values between 15 and 20 (2). When one increases the value of c, the part of the grid generating very high flows of customers/infections results in queue length distributions with a heavy tail around large MOI values that are not supported by the empirical distribution. We therefore do not expect that substantially higher values for parameter c would change either the relative shape of the likelihood or the MLE.
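      The argument above can be illustrated numerically with a simplified stand-in for the queue length distribution. The sketch below does not use the two-moment approximation; it assumes Poisson arrivals into a loss system, so that the steady-state distribution is a Poisson truncated at c. The offered loads of 5 and 40 are hypothetical values chosen to represent a low-flow and a very high-flow part of the grid, and the function names are illustrative.

```python
import math


def truncated_poisson_pmf(offered_load, capacity):
    """Steady-state probabilities for k = 0..capacity under the stand-in model."""
    log_w = [k * math.log(offered_load) - math.lgamma(k + 1)
             for k in range(capacity + 1)]
    peak = max(log_w)
    weights = [math.exp(x - peak) for x in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def total_variation(p, q):
    """Total variation distance between two distributions on 0..max length."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Low flow: mean queue length near the empirical MOI center of 4 or 5.
low_flow_shift = total_variation(truncated_poisson_pmf(5.0, 30),
                                 truncated_poisson_pmf(5.0, 60))

# Very high flow: the cap at c = 30 is frequently hit.
high_flow_shift = total_variation(truncated_poisson_pmf(40.0, 30),
                                  truncated_poisson_pmf(40.0, 60))

# Mass placed on MOI values above 20 at high flow with the larger capacity.
high_flow_tail = sum(truncated_poisson_pmf(40.0, 60)[21:])
```

      In this simplified setting, doubling c leaves the low-flow distribution essentially unchanged, while the high-flow distribution changes substantially but places most of its mass on MOI values above 20, which the empirical distribution described above does not support. Whether the same holds under the two-moment approximation would still need to be checked with the authors' own model.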

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context. 

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics. 

      (3) The mathematical approach is simple and elegant, and thus easy to understand. 

      Weaknesses: 

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates. 

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection from historical clinical data. Please see our response to reviewer 1 comment 2a.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration. 

      We thank the reviewer for pointing out a potential improvement to the work. We acknowledge that FOI is inferred from MOI, and thus is dependent on the information contained in MOI. FOI reflects risk of infection, is associated with risk of clinical episodes, and can relate local variation in malaria burden to transmission better than other proxy parameters for transmission intensity. It is possible that MOI can be as informative as FOI when one regresses the risk of clinical episodes and local variation in malaria burden with MOI. But MOI by definition is a number and not a rate parameter. FOI for naïve hosts is a key basic parameter for epidemiological models. This is because FOI of non-naïve hosts is typically a function of their immune status and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts can help parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI provides a useful step.

      Given the difficulty of measuring infection duration, estimating infection duration and FOI simultaneously appears to be an attractive alternative, as the referee pointed out. This will require however either cohort studies or more densely sampled cross-sectional surveys due to the heterogeneity in infection duration across a multiplicity of factors. These kinds of studies have not been, and will not be, widely available across geographical locations and time. This work aims to utilize more readily available data, in the form of sparsely sampled single-time-point cross-sectional surveys.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates. 

      We thank the reviewer for pointing out aspects of the work that can be further clarified. It is difficult to disentangle the effect of drug treatment on measurement, including infection status, MOI, and duration of infection. Thus, we did not attempt to address this matter explicitly in the original version of our manuscript. Instead, we considered two extreme scenarios which bound reality, well summarized by the reviewer. First, if drug treatment has had no impact on measurement, the MOI of the drug-treated 1-5-year-olds would reflect their true underlying MOI. We can then use their MOI directly for FOI inference. Second, if the drug treatment had a significant impact on measurement, i.e., if it completely changed the infection status, MOI, and duration of infection of drug-treated 1-5-year-olds, we would need to either exclude those individuals’ MOI or impute their true underlying MOI. We chose to do the latter in the original version of the manuscript. This imputation assumes that, had those 1-5-year-olds not received drug treatment, they would have had MOI values similar to those of the non-treated 1-5-year-olds. We can then impute their MOI by sampling from the MOI estimates of non-treated 1-5-year-olds.

      The reviewer is correct in pointing out that this imputation does not add additional information and can potentially deflate the variability of MOI distributions, compared to simply excluding those drug-treated 1-5-year-olds from the analysis. Thus, we can include in our revision FOI estimates with the drug-treated 1-5-year-olds excluded from the estimation.
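
      The difference between imputing and excluding the drug-treated children can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the MOI values, the number of treated children, and the function names are all hypothetical.

```python
import random
import statistics

def impute_treated(untreated_moi, n_treated, rng):
    """Replace each treated child's MOI by a draw, with replacement, from untreated MOIs."""
    return list(untreated_moi) + [rng.choice(untreated_moi) for _ in range(n_treated)]

def exclude_treated(untreated_moi):
    """Drop treated children and keep only the untreated MOIs."""
    return list(untreated_moi)

def bootstrap_se_of_mean(sample, n_boot, rng):
    """Bootstrap standard error of the mean MOI."""
    means = [statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(n_boot)]
    return statistics.stdev(means)

rng = random.Random(0)
untreated = [1, 2, 2, 3, 4, 4, 5, 7, 9, 12]   # hypothetical MOI estimates
imputed = impute_treated(untreated, n_treated=5, rng=rng)
excluded = exclude_treated(untreated)

print("SE with imputation:", bootstrap_se_of_mean(imputed, 2000, rng))
print("SE with exclusion: ", bootstrap_se_of_mean(excluded, 2000, rng))
```

      Because the imputed values are copies of observed ones, the imputed sample is larger without carrying new information, so its bootstrap standard error will typically be smaller. That is the deflation of variability the reviewer describes.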

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals. 

      We imputed the MOI values of microscopy-negative but PCR-positive 1-5-year-olds by sampling from the microscopy-positive 1-5-year-olds, effectively assuming that both have the same, or similar, MOI distributions. We did so because there is a weak relationship in our Ghana data between the parasitemia level of individual hosts and their MOI (or detected number of var genes, on the basis of which the MOI values themselves were estimated). Parasitemia levels underlie the difference in detection sensitivity of PCR and microscopy.

      We will elaborate on this matter in our revised manuscript and include information from our previous and on-going work on the weak relationship between MOI/the number of var genes detected within an individual host and their parasitemia levels. We will also discuss potential reasons or hypotheses for this pattern.

      Reviewer #3 (Public Review):

      Summary: 

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI. 

      Strengths: 

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics. 

      Weaknesses: 

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153 ) and I feel it would be appropriate to differentiate this in the discussion. 

      We thank the reviewer for this comment, although we think there is a misunderstanding about what can and cannot be practically validated in the sense of a “true” measure of FOI that would be free from assumptions for a complex disease such as malaria. We would not want the results to be over-interpreted and will extend the discussion of what we have done to test the methods. We note that for the performance evaluation of statistical methods, the use of simulation output is quite common and often a necessary and important step. In some cases, the simulation output is generated by dynamical models, whereas in others, by purely descriptive ones. All these models make their own assumptions which are necessarily a simplification of reality. The stochastic agent-based model (ABM) of malaria transmission utilized in this work has been shown to reproduce several important patterns observed in empirical data from high-transmission regions, including aspects of strain diversity which are not represented in simpler models.

      In what sense this ABM makes a set of biological and structural assumptions which are “probably similar” to those of the queuing methods we present, is not clear to us. We agree that relying on models whose structural assumptions differ from those of a given method or model to be tested, is the best approach. Our proposed methods for FOI inference based on queuing theory rely on the duration of infection distribution and the MOI distribution among sampled individuals, both of which can be direct outputs from the ABM. But these methods are agnostic on the specific mechanisms or biology underlying the regulation of duration and MOI.

      Another important point raised by this comment is what would be the “true” FOI value against which to validate our methods. Empirical MOI-FOI pairs for FOI measured directly by tracking cohort studies are still lacking. There are potential measurement errors for both MOI and FOI because the polymorphic markers typically used in different cohort studies cannot differentiate hyper-diverse antigenic strains fully and well (5). Also, these cohort studies usually start with drug treatment. Alternative approaches do not provide a measure of true FOI, in the sense of the estimation being free from assumptions. For example, one approach would be to fit epidemiological models to densely sampled/repeated cross-sectional surveys for FOI inference. In this case, no FOI is measured directly and further benchmarked against fitted FOI values. The evaluation of these models is typically based on how well they can capture other epidemiological quantities which are more easily sampled or measured, including prevalence or incidence. This is similar to what is done in this work. We selected the FOI values that maximize the likelihood of observing the given distribution of MOI estimates. Furthermore, we paired our estimated FOI value for the empirical data from Ghana with another independently measured quantity EIR (Entomological Inoculation Rate), typically used in the field as a measure of transmission intensity. We check whether the resulting FOI-EIR point is consistent with the existing set of FOI-EIR pairs and the relationship between these two quantities from previous studies. We acknowledge that as for model fitting approaches for FOI inference, our validation is also indirect for the field data.
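
      The idea of selecting the FOI value that maximizes the likelihood of the observed MOI distribution can be sketched with a much simpler model than the one in the manuscript. The example below assumes an infinite-capacity queue with Poisson arrivals, for which the steady-state number of concurrent infections is Poisson with mean FOI × mean duration. The MOI values, mean duration, and grid are hypothetical, and the authors' actual queuing-theory methods differ from this simplification.

```python
import math

def poisson_log_likelihood(moi_values, foi, mean_duration):
    """Log-likelihood of MOI counts if MOI ~ Poisson(foi * mean_duration)."""
    mu = foi * mean_duration
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in moi_values)

def grid_mle(moi_values, mean_duration, foi_grid):
    """Return the grid value of FOI with the highest log-likelihood."""
    return max(foi_grid, key=lambda foi: poisson_log_likelihood(moi_values, foi, mean_duration))

moi = [1, 2, 3, 4, 4, 5, 5, 6, 7, 9]          # hypothetical MOI estimates
mean_duration_years = 0.5                      # hypothetical mean infection duration
foi_grid = [0.5 * i for i in range(1, 61)]     # 0.5 to 30 infections per host per year

foi_hat = grid_mle(moi, mean_duration_years, foi_grid)
print("Grid MLE of FOI (per host per year):", foi_hat)
```

      In this toy setting the estimate sits next to mean(MOI) / mean duration, which makes explicit how strongly the inferred FOI depends on the assumed duration of infection.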

      Prompted by the reviewer’s comment, we will discuss this matter in more detail in our revised manuscript, including clarifying further certain basic assumptions of our agent-based model, emphasizing the indirect nature of the validation with the field data and the existing constraints for such validation.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone). 

      We thank the reviewer for this comment. We will add supplementary figures for the MOI distributions generated by the queuing theory method (i.e., the two-moment approximation method) and our agent-based model in our revised manuscript.

      In the first version of our manuscript, we considered two extreme scenarios which bound the reality, instead of simply assuming that drug treatment does not impact the infection status, MOI, and duration of infection. See our response to reviewer 2 point (3). The resulting FOI estimates differ but not substantially across the two extreme scenarios, partially because drug-treated individuals’ MOI distribution is similar to that of non-treated individuals (the apparent lack of impact of drug treatment on MOI, as pointed out by the referee). We will consider adding a formal test to quantify the difference between the two MOI distributions and how significant the difference is. We will discuss which of the two extreme scenarios reality is closer to, given the result of the formal test. We will also discuss in our revision possible reasons/hypotheses underlying the impact of drug treatment on MOI from the perspective of the nature, efficiency, and duration of the drugs administered.

      Regarding the last point of the reviewer, on understanding the relationship between MOI and FOI, we are not fully clear about what was meant. We are also confused about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as the reviewer suggesting a better understanding of the relationship between MOI and FOI, either between their distributions, or the moments of their distributions, perhaps by fitting models including simple linear regression models. This approach is in principle possible, but it is not the focus of this work. It will be equally difficult to evaluate the performance of this alternative approach given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Moreover, the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should correspond to more narrow or concentrated MOI distributions, whereas more variable FOI values should correspond to more spread-out ones. We will discuss this matter in our revised manuscript.

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying. 

      We thank the reviewer for this helpful comment as it is fundamental that there is no confusion on the basic definitions. EIR, the entomological inoculation rate, is closely related to the force of infection but is not equal to it. EIR focuses on the rate of arrival of infectious bites and is measured as such by focusing on the mosquito vectors that are infectious and arrive to bite a given host. Not all these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, as FOI is defined as the rate at which a host acquires infection. This definition comes from more general models for the population dynamics of infectious diseases. (For diseases simpler than malaria, with no super-infection, the typical SIR models define the force of infection as the rate at which a susceptible individual becomes infected.) For malaria, force of infection refers to the number of new blood-stage infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.

      We agree however with the referee that there could be some confusion in our definition resulting from the approach we use to estimate the MOI distribution (which provides the basis for estimating FOI). In particular, we rely on the non-existent to very low overlap of var repertoires among individuals with MOI=1, an empirical pattern we have documented extensively in previous work (see 2, 3, and 4). The method of varcoding and its Bayesian formulation rely on the assumption of negligible overlap. We note that other approaches for estimating MOI (and FOI) based on other polymorphic markers also make this assumption (reviewed in 5). Ultimately, the FOI we seek to estimate is the one defined as specified above and in both the abstract and introduction, consistent with the epidemiological literature. We will include clarification in the introduction and discussion of this point in the revision.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method. 

      We will modify the relevant sentences to use “consistent” instead of “robust”.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology. 

      We thank the reviewer for this comment. As also mentioned in the response to reviewer 1’s comments, we will reorganize and rewrite parts of the text in our revision to improve clarity.

      References and Notes

      (1)   Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2)   Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3)   Day, K. P. et al. Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. U.S.A., 114(20), 4103-4111 (2017).

      (4)   Ruybal-Pesántez, S. et al. Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep., 7(11810) (2017).

      (5)   Labbé, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19(1) (2023).

    1. eLife Assessment

      The ability to estimate the force of infection for Plasmodium falciparum from other more directly measurable epidemiological quantities would contribute to malaria epidemiology. The authors propose a method to accomplish this using genetic data from the var genes of the Pf genome and novel applications of existing methods from queueing theory. After revising the manuscript, this is a useful contribution to the field and the authors provide solid evidence to support it.

    2. Reviewer #2 (Public review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      - The use of historical clinical data is very clever in this context

      - The simulations are very sophisticated with respect to trying to capture realistic population dynamics

      - The mathematical approach is simple and elegant, and thus easy to understand

      Weaknesses:

      - The assumptions of the approach are quite strong, and the authors have made clear that applicability is constrained to individuals with immune profiles that are similar to malaria naive patients with neurosyphilis. While the historical clinical data is a unique resource and likely directionally correct, it remains somewhat dubious to use the exact estimated values as inputs to other models without extensive sensitivity analysis.

      Comments on revisions:

      The authors have adequately responded to all comments.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have adequately responded to all comments.

      We thank Reviewer 1 for their positive assessment of our previous round of revisions.

      Reviewer #2 (Public review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      - The use of historical clinical data is very clever in this context

      - The simulations are very sophisticated with respect to trying to capture realistic population dynamics

      - The mathematical approach is simple and elegant, and thus easy to understand

      Weakness:

      The assumptions of the approach are quite strong, and the authors have made clear that applicability is constrained to individuals with immune profiles that are similar to malaria naive patients with neurosyphilis. While the historical clinical data is a unique resource and likely directionally correct, it remains somewhat dubious to use the exact estimated values as inputs to other models without extensive sensitivity analysis.

      We thank reviewer 2 for their comments on our previous round of revisions. The statement here that “it remains somewhat dubious to use the exact estimated values as inputs to other models” suggests that we may not have been sufficiently clear on how infection duration is represented in our agent-based model (ABM) of malaria population dynamics. Because our analysis uses simulated outputs from the ABM to validate the performance of the two queuing-theory methods, we believe this point warrants clarification, which we provide below.

      When simulating with the ABM, we do not use empirical estimates of infection duration in immunologically naïve individuals from the historical clinical data as direct inputs. Instead, infection duration emerges from the within-host dynamics modeled in the ABM (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision). Briefly, each Plasmodium falciparum parasite carries approximately 50-60 var genes, each encoding a distinct variant surface antigen expressed during the blood stage of infection. Empirical evidence[1,2] indicates that these var genes are expressed largely sequentially. If a host has previously encountered the antigenic product of a given var gene and retains immunity to it, subject to waning at empirically estimated rates[3,4], the corresponding parasite subpopulation is rapidly cleared. Conversely, if the host is naïve to that gene, it takes approximately seven days for the immune system to mount an effective antibody response, resulting in a rapid decline or elimination of the expressed variant[5]. This seven-day timescale aligns with the duration of each successive parasitemia peak observed in Plasmodium falciparum infections[6,7], each arising primarily from the expression of a single var gene and occasionally from a small number of var genes.

      In our previous analyses, we therefore modeled an average expression duration of seven days per gene in naïve hosts. Specifically, the switching time to the next gene was drawn from an exponential distribution with a mean of seven days. Each var gene is represented as a linear combination of two epitopes (alleles), based on the empirical characterization of two hypervariable regions in the var tag region[8], and immunity is acquired against these alleles. Immunity to one allele of a given gene reduces its average expression duration by approximately half, whereas immunity to both alleles results in an immediate switch to another var gene within the infection. Consequently, the total duration of infection is proportional to the number of unseen alleles by the host across all var genes expressed during that infection (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision).
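
      The expression-duration rules described above can be sketched as a small simulation. This is a minimal illustration of the stated rules, not the authors' ABM code: the function names, the repertoire size of 60 genes, and the use of exactly one half for the "approximately half" reduction are assumptions.

```python
import random

MEAN_EXPRESSION_DAYS = 7.0  # mean per-gene expression time in a naive host

def mean_expression_days(n_immune_alleles):
    """Mean expression time of one var gene given immunity to 0, 1, or 2 of its alleles."""
    if n_immune_alleles == 0:
        return MEAN_EXPRESSION_DAYS
    if n_immune_alleles == 1:
        return MEAN_EXPRESSION_DAYS / 2   # "approximately half", taken as exactly half here
    return 0.0                            # immune to both alleles: immediate switch

def infection_duration(immune_alleles_per_gene, rng):
    """Total infection duration (days) as the sum of exponential per-gene expression times."""
    total = 0.0
    for n_immune in immune_alleles_per_gene:
        mean = mean_expression_days(n_immune)
        if mean > 0:
            total += rng.expovariate(1.0 / mean)
    return total

rng = random.Random(1)
naive_host = [0] * 60                     # hypothetical repertoire of 60 unseen genes
durations = [infection_duration(naive_host, rng) for _ in range(500)]
print("Mean duration in naive hosts (days):", sum(durations) / len(durations))
```

      With these assumed numbers a naive host has an expected duration of 60 × 7 = 420 days, and each allele the host has already seen shortens the expected total, consistent with duration being proportional to the number of unseen alleles.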

      Prompted by the reviewer’s comments, in this revision we additionally tested mean expression durations of 7.5 and 8 days per var gene, applied in combination with an extension of the within-host rules (see the next paragraph for motivation and details). Although differences among the three mean expression durations are modest at the per-gene level, when aggregated across all var genes expressed within an individual parasite, the resulting total infection duration can differ by on the order of several months. The resulting distributions of infection duration across immunologically naïve individuals and those aged 1-5 years, together with those generated under our previous simulation settings, span a range of means and variances that lies above and below, but encompasses, scenarios comparable to the historical clinical data from naïve neurosyphilis patients treated with P. falciparum malaria. We have provided example supplementary figures illustrating that the distributions of infection duration from the simulated outputs overlap with, and closely resemble, the empirical distribution from the historical clinical data (Appendix 1-Figure 27-32).

      We considered the following modification of the within-host rules. In our previous ABM simulations, we had assumed that an infection would clear only once the parasite had exhausted its entire var gene repertoire, that is, after every var gene had been expressed and recognized. However, biological evidence indicates that clearance can occur earlier for several reasons, including stochastic extinction before full repertoire exhaustion. Even if some var genes remain unexpressed, an infection can terminate due to demographic stochasticity once parasite densities fall to very low levels. This decline in parasite densities may result from non-variant-specific immune mechanisms or from cross-immunity among var genes that share sequence similarity or alleles[9,10,11], both of which can substantially reduce parasite numbers. To model the possibility of termination or clearance before full repertoire exhaustion, we implemented a simple scenario in which there is a small probability of clearing the current infection while a given var gene-whether non-final or final-is being expressed. This probability is a function of the host’s pre-existing immunity to the two epitopes (alleles) of that gene, thereby capturing in a parsimonious manner the effects of cross-immunity among sequence- or allele-sharing var genes in reducing parasitemia. Specifically, it is modeled as a Bernoulli draw whose success probability equals the immunity level against the gene (0 for no immunity to either epitope, 0.5 for immunity to one epitope, and 1 for immunity to both epitopes) multiplied by a constant factor of 0.025. Thus, the probability scales with pre-existing variant-specific immunity to the gene but remains small overall, while introducing additional variance into the emergent distribution of total infection duration across hosts.
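
      The early-clearance rule can be sketched as below. The immunity levels (0, 0.5, 1) and the factor 0.025 come from the description above; the assumption that one Bernoulli draw is made per expressed gene, the function names, and the example repertoires are hypothetical.

```python
import random

CLEARANCE_FACTOR = 0.025  # constant factor stated in the response

def clearance_probability(n_immune_alleles):
    """Immunity level (0, 0.5, or 1) multiplied by the constant factor."""
    return (n_immune_alleles / 2) * CLEARANCE_FACTOR

def genes_expressed_before_end(immune_alleles_per_gene, rng):
    """Number of var genes expressed before the infection ends.

    Assumes one Bernoulli clearance draw per expressed gene; the infection
    otherwise ends when the repertoire is exhausted.
    """
    for index, n_immune in enumerate(immune_alleles_per_gene, start=1):
        if rng.random() < clearance_probability(n_immune):
            return index
    return len(immune_alleles_per_gene)

rng = random.Random(3)
naive_host = [0] * 60          # never clears early under this rule
experienced_host = [2] * 60    # 2.5% chance of clearance at each gene
print(genes_expressed_before_end(naive_host, rng))
print(genes_expressed_before_end(experienced_host, rng))
```

      Under this reading, a fully naive host always exhausts the repertoire, while hosts with pre-existing immunity occasionally clear early, which adds variance to the emergent infection duration without changing its typical scale much.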

      We acknowledge that the ABM used to simulate malaria population dynamics cannot capture all mechanisms and complexities underlying within-host processes, many of which remain poorly understood. However, we emphasize that the resulting distributions of infection duration generated by the ABM span a broad range of means, variances, and shapes, including distributions that closely match those observed in the clinical historical data. Because the queueing-theory methods rely on only the mean and variance of infection duration to estimate the force of infection (FOI), these scenarios, which collectively span and encompass values comparable to the empirical ones, provide an appropriate basis for evaluating the performance of the methods using simulated outputs. We have added supplementary figures (see Appendix 1-Figure 16-22) illustrating the corresponding FOI inference results when we allow for clearance before the complete expression of the var repertoire, and the accuracy of FOI estimation remains comparable across all the scenarios examined.

      Finally, we emphasize that the application of the queuing-theory methods to the simulated outputs and to the Ghana field survey data involves two self-contained steps. For the simulations, FOI is inferred directly from the emergent distributions of infection duration generated by the ABM. For the Ghana surveys, FOI is inferred using the historical clinical data, which remains one of the few credible and widely used empirical sources for infection duration in immunologically naïve individuals[6]. By exploring different mean expression durations and within-host rules in the ABM, which generates distributions of infection duration that span and encompass those comparable to the empirical distribution, we demonstrate that the queuing-theory methods perform comparably across diverse scenarios and are well suited for application to the Ghana field surveys.

      We expanded the section on within-host dynamics in Appendix 1 to elaborate on this point (Lines 817-854).

      Reviewer #3 (Public review):

      I think the authors gave a robust but thorough response to our reviews and made some important changes to the manuscript which certainly clarify things for me.

      We thank Reviewer 3 for their positive feedback on our previous round of revisions.

      References

      (1) Zhang, X. & Deitsch, K. W. The mystery of persistent, asymptomatic Plasmodium falciparum infections. Curr. Opin. Microbiol 70, 102231 (2022).

      (2) Deitsch, K. W. & Dzikowski, R. Variant gene expression and antigenic variation by malaria parasites. Annu. Rev. Microbiol. 71, 625–641 (2017).

      (3) Collins, W. E., Skinner, J. C. & Jeffery, G. M. Studies on the persistence of malarial antibody response. American journal of epidemiology, 87(3), 592–598 (1968).

      (4) Collins, W. E., Jeffery, G. M. & Skinner, J. C. Fluorescent Antibody Studies in Human Malaria. II. Development and Persistence of Antibodies to Plasmodium falciparum. The American journal of tropical medicine and hygiene, 13, 256–260 (1964).

      (5) Gatton, M. L., & Cheng, Q. Investigating antigenic variation and other parasite-host interactions in Plasmodium falciparum infections in naïve hosts. Parasitology, 128(Pt 4), 367–376 (2004).

      (6) Maire, N., Smith, T., Ross, A., Owusu-Agyei, S., Dietz, K., & Molineaux, L. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. The American journal of tropical medicine and hygiene, 75(2 Suppl), 19–31 (2006).

      (7) Chen D. S., Barry A. E., Leliwa-Sytek A., Smith T-A., Peterson I., Brown S. M., et al. A Molecular Epidemiological Study of var Gene Diversity to Characterize the Reservoir of Plasmodium falciparum in Humans in Africa. PLoS ONE 6(2): e16629 (2011).

      (8) Larremore D. B., Clauset A., & Buckee C. O. A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes. PLoS Comput Biol 9(10): e1003268 (2013).

      (9) Holding T. & Recker M. Maintenance of phenotypic diversity within a set of virulence encoding genes of the malaria parasite Plasmodium falciparum. J. R. Soc. Interface 12, 20150848 (2015).

      (10) Crompton, P. D., Moebius, J., Portugal, S., Waisberg, M., Hart, G., Garver, L. S., Miller, L. H., Barillas-Mury, C., & Pierce, S. K. Malaria immunity in man and mosquito: insights into unsolved mysteries of a deadly infectious disease. Annual review of immunology, 32, 157–187 (2014).

      (11) Langhorne, J., Ndungu, F., Sponaas, AM. et al. Immunity to malaria: more questions than answers. Nat Immunol 9, 725–732 (2008).

    1. eLife Assessment

      This study provides valuable insights through the elucidation of the first full-length structure of the heterohexameric (MmpS4)₃-(MmpL4)₃ transporter complex from Mycobacterium tuberculosis, advancing understanding of its transport mechanism, linked to virulence and drug resistance. The structural analysis is convincing, offering a clear framework for future mechanistic studies. Major strengths include a comprehensive structural characterization of the complex, though some conclusions require further validation.

    2. Reviewer #1 (Public review):

      This manuscript adds to the recent, exciting developments in our understanding of the MmpL/S transporters from mycobacteria. This work provides solid support for the trimeric/hexameric arrangement of subunits in the complex, and reveals a possible pathway for substrate translocation.

      Overall, I think this manuscript is a solid body of work that adds to several recent studies from this team and others on the structure and mechanism of the MmpL/S transporter family, particularly MmpL4/S4. The combination of AF, disulfide engineering, and experimental structure is good, though it is a bit puzzling that the experimental structure based on disulfide stabilization of the AF prediction does not recapitulate key elements (MmpS periplasmic domain docking to MmpL, and altered CCD configuration).

      I have no major concerns about this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes the structure of the Mycobacterium tuberculosis (MmpS4)3-(MmpL4)3 hetero-hexameric transporter complex. The structure was obtained by cryogenic electron microscopy using an engineered construct that cross-links MmpS4 to MmpL4 via a disulfide bond. The position of the disulfide bond was determined using an AlphaFold2 model of the hetero-hexamer. Although AlphaFold2 predicts a symmetric hetero-hexamer, the authors found that the structure of the coiled-coil domain (CCD) is asymmetric, tilted at about 60° relative to the membrane domains, and only contains two of the three alpha helical hairpins, with the third being disordered.

      Strengths:

      The strategy of using AlphaFold2 models to guide construct design for experimental structure determination is state-of-the-art, and this work provides a great example of its applications and limitations. I.e., the experimental structure does not fully recapitulate the prediction but provides unexpected results.

      The comparisons between the authors' structures and the previously published structures of the MmpL4 monomer and MmpL5 trimers strengthen the authors' findings.

      Weaknesses:

      A more detailed description of the current mechanistic hypothesis would strengthen the manuscript. The authors state that the two periplasmic domains "are expected to undergo rigid body movements that allow substrate transport through these periplasmic domains similar to the conformational changes observed in the E. coli multidrug efflux pump AcrB". A schematic of the proposed transport cycle, as a supplemental figure that shows the current hypothesis regarding transport, would be beneficial for understanding the previous structures and putting the current structure in context. Outside of "the mechanistic basis of how these conformational changes are coupled to protonation of the DY-pairs", what are the major controversies/open questions regarding the mechanism?

      The authors provide evidence that the cysteine-depleted S4L4 construct is functional, but do not show that the construct with the introduced disulfide bond #5 (D39C MmpS4 and S434C MmpL4) is also functional. Demonstrating this would allow the authors to better interpret their resulting structures.

      The analysis presented in Figure 5 and Supplementary Figure 7 seems to suggest that the authors are proposing that the CCD central cavity acts as a transport pathway for the transported substrate, but I am not sure that this hypothesis is explicitly stated. This makes the reasoning behind the analysis presented unclear. Clarity could be improved by stating that the hypothesis of direct transport of substrate through the CCD central channel is being examined using the structure prediction, and what the implications are for the structure solved with the incompletely formed CCD.

      Given that the results emphasize the flexibility of the CCD, the manuscript would be strengthened by 3D variability analysis either in cryoSPARC or using cryoDRGN (or both). This would allow the authors to better quantify the degree of motion in the CCD and how it may correlate to flexibility in other regions. Further 3D flex reconstruction in cryoSPARC may improve the map quality of the CCD.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Earp et al. reports cryoEM structures of the hexameric (MmpS4)₃-(MmpL4)₃ complex from Mycobacterium tuberculosis, which belongs to the RND family of transporters and is known to have a role in the export of siderophores and contribute to drug resistance. The experimental workflow showcased involves the design of disulfide pairs using distance constraints obtained from the AlphaFold predicted structure of the hexameric complex. One such disulfide pair was used to determine the ~3.0 Å structures. The structure reveals density for the previously unresolved coiled-coil domain (CCD), a tilted CCD arrangement, and a cavity within the periplasmic domain, which the authors assert is occupied by detergent. Comparison of this complex with the monomer structure of MmpL4 shows conformational variations interpreted to implicate different domains and conserved residues involved in proton coupling, which might be related to the transport mechanism. While the methodological aspects of the manuscript are solid, enthusiasm for the overall advance/significance is less so, with doubts about the relevance of the tilted CCD structure, considering disulfide trapping and an incomplete validation of the claim that the tilted CCD represents a stable intermediate conformation. A clear, updated transport mechanism is largely missing from the manuscript.

      Strengths:

      Beautiful structures, AF prediction-experimental validation nexus that could be fine-tuned for different systems/difficult to target complexes.

      Weaknesses:

      Physiological relevance of the tilted CCD conformation. No clear mechanistic model for the transport. While the CCD may indeed be a stable intermediate, the fact that the rest of the trimeric arrangement is unaffected does not fully rule out disulfide trapping as a factor in promoting this. The findings would be strengthened if the same tilted conformation is seen using a different set of disulfides. The significance of the detergent molecule and the new cavity observed could also be better discussed in terms of an updated transport model.

    5. Author response:

      We thank the three reviewers for their critical and in-depth assessment of our study. Below you find our comments to the public reviews and our revision plans.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript adds to the recent, exciting developments in our understanding of the MmpL/S transporters from mycobacteria. This work provides solid support for the trimeric/hexameric arrangement of subunits in the complex, and reveals a possible pathway for substrate translocation.

      Overall, I think this manuscript is a solid body of work that adds to several recent studies from this team and others on the structure and mechanism of the MmpL/S transporter family, particularly MmpL4/S4. The combination of AF, disulfide engineering, and experimental structure is good, though it is a bit puzzling that the experimental structure based on disulfide stabilization of the AF prediction does not recapitulate key elements (MmpS periplasmic domain docking to MmpL, and altered CCD configuration).

      I have no major concerns about this manuscript.

      We thank reviewer#1 for this positive assessment of our work. The deviation of the AF prediction from the experimental structure is, in our view, not puzzling. AF does not take the physical properties of proteins into account, but predicts structures based on strong sequence alignments. It therefore does not have “knowledge” about the general flexibility of domains such as the CCD, which is also observed in the corresponding MmpL5 structures, nor does it have knowledge about preferred conformational states. Rather than “failing” to confirm the AF predictions, our cryo-EM structure revealed an unexpected tilted conformation of the CCD. As we outline in comments below, the physiological relevance of the tilted CCD is unclear. Its flexibility might be required to interact with (still elusive) outer membrane protein components to form the fully assembled efflux machinery.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the structure of the Mycobacterium tuberculosis (MmpS4)3-(MmpL4)3 hetero-hexameric transporter complex. The structure was obtained by cryogenic electron microscopy using an engineered construct that cross-links MmpS4 to MmpL4 via a disulfide bond. The position of the disulfide bond was determined using an AlphaFold2 model of the hetero-hexamer. Although AlphaFold2 predicts a symmetric hetero-hexamer, the authors found that the structure of the coiled-coil domain (CCD) is asymmetric, tilted at about 60° relative to the membrane domains, and only contains two of the three alpha helical hairpins, with the third being disordered.

      Strengths:

      The strategy of using AlphaFold2 models to guide construct design for experimental structure determination is state-of-the-art, and this work provides a great example of its applications and limitations. I.e., the experimental structure does not fully recapitulate the prediction but provides unexpected results.

      The comparisons between the authors' structures and the previously published structures of the MmpL4 monomer and MmpL5 trimers strengthen the authors' findings.

      We thank reviewer#2 for this positive assessment of our work and agree that it is interesting that the experimental structures do not fully agree with the AF predictions (see also comment to reviewer#1).

      Weaknesses:

      A more detailed description of the current mechanistic hypothesis would strengthen the manuscript. The authors state that the two periplasmic domains "are expected to undergo rigid body movements that allow substrate transport through these periplasmic domains similar to the conformational changes observed in the E. coli multidrug efflux pump AcrB". A schematic of the proposed transport cycle, as a supplemental figure that shows the current hypothesis regarding transport, would be beneficial for understanding the previous structures and putting the current structure in context. Outside of "the mechanistic basis of how these conformational changes are coupled to protonation of the DY-pairs", what are the major controversies/open questions regarding the mechanism?

      We thank the reviewer for this valuable comment. We will add a new figure with the model of the MmpL4 transport cycle based on our new data and discuss the proposed molecular transport mechanism in more detail in the main text.

      The authors provide evidence that the cysteine-depleted S4L4 construct is functional, but do not show that the construct with the introduced disulfide bond #5 (D39C MmpS4 and S434C MmpL4) is also functional. Demonstrating this would allow the authors to better interpret their resulting structures.

      In the revised version, we will include additional data to assess the functional consequences of cross-linking.

      The analysis presented in Figure 5 and Supplementary Figure 7 seems to suggest that the authors are proposing that the CCD central cavity acts as a transport pathway for the transported substrate, but I am not sure that this hypothesis is explicitly stated. This makes the reasoning behind the analysis presented unclear. Clarity could be improved by stating that the hypothesis of direct transport of substrate through the CCD central channel is being examined using the structure prediction, and what the implications are for the structure solved with the incompletely formed CCD.

      We state clearly in the discussion that the channel through the CCD seems too narrow to let large molecules like mycobactin and bedaquiline pass:

      Line 318ff: “The channel radius of the MmpL4 CCD is very narrow with a minimum of 1.1 Å according to the AlphaFold3 prediction (Fig. 5). This is much smaller than the smallest axis of a mycobactin molecule of ?? nm as determined from a model of iron-free mycobactin. In addition, the cryo-EM structure of MSMEG_1382 revealed a constriction in the CCD channel [21]. Even though the methionine side chains lining the channel wall are considered to be flexible (Aledo, 2019), large conformational changes of the α-helical hairpins relative to each other would be required to allow passage of molecules as large as mycobactin and bedaquiline. The AcrAB-TolC efflux machinery provides an example for such large conformational changes to enable transport of large molecules by iris-like opening and closing movements of the outer membrane channel-tunnel TolC [33]. Similar helical twisting may widen the channel of the CCD. Alternatively, it is conceivable that the substrates of MmpL4/MmpL5 are transported along the CCD surface, potentially requiring further protein partners. It is interesting to note that siderophore secretion and drug efflux by MmpL4/MmpL5 systems involve at least two additional proteins, namely the periplasmic protein Rv0455, which was shown to be essential for mycobactin efflux [34], and an outer membrane channel, whose identity remains elusive. A complete molecular understanding of the transport mechanism through the MmpL4/MmpL5 systems hence requires the identification of the missing components and structural information about their interactions.”

      The channel radius of the MmpL4 CCD is very narrow (minimum of 1.1 Å) according to the AlphaFold3 prediction (Fig. 5), and the cryo-EM structure of MSMEG_1382 revealed a further constriction in the CCD channel [21]. We therefore consider direct substrate transport through the CCD central channel to be physically implausible for molecules of the size of mycobactin and bedaquiline. Even accounting for the flexibility of the methionine side chains lining the channel wall, large conformational changes of the α-helical hairpins relative to each other would be required to accommodate such large substrates. While iris-like opening movements have been described for TolC in the AcrAB-TolC system [33], those movements widen an already substantially larger channel, and even such dramatic conformational changes would be insufficient to open a channel as narrow as that of the MmpL4 CCD to a diameter permissive for substrate passage. We instead favor a model in which substrates are transported along the outer surface of the CCD, potentially with the assistance of additional protein partners. This is consistent with the observation that MmpL4/MmpL5-mediated siderophore secretion and drug efflux involve at least two further proteins: the periplasmic protein Rv0455, shown to be essential for mycobactin efflux [34], and an as-yet-unidentified outer membrane channel. In this context, the overall flexibility of the CCD, illustrated here by the tilted, incompletely formed conformation, may reflect the conformational dynamics required for interaction with these partner proteins, rather than being directly involved in forming a transport conduit. A complete mechanistic understanding will require identification of the missing components and structural characterization of the fully assembled efflux machinery.

      We do not think that the incompletely formed CCD represents a conformation that is relevant for transport. But it is a demonstration of the overall flexibility of the CCD, which may be required to further open the channel in case the substrates are transported within the CCD tube. Further in-depth experiments will be needed to clarify this interesting question, which is beyond the scope of this paper.

      Given that the results emphasize the flexibility of the CCD, the manuscript would be strengthened by 3D variability analysis either in cryoSPARC or using cryoDRGN (or both). This would allow the authors to better quantify the degree of motion in the CCD and how it may correlate to flexibility in other regions. Further 3D flex reconstruction in cryoSPARC may improve the map quality of the CCD.

      This is a great suggestion. We will include a 3D variability analysis in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Earp et al. reports cryoEM structures of the hexameric (MmpS4)₃-(MmpL4)₃ complex from Mycobacterium tuberculosis, which belongs to the RND family of transporters and is known to have a role in the export of siderophores and contribute to drug resistance. The experimental workflow showcased involves the design of disulfide pairs using distance constraints obtained from the AlphaFold predicted structure of the hexameric complex. One such disulfide pair was used to determine the ~3.0 Å structures. The structure reveals density for the previously unresolved coiled-coil domain (CCD), a tilted CCD arrangement, and a cavity within the periplasmic domain, which the authors assert is occupied by detergent. Comparison of this complex with the monomer structure of MmpL4 shows conformational variations interpreted to implicate different domains and conserved residues involved in proton coupling, which might be related to the transport mechanism. While the methodological aspects of the manuscript are solid, enthusiasm for the overall advance/significance is less so, with doubts about the relevance of the tilted CCD structure, considering disulfide trapping and an incomplete validation of the claim that the tilted CCD represents a stable intermediate conformation. A clear, updated transport mechanism is largely missing from the manuscript.

      We thank reviewer#3 for these useful comments, which we will address during the revision of the manuscript. In particular, we plan to include a scheme of an updated transport model.

      Strengths:

      Beautiful structures, AF prediction-experimental validation nexus that could be fine-tuned for different systems/difficult to target complexes.

      Weaknesses:

      Physiological relevance of the tilted CCD conformation. No clear mechanistic model for the transport. While the CCD may indeed be a stable intermediate, the fact that the rest of the trimeric arrangement is unaffected does not fully rule out disulfide trapping as a factor in promoting this. The findings would be strengthened if the same tilted conformation is seen using a different set of disulfides. The significance of the detergent molecule and the new cavity observed could also be better discussed in terms of an updated transport model.

      We believe that there was a misunderstanding about our interpretation of the tilted CCD. As a matter of fact, it must be a stable intermediate, otherwise no density would have been observed for it in the cryo-EM maps. Despite being a stable intermediate, it is indeed unlikely that it represents a conformational state that is relevant/required for transport. Firstly, only the upright, complete CCD can bridge the periplasm. Secondly, the structure was determined in detergent and lacks additional protein binding partners, which might stabilize the upright conformation of the CCD. It is also conceivable, as the reviewer pointed out, that disulfide cross-linking may have caused the tilt. However, as we wrote in the manuscript, we do not think that cross-linking caused this striking asymmetry of the CCD, because the three MmpL4 and MmpS4 chains are basically symmetrical in the C1-processed data (see also Figure 2E):

      Line 182 ff: “To assess whether there are asymmetries in other parts of the structure, we superimposed the individual protomers of the (MmpS4)3-(MmpL4)3 complex analyzed using C1 symmetry (Fig. 2E). Apart from the two resolved α-helical hairpins, the MmpL4 core domains and the resolved parts of MmpS4 differ by an RMSD of less than 0.6 Å and are therefore structurally identical considering the map resolution of around 3 Å. The fact that the core domains of MmpS4 and MmpL4 do not deviate between the protomers argues against the possibility that the cross-links established between them cause the (asymmetric) tilt of the CCD.”
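
      For readers unfamiliar with the protomer comparison quoted above, the following is a minimal sketch of how an RMSD after optimal rigid-body superposition can be computed with the Kabsch algorithm. It is illustrative only: the software the authors used for superposition is not stated in this response, the function name `kabsch_rmsd` is hypothetical, and the coordinates are synthetic stand-ins for matched Cα atoms.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    P and Q are assumed to list the same atoms in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix between the two centred sets and its SVD.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation mapping Pc onto Qc
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))

# Synthetic example (hypothetical data): protomer B is protomer A rotated by
# 120 degrees about z, shifted, and perturbed by small coordinate noise.
rng = np.random.default_rng(1)
protomer_a = rng.standard_normal((200, 3)) * 10.0
theta = 2.0 * np.pi / 3.0
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
protomer_b = protomer_a @ rot_z.T + np.array([5.0, -2.0, 1.0])
protomer_b += 0.1 * rng.standard_normal(protomer_b.shape)
print(f"RMSD after superposition: {kabsch_rmsd(protomer_a, protomer_b):.2f} Å")
```

      In such a comparison, an RMSD well below the map resolution indicates that the superimposed chains cannot be distinguished at that resolution, which is the reasoning applied in the quoted passage.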

      Regarding the DDM binding site, we will indeed include an updated transport model. That said, we wish to be cautious, because we lack experimental proof that MmpL4 can in fact transport DDM.

    1. eLife Assessment

      Maloney et al. offer an important contribution to understanding the potential ecological mechanisms behind individual behavioral variation. By providing compelling theoretical and experimental data, the study bridges the gap between individual, apparently stochastic behavior with its evolutionary purpose and consequences. The work further provides a testable and generalizable model framework to explore behavioral drift in other behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloney et al. (2026) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      The theoretical foundations of the study are compelling, and the connection between the experimental data (Fig. 1) and the modeling work (Fig. 2-4) is convincing.

      Weaknesses:

      In the experimental data (Fig. 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I had identified three significant issues with the experiments that were addressed in the revision:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not fully link to the theory-driven work in changing environments. A full experimental investigation of this would be beyond the scope of the current work.

      (2) The temporal aspect of behavioral instability has been addressed in Figure 1F.

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). This issue has been further discussed in the revised manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits are regulating behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting and in fact more fundamental than showing if it is serotonin that does it or not.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      Comments on the latest version:

      The changes to the manuscript sufficiently addressed my few comments. I do not have anything else substantial to add to my review and I am comfortable with my initial assessment.

    4. Reviewer #3 (Public review):

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains in a Y maze, concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift (although with some contrasting results). The use of a different assay for this different dataset (Y maze instead of wide arena) is justified by previous observations of similar behavioral biases in these assays. Yet the conceptual link between the spectral power analysis used for the first dataset and the autoregressive model used for the second remains unclear.

      Finally, the authors use theoretical approaches to show the potential advantage of individual drift for survival in unpredictable, fluctuating environments. They demonstrate that while bet-hedging provides an advantage over timescales matching the generation time (since reproduction is required), it offers less benefit on shorter timescales, where an increased individual drift could be advantageous.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloney et al. (2024) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      While the theoretical foundations of the study are strong, the connection between the experimental data (Figure 1) and the modeling work (Figure 2-4) is less convincing.

      Weaknesses:

      In the experimental data (Figure 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I identify three significant issues with the experiments:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not properly link to the theory-driven work in changing environments. An experiment conducted in a changing environment and its effects on behavioral drift would improve the manuscript's internal consistency and clarify some points related to (3) below.

      We have added further discussion of this to the discussion section.

      (2) The temporal aspect of behavioral instability. While the analysis demonstrates behavioral instability, the temporal dynamics remain unclear. It would be helpful for the authors to clarify (based on graphs and text) whether the behavioral changes occur randomly over time or follow a pattern (e.g., initially more right turns, then more left turns). A proper temporal analysis and clearer explanations are currently missing from the manuscript.

      We have added a figure (1F) to better visualize the changes in handedness over days. We have also pointed out the connection between the power spectrum and the autoregressive model given by the Wiener-Khinchin theorem (which states that the autocorrelation function of a wide-sense stationary process has a spectral decomposition given by its power spectrum).

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). In the neutral stimuli used in the experimental data, changes should either occur randomly (drift) or purposefully, as in a neutral environment, previous strategies do not yield a favorable outcome. For instance, the animal might initially employ strategy A, but if no improvement in the food situation occurs, it later adopts strategy B (learning). In changing environments, this distinction between drift and learning should be even more pronounced (e.g., if bananas are available, I prefer bananas; once they are gone, I either change my preference or face negative consequences). Alternatively, is my random choice of grapes the substrate for the learning process towards grapes in a changing environment? Further clarification is needed to resolve these potential conflicts.

      We have discussed this further in the discussion.

      Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use a time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases the fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, the authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits regulate behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting, and in fact more fundamental than showing if it is serotonin that does it or not.

      We have adjusted our wording and contextualized our claims based on previous literature.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      We have reanalyzed the behavioral data in a hierarchical model to account for batch effects. Accounting for batch effects (Fig 1G, S1G) we still observe differences between genotypes and for pharmaceutical manipulations of serotonin, though our data provides more equivocal evidence for the effects of trh<sup>n</sup> on drift.

      Reviewer #3 (Public review):

      Summary:

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift.

      Finally, the authors use theoretical approaches to identify the range of environmental conditions under which drift in individual bias supports population growth.

      Strengths:

      The model provides a clear prediction of the environmental fluctuations under which a drift in bias should be beneficial for population growth.

      The approach attempts to identify genetic and neurophysiological mechanisms underlying drift in bias.

      Weaknesses:

      Different behavioral assays are used and are differently analysed, with little discussion on how these behaviors and analyses compare to each other.

      We have added text indicating that these two behavioral responses have previously been shown to be correlated to each other and that the spectral power analysis and autoregressive model are conceptually linked.

      Some of the model assumptions should be made more explicit to better understand which aspects of the behaviors are covered.

      We have added a table in the supplemental clarifying all of the parameters of modeling for each figure.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Highlights of the Consultation Session of 3 Reviewers

      In the consultation session, the reviewers discussed as particularly important the relative contribution of genotype and variable environment. Further analyses of the replicates of the genotypes were suggested to exclude the environment as the source of difference in the extent of drift between genotypes. If the difference in the extent of drift between replicates is greater than the difference in the extent of drift between genotypes, then one cannot really say that there is a genetic control over drift and that it would evolve (which is still an interesting result, but would be less exciting for a follow-up evolution experiment). If replicates differ, testing whether the relative difference in the extent of drift between genotypes is maintained across environments would also be strong evidence that the extent of behavioral drift is a property of a genotype and not a mere result of a fluctuating/variable environment. The authors already present two behavior paradigms that can serve to compare the relative extent of drift between genotypes. The authors might consider whether experimental data could be brought closer to theory by including an experiment in a variable environment (e.g., temperature or diet changes).

      Reviewers also agreed in the consultation session that methods and definitions were somewhat cryptic, and it would be very helpful if they were more detailed. For example, linking the free walking analysis to the Ymaze and then the model1 to the model2 was not straightforward.

      We have added text to make more explicit the theoretical connection between the freewalking analysis, the ymaze analysis, and the model. We have added text and a supplemental table to clarify the methods.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 161: The authors state in the supplement that they used DGRP strains, which are inbred and not isogenic. According to the original authors, they possess 99.3% genetic identity. The isoD1 strain has no known crossing scheme, so complete chromosome isogeneity remains questionable, especially after 12 or more years since its creation. The authors should refer to the strains as "near-isogenic" or a similar term.

      We have adjusted the language as suggested to be more accurate.

      (2) Lines 276, 338: The manuscript contains some unfinished sentences or remnants from the drafting process (e.g., "REFREF"). A thorough editorial review is recommended to eliminate such errors.

      We have cleaned up all references and made additional passes to adjust text.

      Reviewer #2 (Recommendations for the authors):

      (1) If the authors want to claim that serotonin is a regulator of drift, they should provide a negative control experiment, using equivalent perturbations of another neuromodulator and non-modulator. Alternatively, they could simply soften the claims revolving around serotonin and its putative direct role in modulating drift.

      We have softened the claims as suggested to avoid claiming our results show a specific role for serotonin.

      (2) I would suggest always using "behavioral drift" when referring to drift, especially in the context of modeling, because it can be easily confused with genetic drift and cause confusion when reading.

      We have adjusted the language throughout the manuscript per this suggestion.

      (3) It would be good to see in the methods if the 2-hour assays were always done at the same time of the fly's subjective day and when (e.g. how many hours after lights on).

      We have clarified this.

      (4) I understand that many experiments use methodology replicated from the group's previous work, but I would recommend elaborating the experimental methods section in the supplementary such that the reader can understand and reproduce the methods without having to sift through and look for them in previous papers.

      We have expanded on our discussion of the methodology in the methods section.

      Reviewer #3 (Recommendations for the authors):

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that flies exhibit an individual turning bias (when averaged over time), yet their preferences fluctuate over slow timescales. However, it's unclear why the authors chose to switch to a different assay to compare strains. In particular, it's ambiguous whether the behavioral measure in one setup is comparable to that in the other; specifically, whether a bias in one setup reflects the same type of bias in the other. The behavior is also sampled differently across setups (though the details are unclear; see comments below) and analyzed using different methods. Consequently, it remains uncertain whether the slow fluctuations observed in the arena setup are also present in the Y maze. It appears that the analysis of the Y maze data only addresses individual behavioral variance or, at most, day-to-day changes, without accounting for longer-term correlations in bias, which I understood to be the primary interest in the arena setup. Some clarification is needed here (see specific comments below).

      In Figure 2, the authors attempt to show the potential advantage of individual drift for survival in unpredictable, fluctuating environments. They demonstrate that while bet-hedging provides an advantage over timescales matching the generation time (since reproduction is required), it offers less benefit on shorter timescales, where an increased individual drift could be advantageous. This approach is well-conceived, and the findings are convincing, though the model would benefit from further clarification and additional explanation in the text.

      Here are some more specific comments:

      PART 1:

      (1) L 223 one probably cannot see a circadian peak at 24h if the data were filtered at 24h, did they look with another low pass cutoff?

      We clarified in the text that the power spectrum analysis was performed on unfiltered data.

      (2) L 243 the spread in standard deviation is said to be consistent with drifting bias, however, I do not agree with this. The variation could be stochastic but independent across days, and show no temporal correlation. As done with the circular arena, a drift should be estimated as a temporal correlation in the behavior.

      It is consistent insofar as seeing a non-zero standard deviation is a necessary condition for drift. While it does not show that there is any consistency over time, this can be inferred from the autoregressive model (as well as previous work). We have added text to make this clearer.
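      The distinction in this exchange can be illustrated with a small simulation. The sketch below is not taken from the manuscript: the parameter values, the series length, and the helper names `simulate_ar1` and `lag1_autocorrelation` are all assumptions chosen for illustration. It generates two series with the same stationary standard deviation, one with independent day-to-day values (ϕ = 0) and one with temporally correlated values (ϕ = 0.8), and shows that only the lag-1 autocorrelation tells them apart.

```python
import random
import statistics


def simulate_ar1(phi, sigma, n_days, seed):
    """Simulate x[t] = phi * x[t-1] + noise, with noise ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    # Start from the stationary distribution so early days are not special.
    x = rng.gauss(0.0, sigma / (1.0 - phi ** 2) ** 0.5)
    series = []
    for _ in range(n_days):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(x)
    return series


def lag1_autocorrelation(series):
    """Sample correlation between each day and the following day."""
    mean = statistics.mean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((a - mean) ** 2 for a in series)
    return num / den


# Hypothetical parameters: both processes have a stationary SD of 1.0,
# because sigma / sqrt(1 - phi**2) equals 1.0 in each case.
independent = simulate_ar1(phi=0.0, sigma=1.0, n_days=5000, seed=1)
drifting = simulate_ar1(phi=0.8, sigma=0.6, n_days=5000, seed=1)

for name, s in [("independent", independent), ("drifting", drifting)]:
    print(name, round(statistics.pstdev(s), 2), round(lag1_autocorrelation(s), 2))
```

      The series length (5000 days) is far longer than any real experiment and is used only to keep sampling noise small. Under these assumptions, a similar spread in both series alongside a clearly different lag-1 autocorrelation is the pattern that separates independent day-to-day variation from drift.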

      (3) In the autoregressive model this temporal aspect seems to be incorporated only to the first order (from day to day). Therefore, from what I understand, the drift term is not correlated over time. This seems very different from the spectral analysis done in the circular assay, and I wonder if it fits at all the initial definition of drift. For example, is the model compatible with a fixed mean and a similar power spectrum as in Figure 1C? The text should clarify that.

      In an AR(1) process, datapoints day to day are correlated as long as ϕ ≠ 0. This can be made clear in the case of σ = 0 and ϕ = 1, where values would be perfectly correlated with each other across time. The AR(1) model and the PSD of circling can be related via the Wiener-Khinchin theorem. We have added text to make this connection clear.
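      As a hedged illustration of the link mentioned here, the following sketch checks the Wiener-Khinchin relation numerically for a stationary AR(1) process (|ϕ| < 1). It is not the authors' analysis code: the values of ϕ and σ are hypothetical, and the function names are introduced only for this example. The power spectral density is computed once as the Fourier sum of the AR(1) autocovariance and once from the standard closed-form AR(1) spectrum; the two should agree.

```python
import math


def ar1_autocovariance(phi, sigma, lag):
    """Autocovariance of a stationary AR(1) process at a given lag (|phi| < 1)."""
    return sigma ** 2 / (1.0 - phi ** 2) * phi ** abs(lag)


def psd_from_autocovariance(phi, sigma, freq, max_lag=2000):
    """Wiener-Khinchin: PSD as the (truncated) Fourier sum of the autocovariance."""
    total = ar1_autocovariance(phi, sigma, 0)
    for k in range(1, max_lag + 1):
        total += 2.0 * ar1_autocovariance(phi, sigma, k) * math.cos(2.0 * math.pi * freq * k)
    return total


def psd_closed_form(phi, sigma, freq):
    """Closed-form AR(1) power spectral density; freq is in cycles per day."""
    return sigma ** 2 / (1.0 - 2.0 * phi * math.cos(2.0 * math.pi * freq) + phi ** 2)


phi, sigma = 0.9, 1.0  # hypothetical values, not fitted to any data
for freq in (0.01, 0.05, 0.25, 0.5):
    print(freq, psd_from_autocovariance(phi, sigma, freq), psd_closed_form(phi, sigma, freq))
```

      With a positive ϕ close to 1, most of the power sits at low frequencies, which is the sense in which a first-order day-to-day model can still produce slow fluctuations.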

      (4) Did serotonin have no role in turning bias? My understanding of previous work was that serotonin should affect the bet-hedging variance as well. The authors should discuss what is expected or not, especially given that the pharmacological and genetic approaches do not have the same effect on bet-hedging (Figure 1H-I).

      As the pharmacological methods were only applied after eclosion, we do not find it surprising that we do not measure differences in the initially measured distribution of handedness in that case. We do see more evidence of it in the mutations, though the trh<sup>n</sup> experiments provide a less clear effect after our adjustments to account for batch effects.

      (5) Methods: It is unclear how flies were handled across days; e.g. in Y mazes: 2h each day for how many days? In the arena flies were imaged either twice daily for 2h per session, or continuously for 24h (L138) - but which data are used where?

      We will make this clearer, but all data in Figure 1 were the continuous 24h data.

      This part of the methods is not well explained and I think it should be described in more detail.

      (6) How many flies per genotype were tested in fig 1E?

      Information was added to the caption to duplicate information in the table.

      PART 2:

      (7) In Figure 2B I do not understand the formulation N(50−ϕ: 50, σ), N(phi-et: et, σ) or in general N(x: m, s): does this mean that the variable x has normal distribution with mean m and variance s? Usually this would be written as N(x|m, s) or N(x; m, s)

      If so then: N(50−ϕ: 50, σ) = N(ϕ: 0, σ) which has mean=0 while the figure caption says "from a normal distribution centred on the long term environmental mean" - what is the long term environmental mean?

      If this is correct, and, therefore, we are just centering the mean, what about N(et-phi: et, σ)?

      Et is the environment at the time, not the mean of the environment (which is 50). We have added more detail in supplementary methods to address this.

      (8) Should ϕ vary between 1-100? And is the environmental parameter in Figure 2C also varying between 1-100? These ranges should be written somewhere.

      While implied in the sigma notation, we have added more detail in supplementary methods to explain the situation.

      (9) As far as I understand the bounding envelope in Figure 2B is necessary to contain the drift model. In Figure 1F, a bounding effect was generated by the "tendency to revert to no bias." It is unclear to me whether these two formulations are equivalent. Moreover, none of these two models might be able to recapitulate the correlations observed in the circular arena and analyzed spectrally in Figure 1C. It would be necessary that the author make an effort to relate these models/quantifications one to another. My understanding of Figure 1B is that there are slow fluctuations around the mean. Is the bounded drift model in 2B not returning to the same mean? And do these models generate slow fluctuations? Further explanation could help clarify these points.

      We have added additional explanation of the connection between the power spectrum and the two methods (phi and the bounding envelope) of establishing stationarity.

      (10) Expanding on the above: I thought that the definition of individuality is based on some degree of stability over days. However, both models assume drift to occur from day to day (and also the analysis of the DGRP lines assumes so). Some clarification here could help: is the initial bet-hedging variation maintained in the population? And is the mean individual bias still a thing, or is it just drifting away all the time?

      The initial bet-hedging is maintained to some degree, based on the parameter of phi and the bounding envelope. We have added text to make this clearer.

      (11) In both Figures 2C and 2E the populations are always shrinking, is that correct? And if so, is it expected? Does the model allow growth in a constant environment?

      As the plotted values are the log, the optimal environments do allow growth (visible more clearly in 2D). We have added some text to make this clearer.

      (12) Growth is quantified only across 100 days (Figure 2D) but at day 100 there is not something like a steady state, how is 100 chosen? Would it make sense to check longer times to see if the system eventually takes off? And if not, why?

      (13) Related to the above: what is the growth range achieved in Figure 3A-B? Is the heatmap normalized to the same value across conditions? I think it would be important to consider the absolute range of variation of growth or at least the upper value across conditions.

      Moreover: is growth quantified at day 100? What happens at longer times? Does the temporal profile of the growth curve differ across environmental conditions? (I'm referring to a Figure as 2D).

      As we are plotting the log change, we are ultimately showing the growth rate. While a more realistic model would involve carrying capacity, we believe a simplified model showing growth or no growth captures the difference in growth rate between different strategies. We have added some text to make this clearer.

      (14) Suddenly at line 502, sexual maturity is introduced as a parameter, which was never mentioned before, called a_min in the figure legend of panel 3a, but it is unclear where this is in the model. And please also clarify if sex maturity is the same as generation time.

      Sexual maturity is the same as generation time, we have standardized terminology throughout the paper.

      (15) Regarding lines 505-508, could one simply conclude that in this model formulation, the generation time has the effect of a low pass filter on environmental fluctuation? The question is: is this filtering effect the only effect of generation time?

      While this seems to capture the high-frequency effect we see, it does not explain the shift from bet-hedging->drift we see at lower-frequency environmental fluctuations.
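      The low-pass intuition raised in this exchange can be sketched as follows. This is only an illustration under a strong simplifying assumption, namely that the generation time acts like a moving average over the environment; it is not the model used in the manuscript, and the window length and periods are hypothetical. The function computes how much a moving average of a given length attenuates a sinusoidal environmental fluctuation sampled once per day.

```python
import cmath
import math


def moving_average_gain(period_days, window_days):
    """Amplitude gain of a window_days-long moving average applied to a
    sinusoid with the given period, sampled once per day."""
    omega = 2.0 * math.pi / period_days
    response = sum(cmath.exp(-1j * omega * k) for k in range(window_days)) / window_days
    return abs(response)


window = 10  # hypothetical generation time in days
for period in (4, 20, 100, 1000):
    print(period, round(moving_average_gain(period, window), 3))
```

      Fluctuations much faster than the window are strongly attenuated, while slow fluctuations pass almost unchanged. As the response above notes, this kind of filtering accounts for the high-frequency effect but does not by itself explain the shift from bet-hedging to drift at lower frequencies.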

      (16) What reproductive rate is used for the PCA analysis? Is the variance associated with the drift so low because of choosing a fast reproductive rate? A comment in the main text would be helpful.

      We have clarified that these plots were done at 10 days.

    1. eLife Assessment

      This paper describes useful findings on the effects of isoflurane anesthesia on the visual cortical circuitry of the mouse. It provides solid evidence that the visual spatial frequency sensitivity becomes coarser (lower resolution) during anesthesia, with distinct effects described in excitatory neurons, and parvalbumin (PV) and somatostatin (SOM) positive interneurons. This study should be of interest to neuroscientists studying the mouse visual cortex and the effects of anesthesia on cortical circuitry.

    2. Reviewer #1 (Public review):

      This manuscript characterizes the effects of isoflurane on visual processing in layer 2/3 of the mouse primary visual cortex (V1). General anesthesia, including isoflurane, has been reported to modulate various neural processes, such as size tuning, direction selectivity, and spatial selectivity in V1. Using two-photon calcium imaging, the authors monitored neural responses to visual stimuli under isoflurane anaesthesia and found that spatial frequency preferences are also affected across cell types, with the magnitude and direction of these effects varying between cell types.

      The authors performed careful and rigorous comparisons of neuronal responses between the two conditions using well-chosen nonparametric statistics. At the same time, because two-photon calcium imaging can be combined with cell-type-specific labeling, the authors labelled inhibitory neurons with tdTomato, allowing them to distinguish GCaMP activities in excitatory and specific inhibitory cell classes. We also appreciated that the manuscript provides not only summary statistics but also example GCaMP traces (Figure 1), which makes it easy for readers to understand the quality of the raw data.

      We believe that the manuscript could be improved by emphasizing the following three points.

      (1) The analyses are limited to the neurons that responded to visual stimuli in both the anesthetized and awake states. According to Table S1, the proportion of visually responsive neurons that met such criteria is only 27.4% for the excitatory neurons. This raises the potential concerns that the reported effects of isoflurane may not fully reflect population-level changes in visual coding. We suggest that the authors repeat the same analyses, including average tuning curves and decoding analyses, for all recorded neurons in each condition.

      (2) The manuscript would benefit from tuning curves of spatial frequency preference for individual neurons, as this would help readers assess whether the reported statistics are appropriate (Figures 2A-D). In addition, more in-depth single-neuron analyses would help distinguish between the two proposed hypotheses in Figure 5 that may not be evident from average responses alone. This is because, with the current analysis, it is not clear how the shape of the tuning curves will affect the estimation of spatial frequency preference. To address this potential concern and strengthen the interpretation of the results, we suggest:

      a) repeating the analysis at the level of individual neuronal responses, instead of average responses, and

      b) using simulated data to examine how changes in tuning-curve width could affect estimated spatial frequency preference.

      For example, using the neuronal responses in the awake condition, one could broaden the tuning curves and recompute the preferred spatial frequency, then compare the resulting distribution with that observed under anesthesia.

      (3) We believe the manuscript's overall framing is a little broader than what is directly supported by the data. In particular:

      (a) the statement "reduced sensory perception during anesthesia is linked to a degradation in spatial resolution at the cellular level" in the Abstract is an unclear and unsupported claim. We suggest removing this sentence and more directly summarizing the findings.

      (b) given the discrepancy between the effects of urethane and isoflurane as laid out in the discussion, the current title "Anesthesia Lowers Spatial Frequency Preference in the Primary Visual Cortex" appears overstated and should be revised to explicitly reflect the specific anesthetic tested: "Isoflurane Anesthesia Lowers Spatial Frequency Preference in the Primary Visual Cortex".
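      The simulation suggested in point (2)(b) above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the set of tested spatial frequencies, the log-Gaussian tuning shape, the widths, and both estimators of preferred spatial frequency. None of it is taken from the manuscript's methods. The tuning curve keeps the same true peak while its width is broadened, and the preferred spatial frequency is then estimated in two ways.

```python
import math

# Hypothetical set of tested spatial frequencies (cycles per degree).
SPATIAL_FREQS = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32]


def tuning_curve(peak_sf, width_octaves):
    """Log-Gaussian tuning curve evaluated at the tested spatial frequencies."""
    mu = math.log2(peak_sf)
    return [math.exp(-(math.log2(sf) - mu) ** 2 / (2.0 * width_octaves ** 2))
            for sf in SPATIAL_FREQS]


def preferred_sf_argmax(responses):
    """Preferred SF as the tested frequency with the largest response."""
    return SPATIAL_FREQS[responses.index(max(responses))]


def preferred_sf_weighted(responses):
    """Preferred SF as the response-weighted mean in log2(SF) space."""
    total = sum(responses)
    mean_log = sum(r * math.log2(sf) for r, sf in zip(responses, SPATIAL_FREQS)) / total
    return 2.0 ** mean_log


# Same true peak (0.16 cpd), different widths.
narrow = tuning_curve(peak_sf=0.16, width_octaves=0.5)
broad = tuning_curve(peak_sf=0.16, width_octaves=2.0)

for name, curve in [("narrow", narrow), ("broad", broad)]:
    print(name, preferred_sf_argmax(curve), round(preferred_sf_weighted(curve), 3))
```

      In this toy setting the peak-based estimate is unchanged by broadening, whereas the weighted estimate moves toward lower spatial frequencies because the tested range extends further below the peak than above it. Whether a similar bias applies to the reported results would depend on the estimator and stimulus range actually used.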

    3. Reviewer #2 (Public review):

      Summary:

      The main objective of the study was to link the changes in brain state due to anesthesia to consequences on visual neural processing, particularly effects on spatial frequency tuning. This is accomplished by 2-photon imaging of excitatory and inhibitory neurons (separating PV- and SST-positive subtypes) in mouse visual cortex during full-field visual stimulation with gratings, and tracking neuronal tuning for spatial frequency before, during, and after isoflurane anesthesia. The main finding is that anesthesia induces lower spatial frequency preferences in excitatory neurons, and this leads to poorer population representations (decoding) of higher spatial frequency responses during anesthesia. A second main finding is that anesthesia impacts inhibitory neuron subtypes in distinct ways, with the most pronounced effects of anesthesia on somatostatin inhibitory neurons.

      Strengths:

      (1) A main strength of the study is that it is straightforward, and reassuringly, the results confirm multiple previous studies showing anesthesia's effects on the amplitude of cortical responses: larger and less selective responses in excitatory neurons (versus awake responses); strongly reduced responses in somatostatin inhibitory neurons (versus awake responses) (Fig. 5I-L), with fewer differences across anesthetized and awake states in the response amplitude of PV neurons.

      (2) These confirmations of prior observations (on the amplitude of responses) establish good ground for the new results on spatial frequency tuning. For excitatory neurons, spatial frequency selectivity shifts to higher values in awake versus anesthetized conditions; this is because anesthesia induces larger responses to lower spatial frequencies. In somatostatin neurons, instead, wakefulness reduces the lower spatial frequency responses present in anesthesia, and dramatically increases the overall amplitude of responses at medium and higher spatial frequencies. This is consistent with prior work showing that in awake states, somatostatin neurons exert broad inhibition in V1; this study extends that finding to the tuning of spatial frequencies.

      Weaknesses:

      (1) A first weakness of the study is the lack of examination of changes to single neuron receptive field sizes and/or surround suppression across conditions, and how these may relate to the effects on spatial frequency tuning with full field gratings. There is a well-known relationship between the size of the receptive field and the resulting selectivity for spatial frequencies (i.e., large receptive fields prefer lower spatial frequency stimuli). Likewise, there are many studies showing how surround suppression / spatial integration is impacted by anesthesia (and arousal). A more detailed examination of all these related quantities on an individual neuron basis would provide a greater understanding of the factors underlying the effects on spatial frequency tuning. One could imagine that receptive field changes, and/or changes in surround suppression, influence the selectivity to full-field gratings.

      (2) A second weakness is the lack of examination/insight into the temporal dynamics of the effects. The experimental paradigm records activity across control, anesthesia, and recovery epochs in a single duration (~40 mins) session. The epochs are simply binned together ("Awake", "Anes.", "Recover"). It is not clear how the start of the anesthesia bin is defined, nor is it clear how the recovery period is defined. It is also not clear what the changes are to motor tone, brain state, etc., that are also strong influences on visual responses in mouse V1. Presumably, these onset/offset effects are similar enough across mice and sessions that they affect all the bins in the same way, but greater examination of the temporal effects in excitatory, PV, and SOM neurons could shed light on interactions driving the changes. Is there some temporal dependence of anesthesia on selectivity changes across the cell types? For example, at the onset of anesthesia, are SOM neurons losing broadband frequency responses before the excitatory neurons gain low frequency responses? Do PV neurons also show effects after the changes in SOM neurons (suggesting strong SOM -> PV inhibition)? Such analysis might shed light on the timing/causality of the effects among these 3 neuron types.

      (3) A third weakness concerns the interpretation of the low and high arousal conditions during awake states (Figure 6). It is not clear how movement (or lack of movement) impacts the high arousal epochs, nor is it clear how the low arousal condition compares to the brain state during anesthesia. For example, deep versus light anesthesia can lead to synchronized or asynchronous states, respectively, and low arousal in wakefulness can show strong low-frequency oscillations of activity, which could promote a lower excitability state than light anesthesia. Without some more detail about commonly measured brain state or body/face motion metrics, it is difficult to know what brain states are represented by the bins and how to interpret the comparisons.

      Overall, the study uses adequate methods and experimental design to demonstrate solid support for the (somewhat narrow) central finding that anesthesia lowers the spatial resolution of mouse V1 responses.

      Since this is a very well-examined topic, the findings here are not totally surprising, but confirmatory and slightly extend prior findings (a good thing). As such, the study will likely have most relevance to specialists in the mouse visual system, but if the study could address some of the remaining questions discussed above, this would potentially broaden the implications of the study to general insights about the operation of cortical circuitry.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript studies the spatial frequency (SF) selectivity of individual neurons in the mouse primary visual cortex (V1) in the anesthetized and awake brain states using 2-photon calcium imaging. Although previous studies have demonstrated that anesthesia decreases both size tuning and spatial selectivity in V1 neurons, the strength of this study is its characterization of the same neurons in awake and anesthetized states, combined with transgenic mouse lines that selectively label pan-inhibitory neurons and more specific neuronal subtypes, including parvalbumin-positive (PV+) or somatostatin-positive (SOM+) interneurons. This combination of methodologies allows for a more in-depth mechanistic study of the properties of different types of neurons. The main findings suggest that in excitatory neurons, anesthesia leads to a shift in preferred SF and broadening of SF tuning, with no changes in orientation and direction selectivity. The downward shift in preferred SF was more pronounced in both SOM+ and PV+ interneurons.

      Strengths:

      (1) 2-photon calcium imaging with single-cell resolution.

      (2) Characterization of excitatory and two types of inhibitory neurons.

      Weaknesses:

      (1) VIP interneurons are critical to the neural circuit, and their characterization would be critical to the mechanistic understanding of this process, but is missing.

      (2) Unfortunately, the manuscript does not provide additional insight into the nature of this anesthesia-induced shift in SF preference.

      (3) Furthermore, it also doesn't help understand how SF preference is encoded in V1.

      (4) Finally, some critical histological controls are missing.

    5. Author response:

      Thank you for the eLife assessment and the constructive reviews. We appreciate the reviewers’ valuable insights and the time they dedicated to providing such thoughtful feedback on our manuscript. The reviewers highlighted the technical rigor of our study, specifically the tracking of individual neurons across both anesthetized and awake states using two-photon imaging. They also emphasized the importance of our cell-type-specific analysis (excitatory, PV, and SOM neurons) and noted that the study provides solid evidence for isoflurane-induced shifts in preferred spatial frequency (SF).

      Based on our team's evaluation of the reviewers' comments, we would like to outline our planned revisions.

      (1) Expanded Population and Single-Neuron Analysis

      We will re-analyze our dataset to include all neurons that were responsive under anesthesia, in the awake state, or both. This will ensure our findings accurately represent the entire population of visually responsive neurons. We will also provide examples of individual tuning curves to clarify the relationship between tuning shape and SF shifts in individual neurons.

      (2) Addressing Methodological Scope and Behavioral Metrics

      Receptive Field Size and Dynamics: While we did not utilize a stimulus set specifically designed to map receptive field (RF) sizes, we intend to examine how other functional parameters co-varied with the shift in preferred SF within each cell type. Furthermore, although characterizing the precise temporal dynamics during anesthesia onset presents technical challenges, we will attempt to analyze the time-dependence of the observed changes to provide deeper insight into the transition between states.

      Behavioral Metrics: While pupil size is a well-established proxy for brain state, we will explore the inclusion of other available behavioral parameters.

      (3) Cell-type Specificity (SOM, PV, and VIP)

      SOM vs. PV Comparison: We will perform a detailed comparison of preferred SFs between SOM and PV interneurons, including those responsive only under anesthesia or only in the awake state.

      VIP Neurons: While VIP neurons are known to play critical roles in cortical circuits, such as disinhibition, we have decided not to conduct new recordings for VIP interneurons in the present study. Based on existing literature, the proportion of visually responsive VIP cells is too low to yield statistically reliable conclusions for this specific study (de Vries et al., Nature Neuroscience 23, 138-151, 2020). Additionally, we intend to focus our analysis on inhibitory interneuron subtypes that provide direct input to pyramidal cells.

      Histology: We will provide additional histological validation.

      (4) Refined Framing

      As suggested, we will focus the manuscript strictly on isoflurane anesthesia. This includes updating the title and abstract to reflect this specificity and discussing how our results compare with other anesthetics like urethane. Furthermore, we will substantially deepen our discussion on the potential mechanisms by which anesthesia induces a downward shift in preferred spatial frequency.

      We believe these additions will significantly strengthen the manuscript.

    1. eLife Assessment

      This important study experimentally probes potential antibiotic activity against hypothetical "mirror bacteria" with reversed chirality, showing that D-enantiomers of several approved antibiotics largely lack activity against natural bacteria (as a proxy for mirror organisms) and that conjugated D-peptides can elicit strong binding antibody responses in mice when adjuvanted. The evidence is solid for these core observations but incomplete on issues of chiral purity, functional antibody assays, replicates, and pharmacodynamic readouts; the work also overreaches in extrapolations without deeper mechanistic integration or native-format validation. Overall, the work offers a cautious, relevant contribution to mirror microbiology discussions and will interest infectious disease researchers.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Evaluation of Antibiotic and Peptide Vaccine Strategies for Mirror Bacterial Infections" addresses a topic that is well established in the literature. The authors investigate the activity of enantiomeric (D-form) antibiotics against bacteria and the immunogenicity of D-form peptides, proposing that D-enantiomers are ineffective both as antibacterial agents and as vaccine candidates. While the subject matter is relevant, the concepts explored are already well known, and the manuscript offers limited novelty.

      The authors demonstrate that D-enantiomeric antibiotics lack antibacterial activity compared to their naturally occurring L-forms and that D-form peptides fail to elicit detectable immune responses. These observations are consistent with existing knowledge regarding molecular chirality in biological systems. However, the manuscript relies on a limited experimental dataset while extrapolating the findings broadly, which weakens the strength of the conclusions.

      Strengths:

      The manuscript introduces the topic of mirror bacterial infections, which are likely to occur if regulations or restrictions are not put in place immediately.

      The manuscript addresses a relevant topic and has potential value, particularly in framing discussions around chirality and pathogen interactions. With a more cautious interpretation of the results, the manuscript could better justify its conceptual framework and strengthen its contribution to the field.

      Weaknesses:

      (1) Several sections of the manuscript are overly descriptive and would benefit from deeper comparative analysis and critical synthesis. In multiple instances, the discussion relies on hypothetical scenarios supported primarily by selective citations rather than robust experimental evidence. The introduction of the term "mirror microbiology" or "mirror bacteria" appears largely conceptual and is used to unify what are essentially two separate lines of investigation, enantioselective antibiotic activity and peptide chirality in immune recognition, without sufficient mechanistic integration.

      (2) To the best of this reviewer's understanding, the manuscript does not present substantial novelty. The pronounced differences in biological activity between L- and D-forms of small molecules and peptides are well documented, including their implications for antimicrobial efficacy and immune recognition. While the manuscript is written in clear and accessible language suitable for both specialists and interdisciplinary readers, novelty remains limited.

      The manuscript reiterates well-established principles of stereochemistry and biological recognition. Given the extensive existing literature demonstrating that enantiomeric antibiotics are typically inactive due to stereospecific target interactions, the failure of D-form antibiotics is expected and does not constitute a novel finding.

      (3) Critical experimental details are lacking, particularly regarding the peptide design. It is unclear whether the peptides were synthesized entirely in the D-configuration or whether only select amino acids were substituted. This distinction is essential for interpreting immunogenicity results and for comparison with prior studies.

      (4) The authors conclude that D-form peptides are poorly recognized by the immune system. However, the data presented indicate that neither the L- nor the D-form peptides tested elicited a measurable immune response. Without demonstrating immunogenicity of the corresponding L-form peptides, the conclusion that immune non-recognition is specific to the D-form is not sufficiently supported.

    3. Reviewer #2 (Public review):

      This paper by Kleinman et al. tackles an increasingly discussed biosecurity scenario, namely the possibility that "mirror bacteria" could evade key elements of host immunity and therefore demand bespoke medical countermeasures. The authors experimentally probe two such countermeasure concepts: (1) whether existing chiral antibiotics might still work against mirror bacteria (this is tested indirectly by measuring the activity of antibiotic enantiomers against natural-chirality bacteria), and (2) whether D-peptide antigens can be made immunogenic. Briefly, the authors show that enantiomers of four approved antibiotics have little to no activity in MIC assays, argue this implies the parent drugs would likely fail against mirror bacteria, report limited single-dose tolerability data for the enantiomers in mice, and show that selected bacterially derived D-peptides can elicit strong binding antibody titers when conjugated to a carrier protein and given with adjuvant.

      Overall, the study is quite interesting but constrained by the fact that D-peptide immunogens and related ideas have been explored for decades, by prior literature showing that D-enantiomeric peptides can themselves be strongly antimicrobial vs conventional bacteria, and by a number of conceptual and experimental limitations outlined below.

      (1) A blanket statement indicating that flipping chirality makes antibiotics ineffective cannot be true across all classes. Indeed, there is extensive precedent for "mirror" (D-amino-acid) peptides that retain, or even improve, antimicrobial activity against natural bacteria.

      (2) The paper's key claim ("parent antibiotics won't work on mirror bacteria") is based on the observation that the enantiomers of chloramphenicol/linezolid/tedizolid/aztreonam largely lose activity against natural bacteria. This is a reasonable proxy experiment given the absence of mirror organisms, but it remains an inference and should be described as such.

      (3) The chiral purity needs to be documented more rigorously. The methods mention structural confirmation by NMR and >95% purity by LC-MS/HPLC for enantiomeric compounds, but this is not the same as demonstrating high enantiomeric excess or excluding low-level contamination by the active parent enantiomer.

      (4) The residual activity of ent-aztreonam is quite interesting. The authors report slight activity for ent-aztreonam (MIC of 32-128 µg/mL in a subset), still far weaker than aztreonam but nonzero.

      (5) For antibiotics, MIC is a starting point, but further experiments are needed. To justify countermeasure relevance, it would help to include at least one additional pharmacodynamic readout (time-kill kinetics, post-antibiotic effect, inoculum effect, or activity in the presence of human serum).

      (6) The acute toxicity study is limited (single-dose, short follow-up, small n, one sex/strain, and no histopathology).

      (7) The Discussion leans on human equivalent dosing logic to reassure feasibility. Given the lack of PK, bioavailability, metabolism, and repeat-dose data, these comparisons risk overreach.

      (8) The readout is ELISA endpoint binding (IgG; and IgA in BALF for one antigen), which is fine for an initial immunogenicity screen. But the manuscript then drifts toward "vaccine strategy" claims without showing any antibody functionality (opsonophagocytosis, complement deposition, neutralization, blocking adhesion, and so on) or even binding to a more native-like antigen format (e.g., D-peptide displayed on particles; D-protein fragments; or any surrogate that goes beyond plate-bound peptide).

      (9) The methods report peptide conjugates containing ~10-200 EU/mL endotoxin. That is not trivial and could materially amplify immunogenicity, and should be discussed.

      (10) The authors should report how many technical/biological replicates were performed for MIC determinations and for ELISAs.

    4. Reviewer #3 (Public review):

      Summary:

      There is a threat of mirror life bacteria, which could possibly evade immunity and cause problems for human/animal hosts. This paper evaluates enantiomeric antibiotics and vaccines as a means to understand how this could be combatted in the future.

      Strengths:

      It is valuable to collect such information, as it is not always clear how an antibiotic in its enantiomeric form would interact with a bacterium in terms of its MIC or towards toxicity. The paper is scientifically sound with regard to assays and statistical methods.

      Weaknesses:

      The beginning of the paper could be described as hyperbolic. For a paper that demonstrates that mirror-image molecules have (as expected) lower antibacterial activity and toxicity, some of the opening claims that mirror bacteria will cause a pandemic by evading the immune system seem somewhat overstated. If they are mirror images, how are these bacteria going to generate virulence factors or mediate pathogenesis mechanisms? It seems like the lack of adaptation would go both ways - supported by the empirical data gathered in this manuscript. There is also the issue of only relatively simple and accessible mirror-image antibiotics being available. This is a limitation that - to their credit - the authors do discuss in the discussion section.

    1. eLife Assessment

      AIRE is well known to contribute to immune self-tolerance in the thymus by expressing auto-antigens; in this manuscript, the authors describe unexpected findings about the interaction of AIRE with AID in B cells, and its function in the immune system, thereby contributing to a fundamental understanding of the broader functions of AIRE. The strength of this manuscript is that, by employing biochemical and genetic experiments, the authors convincingly show the interaction between AIRE and AID and the resulting function of AIRE in GC responses. However, two weak points remain: first, the connection between AIRE, anti-IL-17 autoantibodies, and IL-17-positive effector T cells is unclear; second, whether AIRE drives expression of auto-antigens in GC B cells, as it does in the thymus, has not been tested.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide in vivo and in vitro evidence for an interaction between AIRE and AID. This has implications for the dynamics of the germinal center response and autoimmunity related to APS-1 disease.

      The manuscript describes an unexpected function of AIRE, which is better known for its role in regulating negative selection of T cells in the thymus. The gene has also been shown to be expressed by B cells (Immunity 2015: 26070482). The authors describe that AIRE interacts with AID and that, in its absence, B cells acquire more hypermutations and also produce auto-antibodies against IL-17. These autoantibodies have been described previously.

      Strengths:

      The study is interesting and provides some additional information about how AIRE regulates immune cell function. Several biochemical and in vivo experiments show the interaction and the function of AIRE in the regulation of AID activity in the GC response.

      Weaknesses:

      Some of the hypothetical consequences of this regulation are not investigated. This includes responses to model antigens and dynamics of the germinal center related to kinetics.

      Major Comments:

      (1) AID regulates both switch and somatic hypermutation. Switch is easier to achieve, so which of these processes does AIRE influence the most? Also, the switch is thought to occur before the B cell enters the GC. Looking at the histology, is AIRE also expressed at the early proliferative stage that has been described by Ann Haberman?

      (2) In experiments determining anti-CD40-dependent upregulation of AIRE, naïve resting B cells from mice were used. A proportion of the B cells became activated. Are these MZB or FOB cells, given that MZBs are more easily activated?

      (3) In the BM chimeric experiments in Figure 3, do the AIRE+ and AIRE- populations distribute equally among B cell subpopulations?

      (4) Furthermore, in the NP-KLH experiments, one would expect that B cells with increased affinity would leave the GC earlier and become plasma cells. Thus, the kinetics of the AIRE+ vs AIRE- B cells within the GC would be different? Also, would they maybe take over at some point, as the increased affinity would favor help from Tfh cells that are known to be limited?

      (5) Given the previous studies on AIRE's function in regulating transcription (PMID: 34518235), how does this interaction fit into this picture?

      (6) In the uracil experiments, the ability of AID to induce double-stranded breaks could be tested as an additional readout.

      (7) The candida experiments are a nice connection to the situation in patients. However, why is it mostly auto-antibodies against IL-17? How about other immune responses, as well as T cell-independent type I and II responses?

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Zhou et al. investigated the expression and function of AIRE in B cells in peripheral lymphoid tissues. First, they found expression of AIRE protein in mature B cells in the follicles of human tonsils and spleens from healthy donors. Flow cytometry analyses using human samples as well as Aire-reporter mice demonstrated AIRE expression in germinal center B cells. The expression of Aire in B cells was induced by CD40 signals. Then, to investigate the impact of AIRE deficiency on B-cell function, the authors transplanted bone marrow cells from Aire-KO and WT mice into B-cell-deficient mice and compared the development and function of the B cells reconstituted in the recipient mice. Their results showed that Aire-deficient B cells responded strongly to immunization with antigens, exhibiting enhanced class switching and somatic hypermutation of antibodies compared with WT B cells. The same phenomena were observed in CRISPRed B cell lines lacking Aire. The authors successfully utilized the Aire-deficient B cell line to demonstrate that Aire suppresses antibody class switching and somatic hypermutation via its interaction with AID. Finally, B cell transfer into B cell-deficient mice demonstrated that mice harboring Aire-deficient B cells produced high levels of autoantibodies against Th17 cytokines and exhibited reduced resistance to Candida infection. This mirrors characteristic symptoms in AIRE-deficient patients. The findings of this study not only reveal an unexpected function of AIRE in B cells but also have the potential to contribute to understanding the pathogenesis of APECED and to offer a new direction for developing therapies.

      Strengths:

      The strength of this study lies in demonstrating the expression and function of AIRE in B cells in both mice and humans. It also revealed the direct interaction between AIRE and AID, along with its binding mode (requiring CARD and NLS domains of AIRE), and showed that this interaction is crucial for AIRE function in B cells. It is also significant that the study demonstrated how B-cell-intrinsic dysfunction of AIRE leads to autoantibody production against cytokines.

      Weaknesses:

      As for loss-of-function analysis of Aire in B cells, in addition to the B cell transfer from Aire-KO mice performed in this study, generating B cell-specific Aire-deficient mice using Aire-flox mice (Dobes et al, Eur J Immunol 2018) would further reinforce the conclusions of this study. Furthermore, the relationship with Aire function in thymic B cells reported by previous studies remains unclear, posing an unresolved challenge. This study also failed to address whether Aire deficiency affects gene expression in GC B cells, in particular, whether it induces the expression of various self-antigens as reported in thymic B cells or mTECs.

    1. eLife Assessment

      This study used pupillometry to provide an objective assessment of a form of synesthesia in which people see additional color when reading numbers. It provides convincing evidence that subjective color ratings are matched by changes in pupil size that recapitulate brightness-mediated changes when exposed to the real color. The work provides a valuable contribution to the literature on both synesthetic perception and the use of pupillometry to probe perception and related psychological processes.

    2. Reviewer #1 (Public review):

      Summary:

      Knowing that small pupil-size variations accompany brightness variations (even when these are illusory), the authors asked whether pupil constrictions would accompany the synesthetic perception of a brighter color (compared with a darker one), induced by the presentation of a black-white character. This grapheme-colour synesthesia is only experienced by a few participants, sixteen of whom were enrolled in this study. The results reliably showed that a relative pupil constriction would "betray" the perception of a brighter color in these participants, while no such effect would be observed in control participants who were asked to report a color in association with each grapheme, even though they did not perceive any.

      Strengths:

      The main strength of the study lies in its combination of psychophysics (brightness ratings) and pupillometry, which allowed for showing clear-cut results.

      Weaknesses:

      Some relatively minor weaknesses concern the ancillary analyses, which tackle secondary questions and are not entirely convincing.

      (1) The linear mixed model approach is a powerful way to identify important variables, but it does not clarify whether the key factors are between-subject or between-trial variations. Some variables are inherently defined at a subject level (e.g., PA scores), others are not. I would strongly recommend an alternative visualisation of the results to examine inter-individual variability.

      (2) It is not clear why taking the first derivative of pupil size in Figure 5 would isolate the effect of arousal, eliminating those of luminance and contrast changes (in fact, one could argue for the opposite, since arousal effects are generally constant for extended periods of time while contrast effects are typically more local and transient).

      (3) It is a pity that responses to physical brightness modulations were only measured in the synesthete group, not in controls, as this would have allowed for ruling out differences in pupil reactivity across the two populations.

      (4) Another concern is with the visualisation of the pupil traces in Figure 3 (main results); these were heavily pre-processed (per-participant demeaned), losing any feature besides the effect of interest and generating the unrealistic expectation that perception of dark/bright colors generate a net dilation/constriction of the pupil - whereas perception-related modulations of pupil size are always relative and generally small compared to the numerous other effects registered in pupil size. It would be far better to see the actual profiles, preserving the unfolding of dilations and constrictions over time, especially since these are further analysed in Figures 4 and 5.

      Impact:

      Despite these weaknesses, and especially if they are adequately addressed in revision, this work is likely to improve our understanding of synesthesia by providing a new tool to quantify the subjective sensations. An interesting potential extension would be using pupillometry to track changes in synesthetic experiences over time, opening up the possibility of evaluating the importance of learning for this peculiar experience.

    3. Reviewer #2 (Public review):

      Synesthesia is a neurological condition where stimulation of one sensory channel leads to involuntary, automatic, and consistent experience of another, unrelated percept. For example, Sir Francis Galton (1880, Nature) famously described the robust tendency of some individuals (synesthetes) to associate numerals with a distinct color. Ever since, synesthesia has continued to attract a broad interest in the cognitive neurosciences in light of its implications for the study of domains such as perception, consciousness, and brain connectivity, among others.

      Strauch, Leenaars, and Rouw measured pupil size in a group of 16 grapheme-color synesthetes and two matched control groups. The participants were presented with gray digits - that is, visual stimuli having identical physical properties in terms of brightness. Each participant subsequently rated the corresponding evoked color and brightness: unlike controls, synesthetes did so in a very consistent and reliable fashion. Accordingly, this was also shown in their pupils: despite the same objective luminance, digits associated with brighter percepts caused their pupils to constrict, and digits associated with darker percepts caused their pupils to dilate more than controls. These results highlight how crossmodal correspondences are deeply rooted in synesthetes, and put forward pupillometry as a particularly appealing biomarker for some phenomenological experience (at least those grounded in "brightness").

      Further strengths of the technique are its temporal resolution and its responsiveness to several constructs. Across several tasks, the authors show, for example, that responses to synesthetic light are somewhat slower than responses to real light (i.e., they are likely mediated), but at the same time faster than responses to mental imagery. The role of mental imagery can also be reasonably dismissed when considering the second feature of pupil size: its responsiveness to mental effort and cognitive load. The pupils tend to dilate with demanding, challenging tasks, and this was the case when control participants were asked to report the color of a digit for which they did not consistently experience a synesthetic association. The same task was, instead, seemingly effortless for synesthetes, again speaking in favor of the automaticity of number-color correspondences in their case.

      Overall, the findings by Strauch, Leenaars, and Rouw are highly significant for the field and likely to be impactful. The strength of their evidence, when accounting for the relatively small sample size and the inherent variability of both phenomenology (color perception and subjective reporting) and physiology (pupil size), is adequate and sufficiently convincing.

    4. Reviewer #3 (Public review):

      Summary:

      In the present study, the authors examined pupillary responses to uncolored stimuli (number graphemes) among number-color synesthetes and non-synesthetes. After seeing a digit, the synesthetes and active control participants were asked to indicate which color they perceived using three dimensions of hue, saturation, and lightness. The lightness values were the primary independent variable for follow-up analyses. To see how the pupil responded to psychologically "bright" and "dark" digits, the authors split the reported lightness values at the median and plotted them. The synesthetes showed a pupillary constriction to digits they perceived as bright and dilation to digits they perceived as dark. Active control participants did not show that effect. In a subsequent block, the synesthetes alone were shown colored discs in the colors they had reported perceiving. Their pupillary responses were similar. The authors also found that the differences in pupillary responses between light and dark perceptions (with digits) were only slightly delayed in their onset relative to the perception of a colored disc, and therefore, the color perception accompanying a digit is unlikely to be effortful or a retrieved association, but occurs rather automatically.

      Strengths:

      The authors employed a well-controlled and designed quasi-experiment comparing color-grapheme synesthetes to non-synesthetes and showed convincingly that the color perceptions accompanying graphemes alter the physical perception of brightness. They also made a reasoned attempt to rule out the possibility that color associations are occurring effortfully via retrieved associations.

      Weaknesses:

      There are some areas in which the implications of these findings could be elaborated upon. I had the following questions:

      (1) Are the pupillary responses among synesthetes, which objectively do not seem to match the degree of physical stimulation entering the retina, in any way maladaptive for eye functioning? I understand the constriction/dilation of the pupil to not only benefit visual acuity but also to protect the retina from damage. Are synesthetes at any risk of retinal damage due to over-dilation of the pupil to brighter stimuli? Or are these effects of a magnitude that is too small to matter? As reported in arbitrary units, it was hard to know how large these effects were in terms of measurable changes in dilation (e.g., millimeters).

      (2) Likewise, is the automatic synesthetic merging of two percepts something that could be learned such that natural synesthetes and "artificial" synesthetes would look similar? For example, if a group of non-synesthetic participants were to learn a color-grapheme association to automaticity, would you expect their pupillary responses to the graphemes to look similar to the synesthetes'? If so (or if not), what would this tell us about the phenomenology of synesthesia?

      (3) Do the synesthetic perceptions of digit graphemes merge in a sensible way? For example, if a synesthete sees a particular color with the digit 1, and a different color with the digit 9, what do they perceive when they see 19, 1-9, or 1 9? Is there color blending, or an altogether different color perception?

    1. eLife Assessment

      This study provides an important, comprehensive, large-scale dataset on transcription factor binding in Pseudomonas aeruginosa, along with analyses of its regulatory network, key virulence and metabolic regulators, and a pangenomic examination of transcription factors. Utilizing large-scale ChIP-seq and multi-omics integration, the research convincingly supports the hierarchical regulatory structures and offers insights into virulence mechanisms. This dataset, made available through an online database, should be an invaluable resource to the research community studying P. aeruginosa, a key pathogen associated with hospital infections and the development of antibiotic resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, Huang et al. revealed the complex regulatory functions and transcription network of 172 uncharacterized transcription factors (TFs) in Pseudomonas aeruginosa PAO1. They have built a global TF-DNA binding landscape and elucidated binding preferences and functional roles of these TFs. More specifically, the authors established a hierarchical regulatory network and identified ternary regulatory motifs and co-association modules. Since P. aeruginosa is a well-known pathogen, the authors also identified key TFs associated with virulence pathways (e.g., quorum sensing [QS], motility, biofilm formation), which could be potential drug targets for future development. The authors also explored TF conservation and functional evolution through pan-genome and phylogenetic analyses. To make the data easy for other researchers to search, the authors developed a publicly accessible database (PATF_Net) integrating ChIP-seq and HT-SELEX data.

      Strengths:

      (1) The authors performed ChIP-seq analysis of 172 TFs (nearly half of the 373 predicted TFs in P. aeruginosa) and identified 81,009 significant binding peaks, representing one of the largest TF-DNA interaction studies in the field. Also, the integration of HT-SELEX, pan-genome, and phylogenetic analyses provided multi-dimensional insights into TF conservation and function.

      (2) The authors provided an informative analytical framework for presenting the TFs, in which a hierarchical network model based on the "hierarchy index (h)" classified TFs into top, middle, and bottom levels. They identified 13 ternary regulatory motifs and co-association clusters, which deepened our understanding of complex regulatory interactions.

      (3) The PATF_Net database provides TF-target network visualization and data-sharing capabilities, offering practical utility for researchers, especially in the P. aeruginosa field.
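
      The review does not define the hierarchy index mentioned in point (2). As a hedged illustration, the sketch below assumes the definition commonly used in regulatory network studies, h = (O - I) / (O + I), where O and I are the numbers of TFs that a given TF regulates and is regulated by; whether the authors used exactly this formula is an assumption. The TF names and edges are hypothetical.

```python
from collections import Counter


def hierarchy_index(edges):
    """Return h = (O - I) / (O + I) for every TF that appears in a TF->TF edge list.

    O is the out-degree (TFs regulated) and I is the in-degree (TF regulators).
    This formula is assumed for illustration; it is not taken from the manuscript.
    """
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    tfs = set(out_deg) | set(in_deg)
    # Every TF in `tfs` has at least one edge, so the denominator is never zero.
    return {
        tf: (out_deg[tf] - in_deg[tf]) / (out_deg[tf] + in_deg[tf])
        for tf in tfs
    }


# Hypothetical regulator -> target pairs among three TFs.
edges = [("TF_A", "TF_B"), ("TF_A", "TF_C"), ("TF_B", "TF_C")]
h = hierarchy_index(edges)
```

      Under this assumed definition, values near 1 would mark TFs that mostly regulate other TFs (top level), values near -1 would mark TFs that are mostly regulated (bottom level), and intermediate values would fall in the middle.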

      Weaknesses:

      (1) There is very limited experimental validation for this study. Although 24 virulence-related master regulators (e.g., PA0815 regulating motility, biofilm, and QS) were identified, functional validation (e.g., gene knockout or phenotypic assays) is lacking, leaving some conclusions reliant on bioinformatic predictions. Another approach for validation is to check for mutations in these TFs in clinical strains of P. aeruginosa, where chronically adapted isolates often gain mutations in virulence regulators.

      (2) ChIP-seq in bacteria may suffer from low-abundance TF signals and off-target effects. The functional implications of non-promoter binding peaks (e.g., coding regions) were not discussed.

      (3) PATF_Net currently supports basic queries but lacks advanced tools (e.g., dynamic network modeling or cross-species comparisons). User experience and accessibility remain under-evaluated, although this could be improved in the future.

      Achievement of Aims and Support for Conclusions

      (1) The authors successfully mapped global P. aeruginosa TF binding sites, constructed hierarchical networks and co-association modules, and identified virulence-related TFs, fulfilling the primary objectives. The database and pan-genome analysis provide foundational resources for future studies.

      (2) The hierarchical model aligns with known virulence mechanisms (e.g., LasR and ExsA at the bottom level directly regulating virulence genes). Co-association findings (e.g., PA2417 and PA2718 co-regulating pqsH) resonate with prior studies, though experimental confirmation of synergy is needed.

      Impact on the Field and Utility of Data/Methods

      (1) This study fills critical gaps in TF functional annotation in P. aeruginosa, offering new insights into pathogenicity mechanisms (e.g., antibiotic resistance, host adaptation). The hierarchical and co-association frameworks are transferable to other pathogens, advancing comparative studies of bacterial regulatory networks.

      (2) PATF_Net enables rapid exploration of TF-target interactions, accelerating candidate regulator discovery.

      Comments on revisions:

      The authors have done a good job of revising their manuscript. The manuscript is now more concise and logical for readers.

    3. Reviewer #3 (Public review):

      Summary:

      The authors utilized ChIP-seq on strains containing tagged transcription factor (TF)-overexpression plasmids to identify binding sites for 172 transcription factors in P. aeruginosa. High-quality binding site data provides a rich resource for understanding regulation in this critical pathogen. These TFs were selected to fill gaps in prior studies measuring TF binding sites in P. aeruginosa. The authors further perform a structured analysis of the resulting transcriptional regulatory network, focusing on regulators of virulence and metabolism, in addition to performing a pangenomic analysis of the TFs. The resulting dataset has been made available through an online database. While the implemented approach to determining functional TF binding sites has limitations, the resulting dataset still has substantial value to P. aeruginosa research.

      Strengths:

      The generated TF binding site database fills an important gap in regulatory data in the key pathogen P. aeruginosa. Key analyses of this dataset presented include an analysis of TF interactions and regulators of virulence and metabolism, which should provide important context for future studies into these processes. Experimental validation has been included in the revised version. The online database containing this data is well organized and easy to access. As a data resource, this work should be of significant value to the infectious disease community.

      Weaknesses:

      Drawbacks of the study, which have been mitigated in a revised version, include 1) challenges interpreting binding site data obtained from TF overexpression due to the unknown activity state of the TFs under the measured conditions (discussed by the authors), and 2) remaining challenges in the practical utilization of the TRN topological analysis.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, Huang et al. revealed the complex regulatory functions and transcription network of 172 uncharacterized transcription factors (TFs) in Pseudomonas aeruginosa PAO1. They have built a global TF-DNA binding landscape and elucidated binding preferences and functional roles of these TFs. More specifically, the authors established a hierarchical regulatory network and identified ternary regulatory motifs and co-association modules. Since P. aeruginosa is a well-known pathogen, the authors also identified key TFs associated with virulence pathways (e.g., quorum sensing [QS], motility, biofilm formation), which could be potential drug targets for future development. The authors also explored TF conservation and functional evolution through pan-genome and phylogenetic analyses. To make the data easy for other researchers to search, the authors developed a publicly accessible database (PATF_Net) integrating ChIP-seq and HT-SELEX data.

      Strengths:

      (1) The authors performed ChIP-seq analysis of 172 TFs (nearly half of the 373 predicted TFs in P. aeruginosa) and identified 81,009 significant binding peaks, representing one of the largest TF-DNA interaction studies in the field. Also, the integration of HT-SELEX, pan-genome, and phylogenetic analyses provided multi-dimensional insights into TF conservation and function.

      (2) The authors provided an informative analytical framework for presenting the TFs, in which a hierarchical network model based on the "hierarchy index (h)" classified TFs into top, middle, and bottom levels. They identified 13 ternary regulatory motifs and co-association clusters, which deepened our understanding of complex regulatory interactions.

      (3) The PATF_Net database provides TF-target network visualization and data-sharing capabilities, offering practical utility for researchers, especially in the P. aeruginosa field.

      Thank you for your positive feedback!

      Weaknesses:

      (1) There is very limited experimental validation for this study. Although 24 virulence-related master regulators (e.g., PA0815 regulating motility, biofilm, and QS) were identified, functional validation (e.g., gene knockout or phenotypic assays) is lacking, leaving some conclusions reliant on bioinformatic predictions. Another approach for validation is to check for mutations in these TFs in clinical strains of P. aeruginosa, where chronically adapted isolates often gain mutations in virulence regulators.

      Thank you for this valuable suggestion. We performed EMSA experiments to validate the binding results and also constructed mutants for further functional validation. The details can be found in Figure S5.

      (2) ChIP-seq in bacteria may suffer from low-abundance TF signals and off-target effects. The functional implications of non-promoter binding peaks (e.g., coding regions) were not discussed.

      Thank you for this insightful comment regarding ChIP-seq data quality and non-promoter binding events. While we acknowledge that completely eliminating all non-specific binding signals is technically challenging in bacterial ChIP-seq experiments, we implemented stringent quality control measures including replicates, negative controls, and FDR cutoffs to minimize false positives.

      Although the coding binding peaks represent a smaller fraction of total binding events, they are functionally significant rather than mere technical artifacts. Our previous work systematically demonstrated that bacterial TFs can bind to coding sequences and regulate gene expression through multiple mechanisms, including modulating cryptic promoter activity and antisense RNA transcription, hindering transcriptional elongation, and influencing translational efficiency[1]. We have now expanded the Discussion section to address these regulatory mechanisms.

      (3) PATF_Net currently supports basic queries but lacks advanced tools (e.g., dynamic network modeling or cross-species comparisons). User experience and accessibility remain under-evaluated, although this could be improved in the future.

      Thank you for this constructive feedback on PATF_Net. We acknowledge that more advanced features would further enhance the platform’s utility. To this end, we have implemented two new features: (1) a virulence pathway browser that allows users to explore TF binding across curated gene sets for key virulence pathways (quorum sensing, secretion systems, biofilm, motility, etc.), and (2) a target gene search function that enables rapid identification of all TFs regulating any gene of interest by locus tag query.

      Achievement of Aims and Support for Conclusions

      (1) The authors successfully mapped global P. aeruginosa TF binding sites, constructed hierarchical networks and co-association modules, and identified virulence-related TFs, fulfilling the primary objectives. The database and pan-genome analysis provide foundational resources for future studies.

      (2) The hierarchical model aligns with known virulence mechanisms (e.g., LasR and ExsA at the bottom level directly regulating virulence genes). Co-association findings (e.g., PA2417 and PA2718 co-regulating pqsH) resonate with prior studies, though experimental confirmation of synergy is needed.

      Thank you for your positive feedback! We have added experimental validation in the Results section.

      Impact on the Field and Utility of Data/Methods

      (1) This study fills critical gaps in TF functional annotation in P. aeruginosa, offering new insights into pathogenicity mechanisms (e.g., antibiotic resistance, host adaptation). The hierarchical and co-association frameworks are transferable to other pathogens, advancing comparative studies of bacterial regulatory networks.

      (2) PATF_Net enables rapid exploration of TF-target interactions, accelerating candidate regulator discovery.

      Thank you for your positive feedback!

      Reviewer #3 (Public review):

      Summary:

      The authors utilized ChIP-seq on strains containing tagged transcription factor (TF)-overexpression plasmids to identify binding sites for 172 transcription factors in P. aeruginosa. High-quality binding site data provides a rich resource for understanding regulation in this critical pathogen. These TFs were selected to fill gaps in prior studies measuring TF binding sites in P. aeruginosa. The authors further perform a structured analysis of the resulting transcriptional regulatory network, focusing on regulators of virulence and metabolism, in addition to performing a pangenomic analysis of the TFs. The resulting dataset has been made available through an online database. While the implemented approach to determining functional TF binding sites has limitations, the resulting dataset still has substantial value to P. aeruginosa research.

      Strengths:

      The generated TF binding site database fills an important gap in regulatory data in the key pathogen P. aeruginosa. Key analyses of this dataset presented include an analysis of TF interactions and regulators of virulence and metabolism, which should provide important context for future studies into these processes. The online database containing this data is well organized and easy to access. As a data resource, this work should be of significant value to the infectious disease community.

      Thank you for your positive feedback!

      Weaknesses:

      Drawbacks of the study include 1) challenges interpreting binding site data obtained from TF overexpression due to the unknown activity state of the TFs under the measured conditions, 2) limited practical value of the presented TRN topological analysis, and 3) lack of independent experimental validation of the proposed master regulators of virulence and metabolism.

      We thank the reviewer for summarizing these key concerns. We acknowledge the limitations raised regarding TF overexpression, TRN topological analysis interpretation, and experimental validation. We provide detailed point-by-point responses to each of these concerns in our replies to the specific comments below, where we explain our rationale, the measures taken to address these limitations, and our plans for improvement.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Future Directions for the authors to consider for next steps:

      (1) Key TFs (e.g., PA1380, PA5428) should be validated via gene knock out experiments, fluorescent reporter assays, or animal models to confirm roles in virulence pathways.

      Thank you for this important suggestion. We agree that experimental validation is essential to confirm their regulatory roles and biological functions.

      First, we selected a subset of key TFs, including PA0167, PA1380, PA0815, and PA3094, and performed electrophoretic mobility shift assay (EMSA) experiments to validate their direct binding to target promoters. These results confirmed the ChIP-seq-identified interactions and are now included as Figure S5A-F.

      We also constructed clean deletion mutants of PA1380 and PA3094 (ΔPA1380 and ΔPA3094) and their complemented strains (ΔPA1380/p and ΔPA3094/p). We then performed RT-qPCR analysis to validate their regulatory effects on key target genes. We found that PA1380 positively regulates the expression of the cupB1 and cupB3 genes (Figure S5F). However, the CupB cluster is known to be less important than the CupA cluster for biofilm formation, and we did not find a significant difference in biofilm formation between WT and ΔPA1380. Additionally, we found that PA3094 positively regulates lecA expression, as shown in Figure S5G.

      We agree that comprehensive functional validation, including animal model studies, would further strengthen the biological significance of these findings. Such experiments are currently underway in our laboratory and will be the subject of follow-up studies.

      We have revised the Results and Methods sections to include these validation experiments and their implications. Please see Figure S5 and Lines 283-300.

      “To experimentally validate the regulatory interactions identified by ChIP-seq, we performed biochemical and genetic analyses on selected TFs. First, we conducted Electrophoretic Mobility Shift Assays (EMSA) for four TFs, including PA0167, PA0815, PA1380, and PA3094, using DNA fragments containing their predicted binding sites from target gene promoters. These TFs showed specific binding to their cognate DNA sequences (Figure S5A-D), confirming the direct binding of the ChIP-seq-identified interactions.

      To further validate the functional regulatory roles of these TFs, we constructed clean deletion mutants of PA1380 and PA3094 (ΔPA1380 and ΔPA3094) along with their complemented strains (ΔPA1380/p and ΔPA3094/p). RT-qPCR analysis revealed that PA1380 positively regulates the expression of cupB1 and cupB3 (Figure S5E), two genes within the CupB fimbrial cluster identified as ChIP-seq targets. Similarly, PA3094 was confirmed to positively regulate lecA expression (Figure S5F), which encodes a lectin involved in biofilm formation and host interactions[2]. Expression of these target genes was restored to wild-type (WT) levels in the complemented strains, validating the regulatory relationships predicted by ChIP-seq. These combined biochemical and genetic validations demonstrate the accuracy and biological relevance of our TF binding data.”

      (2) Non-promoter binding events (e.g., coding regions) may regulate RNA stability, warranting integration with translatomics or epigenomics data.

      Thank you for this suggestion. We have now expanded the Discussion section to address this comment. Please see Lines 478-482.

      “Our analysis revealed that TF binding events occur within coding regions, which is consistent with our previous study demonstrating that bacterial TFs possess binding capabilities for coding regions and can regulate transcription through multiple mechanisms [1]. In addition, such binding may regulate RNA stability, warranting integration with translatomics or epigenomics data.”

      (3) Incorporate strain-specific TF data (e.g., clinical isolates) and dynamic visualization tools to broaden PATF_Net's applicability.

      Thank you for this constructive suggestion. To enhance the utility of PATF_Net, we have implemented two new features: (1) a virulence pathway browser that allows users to explore TF binding across curated gene sets for key virulence pathways (quorum sensing, secretion systems, biofilm, motility, etc.), and (2) a target gene search function that enables rapid identification of all TFs regulating any gene of interest by locus tag query. These features are now live on the database and described in the revised manuscript.

      Regarding strain-specific TF data, we agree this would be valuable for understanding regulatory diversity in clinical isolates. However, such an expansion would require ChIP-seq profiling across multiple strains. The current dataset is based on the reference strain PAO1, which serves as the foundation for most P. aeruginosa research and allows direct comparison with existing genomic and functional studies. We have added a statement in the revised manuscript acknowledging this limitation and highlighting strain-specific TF analysis as an important future direction for the field. Please see Lines 372-390.

      “The database offers multiple search modalities to facilitate data exploration: users can perform TF-centric searches to query binding sites, target genes, and regulatory networks for individual TFs, or utilize the target gene search function to identify all TFs that regulate any gene of interest by entering its locus tag. To connect regulatory data with biological function, we have implemented a virulence pathway browser that allows users to explore TF binding patterns across curated gene sets for major P. aeruginosa virulence pathways. Interactive visualization tools, including network graphs and binding profile plots, facilitate intuitive exploration of regulatory relationships. The primary purpose of PATF_Net is to store, search, and mine valuable information on P. aeruginosa TFs for researchers investigating P. aeruginosa infection. The current resource is based on the reference strain PAO1, which serves as the foundation for most P. aeruginosa molecular studies and allows direct integration with existing genomic annotations and functional data. However, P. aeruginosa exhibits substantial genomic diversity across clinical isolates, and strain-specific differences in TF binding patterns may contribute to phenotypic variation in virulence, antibiotic resistance, and host adaptation. Extension of this resource to include strain-specific regulatory maps from diverse clinical isolates would provide valuable insights into the regulatory basis and represents an important direction for future investigation.”

      (4) Phylogenetic analysis highlights TF conservation in bacteria; future work could explore functional homology in other Gram-negative pathogens (e.g., E. coli).

      Thank you for this insightful suggestion. Our phylogenetic analysis revealed that P. aeruginosa TFs exhibit varying degrees of conservation across bacterial species, with some showing broad distribution across Gram-negative pathogens while others are lineage-specific.

      We agree that exploring functional homology of orthologous TFs across species would be highly valuable. Such comparative studies could address whether conserved TFs regulate similar target genes and biological processes across species, or whether regulatory networks have been rewired during evolution. For example, comparative ChIP-seq analysis of P. aeruginosa TFs and their orthologs in Klebsiella pneumoniae, or even in a Gram-positive pathogen such as Bacillus cereus, could reveal conserved regulatory modules governing universal virulence or metabolic strategies versus species-specific adaptations. This represents an important direction for future investigation and would be facilitated by the comprehensive TF binding dataset we provide here. We have expanded the Discussion section to highlight this future direction. Please see Lines 539-550.

      “While our phylogenetic analysis reveals varying degrees of TF conservation across bacterial species, the functional implications of this conservation remain to be fully explored. Many P. aeruginosa TFs have clear orthologs in both Gram-negative (e.g., Klebsiella pneumoniae) and Gram-positive pathogens (e.g., Bacillus cereus), yet whether these orthologs regulate similar target genes and biological processes is largely unknown. Future comparative ChIP-seq profiling of orthologous TFs could reveal the extent to which regulatory network architecture is conserved versus rewired during bacterial evolution, potentially identifying core regulatory modules governing universal bacterial strategies versus species-specific innovations. Such cross-species comparisons would enhance our understanding of regulatory network evolution and enable functional prediction in less well-characterized pathogens based on homology to experimentally validated P. aeruginosa regulators.”

      Reviewer #3 (Recommendations for the authors):

      Major comments

      - Limitations of the ChIP-seq approach: With overexpression plasmids as an approach to TRN elucidation, there is always a set of concerns. First, TF expression is not enough to ensure regulatory activity - metabolite effects must be such that the TF is active, which requires growing the cells in activating conditions. Second, the presence of a binding event does not mean that the binding has a regulatory effect - the authors are clearly aware of this as they specify binding sites in promoter regions, which should be helpful, but they also mention the possibility of regulatory binding events in coding regions. These issues should be listed as weaknesses of the approach in the Discussion.

      Thank you for these important suggestions. We agree that these limitations should be explicitly discussed. We have now added a dedicated paragraph in the Discussion section addressing these concerns. Please see Lines 492-501.

      “However, several limitations of the ChIP-seq approach should be acknowledged. Firstly, TF overexpression ensures sufficient protein levels for ChIP-seq signal detection but does not guarantee that all TFs are in their active conformational states, as many bacterial TFs require allosteric activation by metabolites, cofactors, or post-translational modifications. Standard laboratory conditions may not activate all TFs to their maximal regulatory states, potentially leading to underestimation of condition-specific binding peaks. Secondly, while we observed TF binding at thousands of genomic sites, binding per se does not equate to functional regulation, as chromatin context, cofactor availability, and competitive binding all influence regulatory outcomes.”

      - Lack of independent validation: The study seems to lack substantial independent validation of either the functional nature of the binding sites or the proposed physiological regulatory role of the TFs. For example, for the 103 identified TF motifs, do any of these agree with existing motifs in motif databases that may be homologous to P. aeruginosa TFs? The authors claim to have discovered master regulators of virulence and associated core regulatory clusters - but there does not seem to be any independent validation of the proposed associations. The authors selected the TF targets to cover TFs that had not yet been characterized; however, it would have been nice to have some overlap with previous studies so that consistency and data quality could be assessed.

      Thank you for raising these critical points about validation.

      As for motif validation, we compared our motifs with the existing motifs in the RegPrecise database[3] and found that the motif of PA3587 shows significant similarity to those of homologous TFs in Pseudomonadaceae. We have added the related description to the Results section. Please see Figure S3B and Lines 228-231.

      As for the validation of master regulators, we performed EMSA experiments to validate the binding events and constructed mutants for functional validation. We have added the related content to the Results section. Please see Figure S5 and Lines 283-300.

      We have discussed the overlap between our results and previous studies in the Discussion section. Please see Lines 530-538.

      “PA0797 is known to regulate the pqs system and pyocyanin production[4]. In the present study, it was also found to bind to the pqsH promoter region and its motif was visualised. PA5428 was found to bind to the promoter regions of aceA and glcB genes[5], which was also demonstrated in our ChIP-seq results. PA4381 (CloR) was found to be associated with polymyxin resistance in a previous study[6] and to be possibly related to ROS resistance in the present study. Furthermore, PA5032 plays a putative role in biofilm regulation and also forms an operon with PA5033, an HP associated with biofilm formation[7].”

      - Uncertain value of TRN topology analysis: The relationship between ternary motifs and pathogenicity of P. aeruginosa, and why the authors argue these results motivated TF-targeting drugs (the topic of the last paragraph of the Discussion), are unclear to me. The authors allude to possible connections between pathogenicity, growth, and drug resistance, but I don't see concrete examples here of related TF interactions that clearly represent these relationships. The sections "Hierarchical networks of TFs based on pairwise interactions" and "Ternary regulatory motifs show flexible relationships among TFs in P. aeruginosa" seem to not say much in terms of results that are actionable or possible to validate. A topological graph is constructed based on observed TF-TF connections in measured binding sites - however, it's unclear if any of these connections are physiologically meaningful. Line 178 - Why would there be any connection between the structural family of TF and its location in the proposed TRN hierarchy?

      Thank you for this valuable comment on the TRN topology analysis. It is hard to quantify precisely how much this resource will accelerate P. aeruginosa research or drug development, but we believe that providing this foundational network architecture has inherent value for the community because it enables hypothesis generation even before comprehensive functional validation. We would like to clarify our perspective on these findings and have added a discussion to the revised manuscript to better describe their nature and value. Please see Lines 517-528.

      “Additionally, although the TRN analysis revealed organizational patterns in the P. aeruginosa regulatory network, the functional significance of these topological features, including their specific contributions to pathogenicity, metabolic adaptation, and antibiotic resistance, remains to be experimentally determined in future work. The hierarchical structure and regulatory motifs we identified represent objective network properties derived from our binding data, but translating these structural observations into mechanistic understanding will require condition-specific functional studies, genetic validation, and phenotypic characterization. Our analysis provided a systematic framework and generated testable hypotheses rather than definitive functional conclusions. Nevertheless, these network-level organizational principles provided value to the community as a foundational reference, similar to other regulatory network maps[8] that were useful even before comprehensive validation.”

      - Identification of "master" regulators: Line 527 on virulence regulators: "We first generated gene lists associated with nine pathways" - is this not somewhat circular, i.e. using gene lists generated from (I assume) co-regulated gene sets to identify regulators of those gene lists? I can't tell from the cited reference (80), which is their own prior review article, what the original source of these gene lists was. Somewhat related to this point - Line 32: 24 "master regulators" - if there are that many, is it still considered a master regulator? Line 270: This term "master regulator" would seem to require some quantitative justification. Identifying 24 (a large number of) "master" regulators of virulence would seem to dilute the implied power of the term.

      We apologize for the lack of clarity regarding the virulence pathway gene lists, and we have provided complete gene lists for virulence-related pathways, which were compiled from functional annotations, in our online PA_TFNet database.

      Additionally, we appreciate your concern about the use of the term “master regulator”. The usage is based on previous studies[9,10]: in the development of multicellular organisms, master regulators are commonly understood as a subset of TFs that control the expression of multiple downstream genes and govern lineage commitment or key biological processes. We employed the term "master regulator" in an analogous manner to specify a class of functionally crucial TFs that participate in a pathway or biological event by regulating multiple downstream genes statistically enriched in that pathway. In line with this definition, we identified TFs whose targets were significantly enriched in genes associated with specific virulence pathways (hypergeometric test, P < 0.05).

      We understand the concern that identifying 24 master regulators might seem to dilute the term. However, we would like to clarify that each of these 24 TFs is a "master regulator" with respect to specific virulence pathways based on statistical criteria, not necessarily a global master regulator of multiple pathways of P. aeruginosa. We have revised the Methods section. Please see Lines 604-612.
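
      The enrichment criterion described above (hypergeometric test, P < 0.05) can be illustrated with a minimal sketch. The function name, its parameters, and the gene counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the manuscript, and the sketch does not reproduce any multiple-testing procedure the authors may have applied.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, pathway_size, n_targets, n_overlap):
    """Upper-tail hypergeometric P-value, P(X >= n_overlap).

    genome_size  : total number of genes considered (hypothetical N)
    pathway_size : genes annotated to the pathway (K)
    n_targets    : target genes bound by the TF (n)
    n_overlap    : TF targets that fall inside the pathway (k)
    """
    max_overlap = min(pathway_size, n_targets)
    total = comb(genome_size, n_targets)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, i) * comb(genome_size - pathway_size, n_targets - i)
        for i in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Illustrative numbers only: 5,000 genes, a 40-gene pathway,
# a TF with 100 targets, 6 of which lie in the pathway.
p_value = hypergeom_enrichment_p(5000, 40, 100, 6)
is_enriched = p_value < 0.05  # threshold stated in the response
```

      Under this criterion a TF is labelled a master regulator only with respect to the pathway tested, which is consistent with the pathway-specific reading of the term given in the response.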

      - Line 234: "Genome-wide synergistic co-association of TFs in P. aeruginosa." This section was an interesting analysis. As I mention above, the weakness of an overexpression approach is not knowing whether the TF is active on the examined conditions. By looking at shared binding peaks across overexpression of different TFs, it should indeed be possible to glean some regulatory connections across TFs. Furthermore, the authors discuss specific examples that appear physiologically reasonable, which is appreciated.

      We thank the reviewer for this positive assessment of our co-association analysis. We agree with the limitation of the overexpression approach, which has been discussed in the Discussion section. We are pleased that the reviewer found the approach and specific examples valuable.

      Minor comments

      - Line 35 - "high-throughput systematic evolution of ligands by exponential enrichment" - no idea what this means. Is this related to the web-based database, or why is it mentioned in the same sentence?

      We apologize for the unclear presentation. To clarify: “High-throughput systematic evolution of ligands by exponential enrichment” (HT-SELEX) is an in vitro technique for determining TF DNA-binding motifs, which our group previously applied to a subset of P. aeruginosa TFs in a prior publication[11]. In the current study, we performed ChIP-seq for 172 TFs, which represent the majority of TFs not covered by the previous HT-SELEX study. Together, these two complementary approaches (HT-SELEX for in vitro binding motifs, ChIP-seq for in vivo genomic binding sites) provide near-complete coverage of the P. aeruginosa TF repertoire. Both datasets are integrated into our PA_TFNet database.

      Due to space constraints in the abstract, we could not provide a detailed explanation of HT-SELEX, but we have now improved the clarity of the Introduction to better explain the relationship between our previous HT-SELEX work and the current ChIP-seq study, and why both are mentioned together in the context of the database. Please see Lines 99-105.

      - Line 193 - Only 9 auto-regulating TFs seems like a low number, given the frequency of negative auto-regulation in other organisms like E. coli. Could the authors comment on their expectations based on well-curated TRNs?

      Thank you for this comment. We agree that 9 auto-regulating TFs is lower than might be expected based on E. coli, where auto-regulation is more prevalent. This likely reflects technical limitations of the ChIP-seq approach, as our detection was limited to standard growth conditions rather than the diverse physiological states in which auto-regulation often occurs. Therefore, the 9 TFs we report represent a high-confidence subset, and the true frequency of auto-regulation in P. aeruginosa is likely higher. We have added this point to the revised manuscript. Please see Lines 193-196.

      “This number likely represents a conservative estimate, as experiments may not optimally capture auto-regulatory events that depend on native expression levels or specific physiological conditions.”

      - Line 230 - "This conservation suggests that TFs within the same cluster co-regulate similar sets of genes." - Why would clustering of TF binding site motifs need to be done to make this assessment? Couldn't the shared set of regulated genes be identified directly from the binding site data? Computing TF binding site motifs has obvious value, but I am struggling to understand the point of clustering the motifs. Is there some implied evolutionary or physiological connection here? No specific physiological roles or hypotheses are discussed in this section.

      Thank you for this important question. We agree that shared target genes can be identified directly from ChIP-seq binding data, which we also analyzed (co-association analysis). The motif clustering analysis serves a complementary and distinct purpose, providing information not directly obtainable from overlapping targets alone. Specifically, target overlap is inherently condition dependent, whereas motif clustering captures intrinsic binding specificity, which reflects the structural similarity of DBDs, evolutionary relationships, and the potential for functional redundancy or cooperativity under specific conditions. We have revised the related content in the manuscript; please see Lines 236-242.

      “Clustering of TF binding motifs identified groups of TFs with similar intrinsic DNA-binding specificities. As expected, many clusters contained TFs from the same DBD families, reflecting evolutionary conservation and potential functional redundancy or competitive binding at shared regulatory elements. Notably, the clustering also uncovered associations between TFs from different DBD families, suggesting convergent evolution of binding specificity or novel regulatory interactions that warrant further investigation.”

      - Line 284 - should "metabolomic" be "metabolic"? I didn't see metabolomic data

      Yes, we have revised this. Please see Line 311.

      - Several of the figures are too small (e.g. Fig S4A) or complex (Fig 2A) to see clearly or glean information from.

      Thank you for this comment. We acknowledge that Figure 2A and Figure S4A contain dense information due to the comprehensive nature of the regulatory network and the large number of TFs analyzed. We believe these overview figures serve an important purpose in conveying the scale and organization of the regulatory network, while the tables (Table S6 for Fig. S4A and Table S3 for Fig. 2A) provide the granular data needed for specific inquiries. We have also made the figures available in higher resolution and increased font sizes where possible without compromising the overall layout.

      - I don't understand the organization of the "Ternary regulatory motifs" in Supplementary Data File 4 - A table of contents explaining the tabs and columns would be welcome (for this as well as other supplementary files, some of which are more straightforward than others).

      Thank you for this suggestion. We have now revised all supplementary data files to include headers and the necessary annotations in the first row. Specifically for Supplementary Data File 4, the three columns (Top, Middle, Bottom) represent the left, middle, and right node, respectively, in each ternary regulatory motif.

      - I would have expected genomic locations of TF binding sites would have been one of the Supplementary Tables, to increase the accessibility of the data. However, the data is made available through their website, https://jiadhuang0417.shinyapps.io/PATF_Net/, which was easy to access and download the full dataset, so this is a minor issue.

      Thank you for accessing our PA_TFNet database and for the positive feedback on data accessibility. We agree that providing genomic locations of TF binding sites is crucial. These data are fully available and downloadable through the web interface, which allows flexible searching, filtering, and batch download of binding sites. Given the large scale of this dataset, we felt that the interactive database format provides more functionality than static supplementary tables (e.g., dynamic filtering by TF, genomic region, or binding strength).

      References

      (1) Hua, C., Huang, J., Wang, T., Sun, Y., Liu, J., Huang, L. et al. Bacterial Transcription Factors Bind to Coding Regions and Regulate Internal Cryptic Promoters. mBio 13, e0164322 (2022).

      (2) Chemani, C., Imberty, A., de Bentzmann, S., Pierre, M., Wimmerová, M., Guery, B. P. et al. Role of LecA and LecB lectins in Pseudomonas aeruginosa-induced lung injury and effect of carbohydrate ligands. Infect Immun 77, 2065-2075 (2009).

      (3) Novichkov, P. S., Kazakov, A. E., Ravcheev, D. A., Leyn, S. A., Kovaleva, G. Y., Sutormin, R. A. et al. RegPrecise 3.0–a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics 14, 745 (2013).

      (4) Cui, G. Y., Zhang, Y. X., Xu, X. J., Liu, Y. Y., Li, Z., Wu, M. et al. PmiR senses 2-methylisocitrate levels to regulate bacterial virulence in Pseudomonas aeruginosa. Sci Adv 8 (2022).

      (5) Hwang, W., Yong, J. H., Min, K. B., Lee, K.-M., Pascoe, B., Sheppard, S. K. et al. Genome-wide association study of signature genetic alterations among Pseudomonas aeruginosa cystic fibrosis isolates. PLoS Pathog 17, e1009681 (2021).

      (6) Gutu, A. D., Sgambati, N., Strasbourger, P., Brannon, M. K., Jacobs, M. A., Haugen, E. et al. Polymyxin resistance of Pseudomonas aeruginosa phoQ mutants is dependent on additional two-component regulatory systems. Antimicrob Agents Chemother 57, 2204-2215 (2013).

      (7) Zhang, L., Fritsch, M., Hammond, L., Landreville, R., Slatculescu, C., Colavita, A. et al. Identification of genes involved in Pseudomonas aeruginosa biofilm-specific resistance to antibiotics. PLoS One 8, e61625 (2013).

      (8) Galan-Vasquez, E., Luna, B. & Martinez-Antonio, A. The Regulatory Network of Pseudomonas aeruginosa. Microb Inform Exp 1, 3 (2011).

      (9) Fan, L. G., Wang, T. T., Hua, C. F., Sun, W. J., Li, X. Y., Grunwald, L. et al. A compendium of DNA-binding specificities of transcription factors in Pseudomonas syringae. Nat Commun 11 (2020).

      (10) Chan, S. S.-K. & Kyba, M. What is a master regulator? J Stem Cell Res Ther 3, 114 (2013).

      (11) Wang, T. T., Sun, W. J., Fan, L. G., Hua, C. F., Wu, N., Fan, S. R. et al. An atlas of the binding specificities of transcription factors in Pseudomonas aeruginosa directs prediction of novel regulators in virulence. Elife 10 (2021).

    1. eLife Assessment

      This useful study characterizes the evolution of medial prefrontal cortex activity during the learning of an odor-based choice task. The evidence provided is solid, providing quantification of functional classes of cells over the course of learning using the longitudinal calcium recordings in prefrontal cortex, and quantification of prefrontal sequences. However, the experimental design appears to provide limited evidence to support strong conclusions regarding the functional relevance of neural sequences. The study will be of interest to neuroscientists investigating learning and decision-making processes.

    2. Reviewer #1 (Public review):

      This study presents a useful finding about development of task representations in mouse medial prefrontal cortex using 1-photon calcium recordings in an olfactory-guided spatial memory task. A key strength of the study is the use of longitudinal recordings allowing identification of task-related activity that emerges after learning. The study also reports existence of neuronal sequences during learning and their replay at reward locations. The evidence provided is solid, providing quantification of functional classes of cells over the course of learning using the longitudinal calcium recordings in prefrontal cortex, and quantification of prefrontal sequences.

      (1) The authors continue to state that task phase selective (non-splitter) cells can be considered as "cross-condition generalization" and interpret them as "potential building blocks of schemas". However, cross-condition generalization requires demonstration of cross-condition generalization performance (CCGP) of neural decoders across task conditions, which is not shown here.

      (2) The authors note that correlations on short time scales are not similar between sampling and reward phase, acknowledging that these two represent different behavioral states in a cued-memory task, and that the manuscript should more clearly distinguish replay from "pure sequences". However, while the last line in the abstract states that "sub-second neural sequences in the mPFC are more likely involved in behavioral outcomes rather than planning future actions", references are made throughout the manuscript to preplay/replay sequences, including results primarily for non-cued spatial memory tasks, in which there is no cued sampling phase. For example, lines 259-263 state "During odor sampling phase, no such significant replay was observed..." and "... sequence clusters showed small but significant bias to preplay in the sampling phase". If the authors want to distinguish between replay and "pure" sequences, then the terminology "replay" and "preplay" should not be used here.

      Further, large parts of the Discussion are devoted to comparison to hippocampal ripple-associated replay. Lines 355-356 in Discussion state that "the suggestion that mPFC sequences may also support planning [Tang et al., 2021] could not be confirmed by our work as sequences in the odor sampling phase were absent". It should be clarified that this is a comparison between what the authors term "pure sequences" in the sampling phase of an odor-cued task, and internally generated sequences during hippocampal ripples in a non-cued spatial memory task, so this is not a like-for-like comparison.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      There are a few remaining issues:

      (1) The manuscript quantifies changes over learning in prefrontal goal-selective cells (equated to "splitter" place cells in hippocampus) and task-phase selective cells (similar to non-splitter place cells that are not goal modulated). A subset of these task cells remain stable throughout learning, and are equated to schema representations in the study. In the memory literature, schemas are generally described as relational networks of abstract and generalized information that enable adapting to novel contexts and inference by enabling retrieval of related information from previous contexts. The task-phase selective cells that stay stable throughout learning clearly will have a role in organizing task representations, but to this reviewer, denoting them as forming a schema is an unwarranted interpretation. By this definition, hippocampal non-splitter place cells that emerge early in learning and are stable over days would also form a schema. Therefore, schema notation cannot just be based on stability; it requires further evidence of abstraction such as cross-condition generalization.

      We agree with the reviewer that task phase selective cells (“non-splitter cells”) alone do not fulfill the “relationality” criterion of schemas. We found only a few of them, so we cannot say anything about how they covary. We would, however, like to stress that our finding that task phase selective cells have stable firing fields when comparing learned (task) and habituation (no-task) conditions can be considered as “cross-condition generalization.” We have further specified our discussion of schemas with a particular emphasis on a potential interpretation of the generalizing task phase cells as “potential building blocks of schemas.”
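
      For readers unfamiliar with the measure the reviewer refers to, the logic of cross-condition generalization performance can be sketched as follows: a decoder is fitted on trials from one condition and scored on trials from a condition it never saw. This is a minimal illustration only. The nearest-centroid decoder, the function names, and the toy two-neuron data are assumptions made for the example; published CCGP analyses typically use linear classifiers and average over all ways of holding out conditions, and nothing here reproduces the authors' analysis.

```python
def centroid(vectors):
    """Mean population vector of a list of equal-length activity vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two activity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ccgp(train, test):
    """Decoding accuracy on held-out conditions.

    train, test: dicts mapping a label (e.g. task phase) to a list of
    population vectors recorded in the training / held-out condition.
    """
    centroids = {label: centroid(vecs) for label, vecs in train.items()}
    correct = total = 0
    for label, vecs in test.items():
        for v in vecs:
            predicted = min(centroids, key=lambda c: sq_dist(v, centroids[c]))
            correct += predicted == label
            total += 1
    return correct / total


# Toy data: neuron 0 encodes task phase, neuron 1 encodes the condition.
habituation = {"sampling": [[1.0, 0.0], [0.9, 0.1]],
               "reward": [[0.0, 0.0], [0.1, 0.1]]}
learned = {"sampling": [[1.0, 1.0], [0.8, 0.9]],
           "reward": [[0.0, 1.0], [0.2, 0.9]]}

score = ccgp(train=habituation, test=learned)
```

      In this toy case the phase code is unchanged between conditions, so the decoder transfers perfectly; a phase code that reorganized between habituation and learning would yield accuracy near chance.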

      (2) The quantification of prefrontal replay sequences during reward is useful, but it is still unconvincing that the distinction between existence of sequences in the odor sampling phase and reward phase is not trivially expected based on prior literature. This is an odor-guided task, not a spatial exploration task with no cues, and it is very well established (as noted in citations in the previous review) that during odor sampling, animals will sniff in an exploratory stage, resulting in strong beta and respiratory rhythms in prefrontal cortex. Not having LFP recordings in this task does not preclude considering prior literature that clearly shows that odor sampling results in a unique internal network state, when animals are retrieving the odor-associated goal, vastly different from a reward sampling phase. The authors argue that this is not trivial since they see some sequences during sampling, although they also argue the opposite in response to a question from Reviewer 2 about shuffling controls for sequences, that 'not' seeing these sequences in the sampling phase is an internal control. The bigger issue here is equating these sequences during sampling to replay/preplay or reactivation sequences similar to the reward phase, since the prefrontal network dynamics are engaged in odor-driven retrieval of associated goals during sampling, as has been shown in previous studies.

      We agree with the reviewer that the sampling and reward phases represent two very different behavioral states. Nevertheless, correlations on short time scales could be similar, which we show is not the case, and we therefore do not consider this result trivial. Regarding the interpretation of sequences, we apologize that we have not been sufficiently clear in distinguishing replay from pure sequences. While we find such sequences in the sampling phase (indicative of fast temporal correlation structure beyond the cofiring quantified in Figure 3), they are NOT pre/replaying any task-related information. Otherwise, our results are fully in line with the previous literature on oscillations that we included in the previous round of revisions. We added a similar explanation at multiple instances in the Results and Discussion sections.

      Reviewer #2 (Public review):

      Comments on revisions:

      Further changes are needed to improve the description of the methods and the discussion needs to be extended to contrast the results with previously published results of the group. Some control figures would also be needed to quantitatively demonstrate, across the entire dataset, that sequence detection did not identify random events as sequences, even if the detection method was designed to exclude such sequences. For example, showing that sequences are not detected in randomised data with the current method would better convince readers of the method's validity.

      We have added control quantifications from time-randomized sequences, which produce far fewer detected sequences. See response below.

      Although differences in the classification scheme relative to the Muysers et al. (2025) paper have been explained, the similarity (perhaps equivalence of results) is not sufficiently acknowledged - e.g., at the beginning of the discussion.

      We have added a paragraph at the beginning of the Discussion on how our results align with the Muysers et al. 2025 paper.

      Although the control of spurious sequences may have been built into the method, this is not sufficiently explained in the method. It is also not clear what kind of randomization was performed. Importantly, I do not see a quantification that shows that the detected sequences are significantly better than the sequence quality measure on randomized events. Or that randomized data do not lead to sequence clusters.

      In response to this question, we have added the requested shuffling control (Supplement 1B to Figure 4). In the shuffled data, the number of detected recurring sequence clusters is only about half of that in the original data. On average, the number of bursts assigned to clusters in the shuffled data is only 46% of the originally assigned bursts, clearly indicating that the detected sequences in the non-randomized data cannot be explained without assuming stable temporal order.

      Some clusters are nevertheless still detected in randomized data, which is expected if the participation of cells is heterogeneous, with some highly active cells occurring in more than half of the bursts. Random sequences then spuriously occur above chance level, representing clusters formed by the random order of a few highly active cells. In line with this interpretation, we see that

      (1) Bursts that were removed after shuffling have exactly 0 high-firing cells

      (2) Clusters derived from shuffled sequences have a less sparse contribution of high-firing cells, i.e., high-firing cells contribute to significantly more clusters in randomized data than in non-randomized data.

      The difference in the distribution of high firing cells further indicates that sequences obtained with and without randomization are of different quality.

      The spurious (false positive) clusters detected after randomization nevertheless may have a physiological meaning as they identify rate coactivation patterns that were also picked up by analysis in Figure 3.
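
      The logic of an order-shuffling control of this kind can be sketched as below. This is not the authors' cluster-based detection pipeline: the rank-order similarity measure, the within-burst shuffle, the function names, and the toy bursts are all assumptions made for illustration, and the sketch assumes each neuron appears at most once per burst.

```python
import random
from itertools import combinations


def rank_order_similarity(seq_a, seq_b):
    """Spearman rank correlation of firing order, using only the L
    neurons active in both sequences. Returns None if L < 2."""
    shared = set(seq_a) & set(seq_b)
    L = len(shared)
    if L < 2:
        return None
    order_a = [n for n in seq_a if n in shared]
    order_b = [n for n in seq_b if n in shared]
    rank_b = {n: r for r, n in enumerate(order_b)}
    d2 = sum((r - rank_b[n]) ** 2 for r, n in enumerate(order_a))
    return 1 - 6 * d2 / (L * (L * L - 1))


def mean_similarity(bursts):
    """Average rank-order similarity over all burst pairs sharing >= 2 cells."""
    values = [rank_order_similarity(a, b) for a, b in combinations(bursts, 2)]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)


def shuffle_within_bursts(bursts, rng):
    """Keep which neurons take part in each burst, randomize their order."""
    return [rng.sample(b, len(b)) for b in bursts]


# Toy bursts (neuron IDs in firing order) that share a stable temporal order.
bursts = [[1, 2, 3, 4, 5], [1, 2, 4, 5], [2, 3, 4, 5, 6], [1, 3, 4, 6]]
rng = random.Random(0)
observed = mean_similarity(bursts)
null = [mean_similarity(shuffle_within_bursts(bursts, rng)) for _ in range(200)]
# Fraction of shuffles that are at least as ordered as the real data.
p_value = sum(s >= observed for s in null) / len(null)
```

      Because the shuffle preserves which cells participate in each burst, any structure that survives it reflects participation (rate coactivation) rather than temporal order, which matches the interpretation of the residual clusters given above.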

      Also, it is still not clear how the number of clusters was established. I understand that the previously published paper may have covered these questions; these should be explained here as well.

      The Methods section states “The [cluster merging] procedure was repeated until no pair [of clusters] satisfied the merging criterion.”

      Also, the sequence similarity description is still confusing in the method; please correct this sentence "Only the l neurons active in both sequences of a pair were taken into account."

      We do not see what is wrong with this sentence. To avoid confusion, we have replaced the lowercase l with an uppercase L to denote sequence length.

      Reviewer #3 (Public review):

      One comment is that the threshold for extracting burst events (0.5 standard deviations, presumably above the mean) seems lower than what one usually sees as a threshold for population burst detection, and the authors show (in Supplementary Fig 1) that this means bursts cover ~20-40% of the data. However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

      We have added further specifications following the Reviewer’s suggestion and now mention that the threshold is permissive and “capturing a large amount of cofiring structure.”
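
      To make the effect of a permissive threshold concrete, the following minimal sketch marks time bins whose summed population activity exceeds the mean plus a multiple of the standard deviation and reports the fraction of time covered. The function names and the toy activity trace are hypothetical, the threshold is assumed to be measured above the mean (as the reviewer presumes), and no minimum burst duration or smoothing is applied, so this is not the authors' detection procedure.

```python
from statistics import mean, pstdev


def burst_mask(population_activity, n_sd=0.5):
    """Mark time bins whose summed activity exceeds mean + n_sd * SD."""
    threshold = mean(population_activity) + n_sd * pstdev(population_activity)
    return [x > threshold for x in population_activity]


def burst_coverage(mask):
    """Fraction of time bins that fall inside detected bursts."""
    return sum(mask) / len(mask)


# Toy trace of summed activity per time bin (arbitrary units).
activity = [0, 0, 1, 0, 5, 6, 0, 0, 1, 0]
permissive = burst_coverage(burst_mask(activity, n_sd=0.5))
strict = burst_coverage(burst_mask(activity, n_sd=3.0))
```

      Lowering `n_sd` increases coverage, which is the trade-off the reviewer notes: more cofiring structure is captured at the cost of including weaker population events.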

    1. eLife Assessment

      The authors make an important contribution to comparative functional genomics by developing a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types. Using a single-cell RNA-seq dataset of induced pluripotent stem cells and derived embryonic bodies from four primate species: humans, orangutans, cynomolgus macaques, and rhesus macaques, the authors provide convincing evidence that cell type-specific marker genes are substantially less transferable across species than broadly expressed genes, with transferability declining as phylogenetic distance increases. This study establishes a key framework and reference dataset for comparative single-cell analyses and encourages more rigorous evaluation of marker gene transferability across species.

    2. Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen et al examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and differentiated into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant imbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa etc.

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them. Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses? Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types, and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. But some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Comments on revisions:

      I think the authors have addressed my previous comments to my satisfaction, and I thank them for the changes they have made. It's good to see that the manuscript is just as sound as it seemed the first time around.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Most importantly, in accordance with questions raised by Reviewer 1, we now include a detailed comparison of the cell type frequencies between the two examined time points as well as comparison of the pseudotimes along those lineages. This is detailed in the new section “Many cell types are shared between day 8 and day 16 EBs” and illustrated in Supplementary Figure 6c and Supplementary Figures 7-8.

      Besides this new section and its accompanying methods part, we mainly edited the language to clarify methods and assumptions according to the Reviewers' suggestions.

      The main concern of Reviewer 2 was our use of the liftoff gene annotation. We explained our reasoning for this choice extensively in our public response to the Reviewer, but did not incorporate this into our manuscript because even though this is an important subject it is not within the main scope of our paper.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and differentiated into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      We thank the Reviewer for their kind assessment of our work.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant unbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa, etc.

      You are absolutely correct in pointing out that the large clonal variability in cell type composition is a challenge for our analysis. We also noted the odd behavior of the orangutan EBs, and their underrepresentation of ectoderm. There are many possible sources for these variable differentiation propensities: clone, sample origin (in this case urine) and individual. Unfortunately, for the orangutan we have only one individual and one sample origin, and thus cannot say whether this germ layer preference says something about the species or is due to our specific sample. Because of this high variability from multiple sources, obtaining enough cell types with an appreciable overlap between species was the limiting factor for our analyses. In order to derive meaningful conclusions from intra-species analyses and the impact of different sources of variation on cell type propensity, we would need to sequence many more EBs with an experimental design that balances possible sources of variation. This would go beyond the scope of this study.

      Instead, here we control for intra-species variation in our analyses as much as possible: For the analysis of cell type specificity and conservation the comparison is relative for the different specificity degrees (Figure 3C). For the analysis of marker gene conservation, we explicitly take intra-species variation into account (Figure 4D).

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them.

      Concerning the temporal aspect, we deliberately omitted an explicit comparison of day 8 and day 16 EBs, because we felt that it was not directly relevant to our main message. Our pseudotime analysis showed that the differences between the two time points were a matter of degree rather than of kind. All major lineages were already present at day 8 and, even though day 8 cells had on average earlier pseudotimes, there was a large overlap in the pseudotime distributions between the two sampling time points (Author response image 1). That is why we decided to analyse the data together.

      Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses?

      When we started the experiment, we simply did not know what to expect. We were worried that cell types at day 8 might be too transient, but longer culture can also introduce biases. That is why we wanted to look at two time points; however, as mentioned above, the differences are a matter of degree.

      Concerning the cell type composition: yes, day 16 EBs are more heterogeneous than day 8 EBs. Firstly, older EBs have more distinguishable cell types and hence even if all EBs had identical composition, the sampling variance would be higher given that we sampled a similar number of cells from both time points. Secondly, in order to grow EBs for a longer time, we moved them from floating to attached culture on day 8 and it is unclear how much variance is added by this extra handling step.

      Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      We did not see any differences in the marker conservation between early and late cell types, but we have too little data to say whether this carries biological meaning.

      Author response image 1.

      Pseudotime analysis for a differentiation trajectory towards neurons. Single cells were first aggregated into metacells per species using SEACells (Persad et al. 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony and a combined pseudotime was inferred with Slingshot (Street et al. 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating a differentiation towards neurons. (A) PHATE embedding colored by pseudotime (Moon et al. 2019). (B) PHATE embedding colored by celltype. (C) Pseudotime distribution across the sampling timepoints (day 8 and day 16) in different species.

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However, some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Author response image 2.

      UMAP visualization for the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

      Good point. If we understand correctly, the concern is that in our relatively broadly defined cell types, species are not well mixed and that this in turn is partly responsible for marker gene divergence. This problem is indeed difficult to address, because most approaches to evaluate this require integration across species, which might lead to questionable results (see our Discussion).

      Nevertheless, we attempted an integration across all four species. To this end, we subset the cells for the 7 cell types that we found in all four species and visualized cell types and species in the UMAPs above (Author response image 2).

      We see that cardiac fibroblasts appear poorly integrated in the UMAP, but they still have very transferable marker genes across species. We quantified integration quality using the cell-specific mixing score (cms) (Lütge et al. 2021) and indeed found that the proportion of well integrated cells is lowest for cardiac fibroblasts (Author response image 3A). On the other end of the cms spectrum, neural crest cells appear to have the best integration across species, but their marker transferability between species is rather worse than for cardiac fibroblasts (Supplementary Figure 9). Cell-type wise calculated rank-biased overlap scores that we use for marker gene conservation show the same trends (Author response image 3B) as the F1 scores for marker gene transferability. Hence, given our current dataset we do not see any indication that the low marker gene conservation is a result of too broadly defined cell types.

      Author response image 3.

      (A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.
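
      For readers unfamiliar with rank-biased overlap, the following is a minimal sketch of the truncated (non-extrapolated) form of the score for two ranked marker gene lists. It is an illustration only and not necessarily the variant or implementation used for Author response image 3B: the function name, the persistence parameter p = 0.9 and the gene labels are assumptions.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two ranked lists without duplicates.

    Computes (1 - p) * sum_{d=1..k} p**(d-1) * A_d, where A_d is the fraction
    of shared items among the top d entries of both lists and k is the length
    of the shorter list. The value of p is an illustrative assumption.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    overlap = 0      # size of the intersection of the two top-d prefixes
    score = 0.0
    for d in range(1, depth + 1):
        a, b = ranking_a[d - 1], ranking_b[d - 1]
        if a == b:
            overlap += 1
        else:
            if a in seen_b:
                overlap += 1
            if b in seen_a:
                overlap += 1
        seen_a.add(a)
        seen_b.add(b)
        score += p ** (d - 1) * overlap / d
    return (1 - p) * score


if __name__ == "__main__":
    # Hypothetical marker rankings for one cell type in two species.
    species_1 = ["geneA", "geneB", "geneC", "geneD"]
    species_2 = ["geneB", "geneA", "geneE", "geneC"]
    print(rbo_truncated(species_1, species_2))
```

      Because the weights decay geometrically with rank, disagreement among the top-ranked markers lowers the score more than disagreement further down the list. Note that the truncated form has a maximum of 1 - p^k for two identical lists of length k.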

      Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species, humans, orangutans, cynomolgus macaques, and rhesus macaques, providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      We thank the reviewer for their positive and thoughtful feedback.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      As Reviewer 1 also points out, we have observed that the annotation of non-human primates often has truncated 3’UTRs. This is especially problematic for 3’ UMI transcriptome data such as the 10x dataset that we present here. To illustrate this, we compared the Liftoff annotation derived from Gencode v32, which we also used throughout our manuscript, to the Ensembl gene annotation Macaca_fascicularis_6.0.111. We used transcriptomes from human and cynomolgus iPSC bulk RNA-seq (Kliesmete et al. 2024) generated with the Prime-seq protocol (Janjic et al. 2022), which is very similar to 10x in that it also uses 3’ UMIs. On average, using Liftoff produces higher counts than the Ensembl annotation (Author response image 4A). Moreover, when comparing across species, using Ensembl for the macaque leads to an asymmetry in differentially expressed genes, with apparently many more up-regulated genes in humans. In contrast, when we use the Liftoff annotation, we detect fewer DE genes and a similar number of genes is up-regulated in macaques as in humans (Author response image 4B). We think that the additional DE genes are artifacts due to mismatched annotation between human and cynomolgus macaque. We illustrate this for the case of the transcription factor SALL4 in Author response image 4C, D. The Ensembl annotation reports 2 transcripts, while Liftoff from Gencode v32 suggests 5 transcripts, one of which has a longer 3’UTR. This longer transcript is also supported by Nanopore data from macaque iPSCs. The truncation of the 3’UTR in this case leads to underestimation of the expression of SALL4 in macaques and hence SALL4 is detected as up-regulated in humans (DESeq2: LFC=1.34, p-adj<2e-9). In contrast, when using the Liftoff annotation, SALL4 does not appear to be DE between humans and macaques (LFC=0.33, p-adj=0.20).

      Author response image 4.

      (A) UMI counts per gene for the same cynomolgus macaque iPSC samples. On the x-axis, the GTF file from Ensembl Macaca_fascicularis_6.0.111 was used for counting; on the y-axis, we used our filtered Liftoff annotation that transferred the human gene models from Gencode v32. (B) Number of DE genes between human and cynomolgus iPSCs detected with DESeq2. In Liftoff, we counted human samples using Gencode v32 and compared them to macaque samples counted with the Liftoff annotation of the same human gene models to macFas6. In Ensembl, we used Gencode v32 for the human and Ensembl Macaca_fascicularis_6.0.111 for the macaque. For both comparisons, we subset the genes to only contain one-to-one orthologs as annotated in biomart. Up- and down-regulation is relative to human expression. (C) Read counts for one example gene, SALL4. Here we used, in addition to the Liftoff and Ensembl annotations, transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. (D) Gene models for SALL4 in the space of macFas6 and coverage for iPSC Prime-seq bulk RNA-sequencing.

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      We excluded those transcripts from analysis in both species, because they present a convolution of sequence-annotation differences and expression. The focus of our study is on regulatory evolution and we knowingly omit marker differences that are due to a marker being mutated away. We will make this clearer in the text of a revised version.
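
      To make the exclusion criteria quoted by the Reviewer concrete, here is a minimal sketch of how such a filter could be expressed. The thresholds are those stated in the Reviewer's comment; the record type, the field names and the symmetric definition of the length ratio are illustrative assumptions and not the code used in the manuscript.

```python
from dataclasses import dataclass


@dataclass
class LiftedTranscript:
    # Hypothetical record for one transcript transferred to a non-human genome.
    transcript_id: str
    coverage: float           # fraction of the human transcript that mapped (0-1)
    sequence_identity: float  # fraction of identical bases in the mapped part (0-1)
    human_length: int         # transcript length in the human annotation (bp)
    lifted_length: int        # transcript length after transfer (bp)


def keep_transcript(t, min_coverage=0.5, min_identity=0.5,
                    max_length_diff=100, max_length_ratio=2.0):
    """Return True if the transcript passes all three criteria."""
    # Partial mapping or low sequence identity: exclude.
    if t.coverage < min_coverage or t.sequence_identity < min_identity:
        return False
    shorter = min(t.human_length, t.lifted_length)
    longer = max(t.human_length, t.lifted_length)
    if shorter <= 0:
        return False
    # Excessive length difference: both the absolute difference and the
    # ratio must exceed their thresholds for the transcript to be excluded.
    if longer - shorter > max_length_diff and longer / shorter > max_length_ratio:
        return False
    return True


if __name__ == "__main__":
    example = LiftedTranscript("tx_example", 0.9, 0.95, 2000, 2100)
    print(keep_transcript(example))
```

      Requiring both the absolute and the relative length criterion means that short transcripts are not excluded merely because a small absolute difference translates into a large ratio.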

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.

      We want to stress that none of our conservation and marker gene analyses relies on cross-species integration. We only used the Harmony integrated data for visualisation in Figure 1 and the rough germ-layer check up in Supplementary Figure S3. We will add a better description in the revised version.

      References

      Janjic, Aleksandar, Lucas E. Wange, Johannes W. Bagnoli, Johanna Geuder, Phong Nguyen, Daniel Richter, Beate Vieth, et al. 2022. “Prime-Seq, Efficient and Powerful Bulk RNA Sequencing.” Genome Biology 23 (1): 88.

      Kliesmete, Zane, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2024. “Evidence for Compensatory Evolution within Pleiotropic Regulatory Elements.” Genome Research 34 (10): 1528–39.

      Lütge, Almut, Joanna Zyprych-Walczak, Urszula Brykczynska Kunzmann, Helena L. Crowell, Daniela Calini, Dheeraj Malhotra, Charlotte Soneson, and Mark D. Robinson. 2021. “CellMixS: Quantifying and Visualizing Batch Effects in Single-Cell RNA-Seq Data.” Life Science Alliance 4 (6): e202001004.

      Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2019. “Visualizing Structure and Transitions in High-Dimensional Biological Data.” Nature Biotechnology 37 (12): 1482–92.

      Persad, Sitara, Zi-Ning Choo, Christine Dien, Noor Sohail, Ignas Masilionis, Ronan Chaligné, Tal Nawy, et al. 2023. “SEACells Infers Transcriptional and Epigenomic Cellular States from Single-Cell Genomics Data.” Nature Biotechnology 41 (12): 1746–57.

      Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” BMC Genomics 19 (1): 477.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1B: the orangutan tubulin stain looks a bit unusual - just confirming that this is indeed the right image the authors want to include here.

      We agree, this unfortunately also reflects the findings from the scRNA-seq analysis in that we found hardly any cells that we would classify as proper neurons.

      (2) Typo on line 90: 'loosing' should be 'losing'.

      Fixed

      (3) Line 118: why do the authors believe that using singleR will give better results than MetaNeighbour? This certainly seems supported by the data in S4 and S5, but the reasoning is not clear.

      We think that this might depend on the signal to noise ratio, which is a property specific to each dataset. Here we just wanted to state that our approach seems to work better for our developmental data, but we didn’t test out other data and thus cannot generalize.

      (4) Figure 2B: there are some coloured lines on the first filled black bar from the left - do they mean anything? I couldn't work it out from looking at the figure.

      Indeed, this is a bit misleading. The colors on the left represent the species identity; this was to illustrate the mixing of species for each cell type. The legend now reads: “Each line represents a cell which are colored by their species of origin on the left and by their current cell type assignment during the annotation procedure on the right.”

      (5) Figure 3: I did not understand how the seven bins of the cell type specificity metric were derived until much later - it is just the number of cell types in which a gene is expressed, yes? Might be worth making this clearer earlier in the text.

      We made this more explicit in the legend. “Boxplot of expression conservation of genes according to the number of different cell types in which a gene is expressed in humans (cell type specificity).”

      (6) It would be great to provide a bit more thorough documentation for the shiny app, so it can serve as a stand-alone resource and not require going back and forth with the paper to make sure one knows what one is doing at every point.

      Agree, this would be a good idea. We are on it.

      (7) Line 477: I think this is unclear - the authors retain over 11000 cells per species but then set the maximum number of cells in a cluster for pairwise comparison to 250... which is a lot fewer. What happens to all the other cells? This probably needs some rewriting to clarify it.

      We did this to minimize the power differences due to cell numbers and thus make the results more comparable across species. We added this explanation to the methods section for Marker gene detection.
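
      The idea of capping cluster size can be sketched as follows. This is a minimal illustration that assumes cells are drawn at random without replacement; the function name, the seed and the input format are hypothetical, and the manuscript's Methods remain the authoritative description.

```python
import random
from collections import defaultdict


def cap_cells_per_cluster(cluster_labels, max_cells=250, seed=0):
    """Return sorted indices of cells to keep, with at most `max_cells` per cluster.

    `cluster_labels` holds one cluster label per cell. Clusters larger than
    `max_cells` are randomly downsampled; smaller clusters are kept whole.
    """
    rng = random.Random(seed)
    cells_by_cluster = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        cells_by_cluster[label].append(index)

    kept = []
    for indices in cells_by_cluster.values():
        if len(indices) > max_cells:
            indices = rng.sample(indices, max_cells)
        kept.extend(indices)
    return sorted(kept)


if __name__ == "__main__":
    # Toy labels: one large and one small cluster.
    labels = ["large"] * 300 + ["small"] * 10
    print(len(cap_cells_per_cluster(labels)))
```

      With a common upper bound on cluster size, the power of each pairwise test depends less on how many cells a species happened to contribute to a cluster.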

      Reviewer #2 (Recommendations for the authors):

      How was the clustering resolution (0.1) determined?

      This resolution was only used for the initial rough check-up of the germ layers as reported in Figure 1 and Supplementary Figure S3. We chose this resolution because it yielded roughly the same number of clusters as the number of cell types that we got from classification with the Rhodes et al. data.

    1. eLife Assessment

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. The evidence supporting the main claims is convincing, with multiple replications, validation of their techniques, and appropriate controls. The work will be of broad interest to neuroscientists interested in central mechanisms of motor learning and control, as well as thalamic physiology.

    2. Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during task vs after task), the authors report valuable findings on the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after task impairs consolidation of the learned skill is interesting.

      Revision comment:

      The revised manuscript is improved in clarity and methodological detail. An important addition is the retrograde labeling data showing a degree of anatomical segregation between CN->CL and CN->VAL pathways that strengthens their reported different functional roles. I still think that potential effects on motor execution when cerebellar nuclei are silenced during task performance may complicate interpretations specifically related to learning. However, the evidence supporting a role of the cerebellar nuclei in off-line consolidation is convincing.

      Overall, the study outlines a multifaceted role of the cerebellum in motor learning, consolidation, and execution. The demonstration that cerebellar projections to distinct forebrain structures contribute to these processes is significant.

    3. Reviewer #3 (Public review):

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that: 1) cerebellothalamic connections are important for learning motor skills, 2) cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning, 3) cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and 4) that once a skill is acquired, cerebellothalamic connections become important for online task performance. The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between on-line learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      (4) The retrograde tracing experiments (Supplementary Figure 5) demonstrate convincingly that the CN-VAL and CN-CL projections are almost entirely segregated.

      Weaknesses:

      (1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is (as they acknowledge in the Discussion) impaired rotarod performance at fixed higher speeds in Supplementary Figure 4f for CN-VAL projections, suggesting that there could be subtle changes in motor performance below the level of detection of their assays. There is also a trend in the same direction that did not pass significance for CN-CL at higher speeds, suggesting that part of the effects could be related to subtle deficits in performance.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides evidence that cerebellar projections to the thalamus are required for learning and execution of motor skills in the accelerating rotarod task. This important study adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The data presentation is generally sound, especially the main observations, with some limitations in describing the statistical methods and a lack of support for two separate cerebello-thalamic pathways, which is incomplete in supporting the overall claim.

      We completed the MS by adding a double retrograde labelling study showing that the two pathways have limited overlap and by addressing the other concerns.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript tackling the issue of whether subcircuits of the cerebellum are differentially involved in processes of motor performance, learning, or learning consolidation. The authors focus on cerebellar outputs to the ventrolateral thalamus (VL) and to the centrolateral thalamus (CL), since these thalamic nuclei project to the motor cortex and striatum respectively, and thus might be expected to participate in diverse components of motor control and learning. In mice challenged with an accelerating rotarod, the investigators reduce cerebellar output either broadly, or in projection-specific populations, with CNO targeting DREADD-expressing neurons. They first establish that there are not major control deficits with the treatment regime, finding no differences in basic locomotor behavior, grid test, and fixed-speed rotarod. This is interpreted to allow them to differentiate control from learning, and their inter-relationships. These manipulations are coupled with chronic electrophysiological recordings targeted to the cerebellar nuclei (CN) to control for the efficacy of the CNO manipulation. I found the manuscript intriguing, offering much food for thought, and am confident that it will influence further work on motor learning consolidation. The issue of motor consolidation supported by the cerebellum is timely and interesting, and the claims are novel. There are some limitations to the data presentation and claims, highlighted below, which, if amended, would improve the manuscript.

      We thank the reviewer for the positive comments and insightful critiques.

      (1) Statistical analyses: There is too little information provided about how the Deming regressions, mean points, slopes, and intercepts were compared across conditions. This is important since in the heart of the study when the effects of inactivating CL- vs VL- projecting neurons are being compared to control performance, these statistical methods become paramount. Details of these comparisons and their assumptions should be added to the Methods section. As it stands I barely see information about these tests, and only in the figure legends. I would also like the authors to describe whether there is a criterion for significance in a given correlation to be then compared to another. If I have a weak correlation for a regression model that is non-significant, I would not want to 'compare' that regression to another one since it is already a weak model. The authors should comment on the inclusion criteria for using statistics on regression models.

      We thank the reviewer for pointing out this weakness of description. The description of the Methods has thus been expanded and better justified in the “Quantification and statistical analysis” section.

      We agree with the reviewer that a comparison between Deming regressions would be fragile due to the weakness of these regressions in treatment groups (while they are quite robust for control groups), and such comparisons are not included in the MS, although Deming regression coefficients with their confidence intervals are now provided for all groups in the statistical tables. As now more clearly explained in the Methods, the comparisons between groups are based on the distribution of residuals around the control regression lines. If we understand the reviewer’s request correctly, the control groups are all included.
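
      As a generic illustration of this kind of analysis, the sketch below fits a Deming regression to control data and then computes residuals of another group around that control line. It is not the analysis code of the MS: the error-variance ratio delta = 1 (orthogonal regression), the use of signed vertical residuals and all names are assumptions made for the example.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x; returns (slope, intercept).

    `delta` is the assumed ratio of the error variance in y to the error
    variance in x; delta = 1 corresponds to orthogonal regression.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    return slope, mean_y - slope * mean_x


def residuals_from_line(x, y, slope, intercept):
    """Signed vertical residuals of the points (x, y) around a given line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # Toy numbers only; they do not come from the study.
    control_x, control_y = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
    slope, intercept = deming_fit(control_x, control_y)
    treated = residuals_from_line([1.0, 2.0], [2.0, 3.5], slope, intercept)
    print(slope, intercept, treated)
```

      Comparing a treatment group through its residuals around the control line avoids fitting a separate, possibly weak, regression to the treatment data.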

      (2) The introduction makes the claim that the cerebellar feedback to the forebrain and cortex are functionally segregated. I interpreted this to mean that the cerebellar output neurons are known to project to either VL or CL exclusively (i.e. they do not collateralize). I was unaware of this knowledge and could find no support for the claim in the references provided (Proville 2014; Hintzen 2018; Bostan 2013). Either I am confused as to the authors' meaning or the claim is inaccurate. This point is broader, however, than some confusion about citation.

      The references are not cited in the context of collaterals from the DCN but for the output channels of the basal ganglia and cerebellum: “They [basal ganglia and cerebellum] send projections back to the cortex via anatomically and functionally segregated channels, which are relayed by predominantly non-overlapping thalamic regions (Bostan, Dum et al. 2013, Proville, Spolidoro et al. 2014, Hintzen, Pelzer et al. 2018).” Indeed, the thalamic compartments targeted by the basal ganglia and cerebellum are distinct, and in Proville et al. 2014, we showed some functional segregation of the cerebello-cortical projections (whisker vs orofacial ascending projections). Hintzen et al. have indeed performed an extensive review indicating the limited overlap between cerebellar- and basal ganglia-recipient territories. The sentence has been corrected to clarify what the “They” referred to.

      The study assumes that the CN-CL population and CN-VL population are distinct cells, but to my knowledge, this has not been established. It is difficult to make sense of the data if they are entirely the same populations, unless projection topography differs, but in any event, it is critical to clarify this point: are these different cell types from the nuclei? how has that been rigorously established?; is there overlap? No overlap? Etc. Results should be interpreted in light of the level of this knowledge of the anatomy in the mouse or rat.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but as the reviewer says, it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments to support our finding, with retrograde infections using two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We are thus confident that, while some CN neurons may project to both structures, the retrograde infection strategies differentially infect CN populations.

      (3) It is commendable that the authors perform electrophysiology to validate DREADD/CNO. So many investigators don't bother and I really appreciate these data. Would the authors please show the 'wash' in Figure 1a, so that we can see the recovery of the spiking hash after CNO is cleared from the system? This would provide confidence that the signal is not disappearing for reasons of electrode instability or tissue damage/ other.

      The recordings were not extended to the wash period, but examination of the firing rate before CNO on successive days did not reveal major changes in the population firing rate (this is now shown in a new supplementary figure 6).

      (4) I don't think that the "Learning" and "Maintenance" terminology is very helpful and in fact may sow confusion. I would recommend that the authors use a day range " Days 1-3 vs 4-7" or similar, to refer to these epochs. The terminology chosen begs for careful validation, definitions, etc, and seems like it is unlikely uniform across all animals, thus it seems more appropriate to just report it straight, defining the epochs by day. Such original terminology could still be used in the Discussion, with appropriate caveats.

      Since reference to these time windows is repeatedly used in the text we have shifted to “Early” and “Late” phase terminology.

      (5) Minor, but, on the top of page 14 in the Results, the text states, "Suggesting the presence of a 'critical period' in the consolidation of the task." I think this is a non-standard use of 'critical period' and should be removed. If kept, the authors must define what they mean specifically and provide sufficient additional analyses to support the idea. As it stands, the point will sow confusion.

      This has been corrected to: “suggesting the cerebellar contribution to the consolidation of the task is critical early in the learning process and cannot be easily reinstated later”

      Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target the motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during the task vs after the task), the authors report valuable findings of the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after a task impairs the consolidation of the learned skill is interesting.

      We thank the reviewer for the positive comments and insightful critiques below.

      Weaknesses:

      While the controls for a lack of gross motor deficit are solid, the data seem to show some motor execution deficit when cerebellar nuclei are silenced during task performance. This deficit could potentially impact learning when cerebellar nuclei are silenced during task acquisition.

      One of our key controls is the test of the treatment on the fixed speed rotarod, which provides the closest conditions to those found in the accelerating rotarod (the main difference between the protocols being the slow, steady acceleration of rod rotation in the accelerating version). Indeed, small but measurable deficits are found at the highest speed in the fixed speed rotarod in the CN-VAL group, while there was no measurable effect in the CN-CL group, which actually shows lower performance from the second day of learning; we believe this supports our claim that CN-CL inhibition impacted the learning process more than motor coordination. In contrast, the CN-VAL group only showed significantly lower performance on day 4, consistent with intact learning abilities. Yet, under CNO, CN-VAL mice could stay for more than a minute and a half at 20 rpm, while on average they fell from the accelerating rotarod as soon as it reached a speed of ~19 rpm (130 s). Overall, we focused our argument on the first days of learning, where the differences between the groups are more pronounced. We clarified the discussion (section “A specific impact on learning of CL-projecting CN neurons.”).

      Separately, I find the support for two separate cerebello-thalamic pathways incomplete. The data presented do not clearly show the two pathways are anatomically parallel. The difference in behavioral deficits caused by manipulating these pathways also appears subtle.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments to support our finding, with retrograde infections using two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We are thus confident that, while some CN neurons may project to both structures, the retrograde infection strategies differentially infect CN populations.

      While we agree that after 3-4 days of learning the difference between the groups becomes elusive, we respectfully disagree with the reviewer that in the early stages these differences are negligible.

      Reviewer #3 (Public review):

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that:

      (1) Cerebellothalamic connections are important for learning motor skills

      (2) Cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning

      (3) Cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and

      (4) That once a skill is acquired, cerebellothalamic connections become important for online task performance.

      The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between online learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      We thank the reviewer for the positive comments and insightful critiques below.

      Weaknesses:

      (1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is a trend towards impaired rotarod performance at higher speeds in Supplementary Figure 4f, suggesting that there could be subtle changes in motor performance below the level of detection of their assays.

      This is now better acknowledged in the discussion in the section “A specific impact on learning of CL-projecting CN neurons.” However, we want to underline that the strongest deficit in learning is found in animals with CN->CL inhibition, whose latency to fall saturates at about 100 s on the rotarod; this indicates that mice fall as soon as the accelerating rotarod speed reaches about 16 rpm. In the fixed speed rotarod, the inhibition of CN->CL neurons shows not even a trend toward a difference from control mice at 15 rpm, and the animals run for 2 minutes without falling at this speed. This makes us confident that inhibition of the CN->CL pathway interferes more with learning than with the actual locomotor function on the rotarod.

      (2) There is likely some overlap between CN neurons projecting to VAL and CL, somewhat limiting the specificity of their conclusions.

      This issue is treated in the discussion (see also replies to reviewers 1 and 2 above). We added experiments with simultaneous retro-AAV infections in CL and VAL, and the data are presented in Supplementary Figure 5. We found that retrograde infection targeted different populations of CN neurons; although collaterals in both CL and VAL may be present for (some of) these two populations of neurons, they are likely strongly biased toward one or the other thalamic region, explaining the differential retrograde labelling in the CN. We hope these experiments will answer the reviewer’s concern.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Multiple studies have reported on the effect of cerebellar nuclei (CN) manipulation on locomotion. Here the authors perform several controls and careful analysis to rule out gross motor deficits caused by DREADD-mediated CN silencing. As the authors point out in the discussion, part of the difference from prior studies could be the mild degree of inhibition here. However, it is possible that the CN inhibition here induces a subtle motor deficit and the accelerating rotarod task is challenging and more readily reveals this motor deficit, rather than a deficit in motor learning per se. Two pieces of data seem to suggest this:

      (a) under CN inhibition during the task (Figure 1i), mice could never achieve the level of performance as mice under CN inhibition after the task, even after several days of training, which suggests the CN inhibition is interfering with task performance;

      (b) in highly trained mice (after learning), applying the CN inhibition impaired performance to a similar extent as mice in Figure 1i (Figure 4).

      Can the authors rule out the possibility that CN inhibition during the task is impairing motor execution rather than motor learning?

      We do not rule out a contribution of impaired motor coordination at the highest speed (last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”). Indeed, our argument in favor of a deficit in learning rests primarily on the first days (Early phase), particularly for the CN->CL CNO group (Fig 3h). A crucial control in our work is the use of the fixed speed rotarod, where no deficit is observed. The difference between the fixed and accelerating rotarod is minimal, since the acceleration of the rotarod is small (0.12 rpm/s for speeds up to >20 rpm).
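      The correspondence between latency to fall and rod speed quoted in these responses (about 16 rpm at 100 s, ~19 rpm at 130 s) can be checked with a short sketch. The acceleration of 0.12 rpm/s is stated above; the starting speed of 4 rpm is an assumption that is not given in this response and is chosen here because it is consistent with the quoted figures.

```python
ACCELERATION_RPM_PER_S = 0.12  # stated in the author response
START_SPEED_RPM = 4.0          # assumption: not stated in the response


def speed_at_fall(latency_s, start_rpm=START_SPEED_RPM,
                  accel=ACCELERATION_RPM_PER_S):
    """Rod speed (rpm) reached when a mouse falls after latency_s seconds,
    assuming a constant linear acceleration from start_rpm."""
    return start_rpm + accel * latency_s


speed_cn_cl = speed_at_fall(100)   # latency plateau quoted for CN->CL inhibition
speed_cn_val = speed_at_fall(130)  # average latency quoted for CN-VAL mice
```

      Under these assumptions the two latencies map to roughly 16 rpm and just under 20 rpm, which is the basis for comparing them with the 15 rpm and 20 rpm fixed speed tests.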

      Interpreting the effect of treatment reversal is challenging. If the only effect of CNO were a motor deficit, the animals that learned under CNO should rapidly regain higher performance under saline, which is not observed. When switching from CNO to saline after 7 days of training, it is difficult to disentangle which part is due to a crude motor deficit (which would not show in the fixed speed rotarod), and which part is due to an inability to resume motor learning after the task has been (mis-)consolidated.

      (2) The separation of the cerebellar pathways to the intralaminar thalamus (IL) and ventral thalamus (VAL) is not clear to me. It is not clear the CN neurons projecting to these nuclei are distinct. In addition, although IL projects to the striatum and VAL does not, both IL and VAL project to motor cortex. It is unclear to what extent these pathways can be separated. The argument for distinct pathways (as laid out in the discussion) is the distinct behavior deficits when manipulating these two pathways, but this difference seems subtle (point 3).

      We now clarify that the CN populations are different with the help of retrograde labelling experiments (new Suppl Fig 5). The differences in IL and VAL projections are now discussed in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.” Briefly, we argue that despite some overlap of their targets, the projection profiles of the CL and VAL differ substantially.

      (3) The pattern of behavioral deficits induced by CN->CL and CN->VAL neurons appear similar in Figure 3b-c and e-f. I have difficulty seeing how these data lead to the differences in the regression fits in panels 3g-k, which seem to show distinct patterns of performance change within and across sessions. One notable difference in Figure 3b-c and e-f seems to be that CN->VAL CNO treated mice exhibit lower performance on the very first trial for most days. Somehow, this pattern is present even after the CNO treatment is switched to saline (Figure 3f). I wonder if this data point is driving the difference. One control analysis the authors could do is to exclude the 1st trial and test if the effects are preserved.

      Since the learning is cumulative and involves varying degrees of consolidation, it is indeed difficult to substantiate the difference from the average performance: a given performance on day 3 may result from slow learning and perfect consolidation, or from good learning and imperfect consolidation. That is why we designed an analysis which takes into account the observed relationships between initial performance, within-session gain of performance and across-session carry-over of this gain of performance (Fig 2). This analysis focuses on the first days of learning, before the performance plateau is reached in the CNO groups. While a clear deficit in consolidation is observed with full CN inhibition, this is not the case for the CN→CL CNO groups, despite their weaker performance after 3 days, similar to that seen with full CN inhibition. In contrast, normal learning is observed in the CN→VAL CNO group during these three days. The consolidation deficit in the CN→VAL CNO group is more subtle than in the CN CNO group and is indeed largely driven by the first data point. This is consistent with the idea that CN→VAL inhibition only partially impairs consolidation (compared to full CN inhibition), leaving some “savings” that allow rapid reacquisition.

      (4) The quantification of locomotion in Figure S2 needs more information. What is linear movement? What is sigma? What is the alternation coefficient? These are not defined in the legends or the Methods as far as I can tell. Related to point 1 above, the authors should provide some analysis of the stride length and hindlimb to forelimb distance as measures of locomotion execution.

      These measures were taken from Simon J Neurosci 2004 24(8):1987-1995 which is now cited and their description is now provided in the Methods.

      Minor:

      (5) To help readers follow the logic of experimental design, please explain why CNO was switched to saline after day 4 in Figures 1j, 3c, and f. Specifically, is the saline manipulation meant to test something as opposed to applying CNO throughout the entire course of the behavioral test?

      Since we had no difference between the groups at the end of the Early phase, we decided to test whether the skill consolidated under CNO remained available when the CNO was removed (and it indeed was). This is now more clearly stated in the Results.

      (6) I have difficulty understanding what is plotted in Figure 4b and d. The legend says the change in performance is calculated the same way as in Figure 2a, so the changes are presumably the regression slopes. But how are the regression slopes calculated for daily start (1st trial) and daily end (last trial)?

      Skill levels at the beginning and end of each day correspond to the values of the regression line at the abscissae of trial 1 and trial 7 (green points). This has been added to the figure legend.
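      As an illustration of this definition, the sketch below fits a straight line to the latencies of one session and reads off its values at the first and last trial. It assumes an ordinary least-squares fit of latency against trial number; the function name and the latencies are hypothetical, and the regression method actually used in the manuscript may differ.

```python
from statistics import mean


def session_start_end(latencies):
    """Fit latency = intercept + slope * trial by least squares (trials
    numbered from 1) and return the fitted values at the first and last trial."""
    trials = list(range(1, len(latencies) + 1))
    mt, ml = mean(trials), mean(latencies)
    slope = (sum((t - mt) * (l - ml) for t, l in zip(trials, latencies))
             / sum((t - mt) ** 2 for t in trials))
    intercept = ml - slope * mt
    return intercept + slope * trials[0], intercept + slope * trials[-1]


# Hypothetical latencies to fall (s) for the 7 trials of one day.
day_latencies = [62.0, 75.0, 71.0, 90.0, 98.0, 104.0, 121.0]
daily_start, daily_end = session_start_end(day_latencies)
```

      Reading the start and end values from the fitted line rather than from the raw first and last trials makes both estimates less sensitive to the noise of any single trial.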

      (7) Do CN-CL and CN-VAL neurons also project to other brain regions besides the thalamus? Might these pathways also contribute to learning and consolidation of the accelerating rotarod task? Please discuss.

      This is now discussed in more detail in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”

      Reviewer #3 (Recommendations for the authors):

      (1) Please check the anatomic evidence for the strict dichotomy between intralaminar (specifically central lateral nucleus) nuclei projecting to the striatum and the ventral-anterior-lateral (VAL) complex projecting to the cortex. For example, while the Chen et al paper shows that there are cerebellar-intralaminar-striatal projections, it does not exclude intralaminar cortex projections, which have at least been demonstrated in rats. Similarly, VAL has projections to striatum (see, e.g., Smith et al, "The thalamostriatal system in normal and diseased states", Frontiers in Systems Neuroscience, 2014). It may be that some of these projections are stronger, but I don't think it's true that these pathways are as well-separated as the authors suggest. I also don't think this changes the fundamental conclusions but is important for potential mechanisms by which differential learning could occur and necessitate modification of Figure 5.

      We have toned down the interpretation of CL and VAL relaying specifically to different brain structures and mostly put forward the duality of the pathways. The connections with the cortex are now discussed at the end of the section “A specific impact on learning of CL-projecting CN neurons.”

      (2) Please provide more details on the spike sorting. By what metrics were single units declared to be well-separated? How many units were identified under each condition? What was the distribution of firing rates with and without CNO treatment? Are the units shown in panel 1f from before and after CNO as in panel E or are just 2 examples of isolated units? The units by themselves are not very helpful to the reader. Showing sample auto and/or crosscorrelograms for units recorded on the same electrode would be more helpful to show how well-isolated the units are.

      Single units were considered well-isolated based on quantitative quality metrics computed after MountainSort 4 spike sorting (Python 3.8). Units were required to have a signal-to-noise ratio (SNR) greater than 5, inter-spike interval (ISI) violations less than 1%, an amplitude cutoff below 0.1, a presence ratio above 0.9, a firing rate greater than 0.1 Hz, and at least 50 detected spikes. In addition, units were assessed for temporal stability across the recording using autocorrelograms and presence over the recording, ensuring there were no prolonged periods of total inactivity. Units meeting these criteria were deemed well-separated and reliable for further analysis. This has been added to the Methods.
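      The inclusion criteria listed above can be written as a simple filter. This is a minimal sketch: the thresholds are those given in the response, but the class, field and function names are hypothetical and do not correspond to the API of any spike-sorting package, and the temporal stability check is not covered.

```python
from dataclasses import dataclass


@dataclass
class UnitMetrics:
    """Quality metrics for one sorted unit (hypothetical container)."""
    snr: float
    isi_violation_fraction: float  # fraction of spikes, so 0.01 means 1%
    amplitude_cutoff: float
    presence_ratio: float
    firing_rate_hz: float
    n_spikes: int


def is_well_isolated(unit: UnitMetrics) -> bool:
    """Apply the thresholds listed in the author response."""
    return (
        unit.snr > 5
        and unit.isi_violation_fraction < 0.01
        and unit.amplitude_cutoff < 0.1
        and unit.presence_ratio > 0.9
        and unit.firing_rate_hz > 0.1
        and unit.n_spikes >= 50
    )


# Hypothetical units: the first passes every criterion, the second fails on SNR.
units = [
    UnitMetrics(8.2, 0.002, 0.03, 0.98, 12.5, 4300),
    UnitMetrics(3.9, 0.002, 0.03, 0.98, 12.5, 4300),
]
kept = [u for u in units if is_well_isolated(u)]
```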

      Cell numbers are provided with the statistics in the supplementary table for fig panel 1g. Panels are from the same unit before and after CNO. Examples of auto- and cross-correlograms are provided in the new Supplementary Figure 6.

      (3) Panel 2g - "firing rate modulation" is unclear. I think the authors are showing the mean firing rate with DREADD+CNO treatment divided by the mean firing rate in the pre-CNO condition for the same group (I couldn't find that in the Methods, my apologies if I missed it)? However, firing rate modulation to me means variability in firing rate within a recording. Perhaps "relative firing rate" or "% pre-CNO firing rate" would be clearer?

      The definition has been added to the Methods and the axis label has been changed to ‘Change in FR induced by SAL/CNO’.

      (4) Figure 3f - why does consolidation appear to be impaired after the transition from CNO to saline between sessions, when in panel 1j suppressing the CN does not have a similar effect once CNO is switched to saline? Could this be driven by a small number of mice? Since a central conclusion of the paper is that CN-VAL connections are uniquely important for posttraining consolidation, this discrepancy is important to explain - if the results post-saline are spurious, how do we know that the results post-CNO aren't also spurious? Panels similar to Figure 4b and d showing all the data from the last/first trial of each session I think would be convincing.

      Our results overall indicate that the overnight consolidation of the improvement in performance seems effective only in the early phase (as pointed out in the summary figure 5). We therefore do not believe that the saline results are spurious.

      This can indeed be seen in the control groups of figure 1; to make it more visible, we plot in Author response image 1 the difference between trial 7 and trial 1 of the next day. An overnight drop in performance becomes visible in the late phase.

      Author response image 1.

      The decrement on the first trial in the first 3 days is visible for the majority of the mice. The plot requested by the reviewer is shown in Author response image 2.

      Author response image 2.

      Minor points:

      (5) In panel 1a, the solid yellow line obscures a lot of the image and I don't think adds anything.

      We assume this was referring to a line on fig1d, which has been removed.

      (6) Panel 2a - color selection could present problems for those with red-green color blindness.

      This has been fixed.

      (7) Supplementary Figure 3 - what are the arrows and arrowheads indicating?

      These have been removed.

      (8) In the Discussion: "Studies of cerebellar synaptic plasticity provide clearly support the involvement of cerebellum in rotarod learning..." Delete the word "provide"

      This has been fixed

      (9) "This indicates that either the distinct functional roles of VAL-projecting or CL-projecting." The second "of" should be "or", I think.

      This has been fixed.

    1. eLife Assessment

      In this valuable study, the authors report on an innovative chemostat propagation system to reduce eukaryotic viruses while retaining phages in mixtures used for FVTs (fecal virome transplant). The authors hypothesized that chemostat-propagated viromes could modulate the gut microbiota and reduce necrotizing enterocolitis (NEC) lesions while avoiding potential side effects, such as earlier onset of diarrhea. Although no effect on NEC could be demonstrated, the revised document addressed the other concerns and is much improved from its original version. The study is convincing in that it integrates in vitro fermentation, high-resolution metagenomics, immunogenicity assays, and in vivo validation, demonstrating the potential of FVT using eukaryotic-free virome-based therapeutics.

    2. Reviewer #1 (Public review):

      Summary:

      Fecal virome transfer (FVT) has the potential to take advantage of microbiome associated phages to treat diseases such as NEC. However, FVT is also associated with toxicity due to the presence of eukaryotic viruses in the mixture, which are difficult to filter out. The authors use a chemostat propagation system to reduce the presence of eukaryotic viruses (these become lost over time during culture). They show in pig models of NEC that chemostat propagation reduces the incidence of diarrhea induced by FVTs.

      Strengths:

      The authors report an innovative yet simple approach that has the potential to be useful for future applications. Most of the experiments are easy to follow and are performed well.

      Weaknesses:

      The biggest weakness is that the authors show that their technique addresses safety, but they are unable to demonstrate that they retain efficacy in their NEC model. This could be due to technical issues or perhaps the efficacy of FVT reported in the literature is not robust.

      During the revision, the authors have acknowledged these limitations and added clarifications where necessary.

    3. Reviewer #2 (Public review):

      The authors hypothesized that chemostat propagated viromes could modulate the GM and reduce NEC lesions while avoiding potential side effects, such as the earlier onset of diarrhea. This is interesting.

      Major revision

      (1) As the authors said, the aim of the research is 'We hypothesized that chemostat propagated viromes could modulate the GM and reduce NEC lesions while avoiding potential side effects, such as earlier onset of diarrhea'.

      (a) For the efficacy, in Fig 5, there is no significant difference in stomach pathology and enterocolitis between groups, even between the control group and the experimental groups. Is this because of the low incidence of NEC? This may affect the statistical power of the conclusions. And how can you draw the conclusion that chemostat can reduce NEC lesions?

      (b) Lack of gross view pictures of animal tissues or any other pathological pictures is not convincing.

      (c) For the safety, such as body weight development, FVT showed no statistically significant difference from control, CVT and CVT-MO, so how can you draw the conclusion that chemostat can avoid potential side effects?

      (d) The evidence for the decrease of eukaryotic viruses is insufficient and not quantitative.

      (2) Fig 3F,

      (a) How can a medium have 'the baseline viral content' ?

      (b) Statistical significance of relative abundance of specific eukaryotic viral contigs between different times is unknown.

      (c) Some of listed eukaryotic viruses, their hosts are not pigs, piglets or even human, so what's the meaning if these eukaryotic viruses decreased?

      (3) In this study, pH 6.5 was selected as the pH value for chemostat cultivation, but considering the different adaptability of different bacteria to pH, it is recommended to further explore the effect of pH on bacteria and virus groups. In particular, it was optimized to maintain the growth of beneficial bacteria such as Lactobacillaceae and Bacteroides in order to improve the effect of chemostat cultivation.

      (4) In some charts, the annotation of error lines, statistical significance markers (even 'ns' should be marked), etc., should be more standardized and clearer. And in your results section, the combination of pictures is messy, thus maybe you should do some recombination.

      Comments on revisions:

      (1) At the design level, the study posited "reduction of necrotizing enterocolitis (NEC)" as the primary hypothesis and endpoint. Yet neither of the two in-vivo experiments demonstrated any NEC-protective signal; Experiment 2 even showed a trend toward more severe gastric lesions. Although delayed onset of diarrhea can be listed as a secondary endpoint, its clinical significance is limited. The work remains a safety proof-of-concept and falls short of efficacy validation, yielding insufficient scientific value for publication.

      (2) The manuscript postulates a link between the loss of Lactobacillaceae phages and the absence of NEC protection, but no reverse verification (e.g., re-introducing these phages or optimizing culture to retain them) was performed within the study.

      (3) Culturing intestinal microbiota ex vivo is inherently challenging, owing to oxygen sensitivity, pH drift, nutrient depletion, and other factors. This study not only failed to demonstrate stable congruence between the cultured community and the original fecal inoculum, but also documented a marked loss of Lactobacillaceae and a 75 % drop in viral diversity. In the absence of any NEC-protective efficacy, the authors likewise provide no functional validation of phage viability (lysis assays, MOI determination, etc.). Consequently, the data are inadequate to support expectations of therapeutic benefit in vivo.

    4. Reviewer #3 (Public review):

      This study investigated the in vitro amplification of donor fecal virus using chemostat culturing technology, aiming to reduce eukaryotic virus load while preserving bacteriophage community diversity, thereby optimizing the safety and efficacy of FVT. The research employed a preterm pig model to evaluate the effects of chemostat-propagated viromes (CVT) in preventing necrotizing enterocolitis (NEC) and mitigating adverse effects such as diarrhea.

      Strengths:

      (1) Enhanced Safety Profile: Chemostat cultivation effectively reduced eukaryotic virus load, thereby minimizing the potential infection risks associated with virome transplantation and offering a safer virome preparation method for clinical applications.

      (2) Process Reproducibility: The chemostat system achieved stable amplification of bacteriophage communities (Bray-Curtis similarity >70%), mitigating the impact of donor fecal variability on therapeutic efficacy.

      Comments on revision:

      The authors have satisfactorily addressed all comments and concerns raised during the review process. The revised manuscript is clear, complete, and meets the standards of the journal.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fecal virome transfer (FVT) has the potential to take advantage of microbiome associated phages to treat diseases such as NEC. However, FVT is also associated with toxicity due to the presence of eukaryotic viruses in the mixture, which are difficult to filter out. The authors use a chemostat propagation system to reduce the presence of eukaryotic viruses (these become lost over time during culture). They show in pig models of NEC that chemostat propagation reduces the incidence of diarrhea induced by FVTs.

      Strengths:

      The authors report an innovative yet simple approach that has the potential to be useful for future applications. Most of the experiments are easy to follow and performed well.

      Weaknesses:

      The biggest weakness is that the authors show that their technique addresses safety, but they are unable to demonstrate that they retain efficacy in their NEC model. This could be due to technical issues or perhaps the efficacy of FVT reported in the literature is not robust. If they cannot demonstrate efficacy of the chemostat propagated virome mixture, the value of the study is compromised.

      We appreciate the reviewer’s assessment and fully acknowledge that our inability to demonstrate NEC protection by FVT is a limitation of the study. If “technical issues” include the variability in disease phenotype in our animal model, which is of a spontaneous nature, then we fully agree. Issues with FVT preparation are, however, unlikely, as this is performed per protocol. The effect of FVT on NEC has hitherto only been demonstrated by our research group in two individual studies using separate donor fecal material, so it is indeed too early to speculate about the robustness of the FVT response. We have briefly mentioned this in the results (lines 563-565) and discussion (lines 777-779), but agree that it needs further elaboration. We have now revised the discussion and conclusion to better emphasize the extent and consequences of this limitation (lines 793-797 + lines 817-818). Importantly, we show that inclusion of specific nutrients, such as milk oligosaccharides, impacts the resulting propagated fecal-derived virome. One can argue that this is not surprising, but it has nevertheless not been shown before, and it opens up possibilities for future “tailor-made” fecal-derived viromes with predictable profiles and effects.

      Even though we do not demonstrate an effect of the chemostat-propagated virome, we still believe that the study provides valuable insights as a proof-of-concept. Specifically, we demonstrate that in vitro chemostat propagation can significantly modulate the safety profile of FVT, while still driving changes in the microbiome, e.g., by decreasing C. perfringens.

      The above issue is especially concerning because the chemostat propagation selected for bacteria that may not necessarily be the ones that harbor the beneficial phages. Without an understanding of exactly how FVT works, is it possible to make any conclusion about the usefulness of the chemostat approach?

      The chemostat work was based on the idea that if we culture a fecal inoculum under suitable conditions, then the phageome would propagate alongside and allow for a scalable production method for standardized donor-independent FVT. We are cognizant that the chemostat end-culture diverged quite markedly from the fecal inoculum. In reality, such divergence is unavoidable when performing in vitro simulation of intestinal growth conditions. On the positive side, we showed that we could drive an expansion of Bacteroides spp. by supplementing the media with human milk oligosaccharides. We have previously shown that Bacteroides spp. engraft in FMT recipients, which are in turn protected from NEC. However, there is much room for refinement of the chemostat culture conditions, for example to preserve the rich repertoire of lactobacilli from the inoculum by lowering the pH. Moreover, the loss of viral diversity in the chemostat end-culture also needs to be addressed, potentially by lowering the chemostat dilution rate to allow time for phage propagation. Based on these insights, we will in the near future invest heavily in improving the chemostat procedure to end up with a propagated fecal virome that better resembles the fecal inoculum.

      Finally, can the authors rule out that their observations in THP-1 cells are driven by LPS or some other bacterial product in the media?

      We thank the reviewer for raising this point. To minimize the influence of bacterial contaminants such as LPS or other small bacterial products, we implemented several steps during sample preparation. Specifically, we performed ultrafiltration using a 300 kDa molecular weight cut-off, which should remove small molecules, including LPS, bacterial metabolites, and other potential soluble immunomodulators. Thereafter, all viral preparations underwent endotoxin removal procedures prior to cell exposure. These precautions reduce the likelihood that our observed effects in THP-1 cells are attributable to bacterial products rather than viral components. This is explained in the referenced article (20), but we have now added the clarification to the Methods section of the revised manuscript (lines 222 and 227). The immune expression profile differs markedly between the viral preparations and the E. coli control, e.g. for IFNG, TLR3, and TLR8, making it highly likely that viral epitopes are the major drivers of the response to the viral preparations, with less impact from any potential bacterial epitope contaminant. This is now mentioned in the results section (lines 541-543).

      Reviewer #2 (Public review):

      Major revision

      (1) As authors state that the aim of the research is 'We hypothesized that chemostat propagated viromes could modulate the GM and reduce NEC lesions while avoiding potential side effects, such as earlier onset of diarrhea'.

      (a) For the efficacy, in Fig 5, there is no significant difference in stomach pathology and enterocolitis between groups, even between the control group and the experimental groups. Is this because of the low incidence of NEC? This may affect the statistical power of the conclusions. Therefore, it is unclear how one can draw the conclusion that chemostat propagation can reduce NEC lesions.

      Thank you for highlighting this important point. We fully agree and would like to clarify that it is not our intention to conclude that chemostat propagation reduces NEC lesions under the experimental settings within this paper. Rather, this was our initial hypothesis, which could not be confirmed. The unexpectedly low incidence of NEC across groups in Piglet Experiment 1 did not allow for a clear conclusion, but Piglet Experiment 2 failed to show a NEC-reducing effect. We have stated this important point in the following sections:

      - Abstract (line 42-44): “However, these signatures were lost in recipients of chemostat-propagated viromes, and only minor microbiome effects and no NEC prevention were observed.”

      - Results (line 699): “This highlights that while chemostat propagation effectively mitigates virus-associated diarrhea, the method needs further optimization to target NEC.”

      - Discussion (lines 773–775): “However, the MO-propagated chemostat virome did not increase Bacteroides or Parabacteroides spp. in the recipient’s gut, nor did it provide NEC protection.”

      - We have rephrased this to emphasize the importance of Experiment 2.

      - To avoid any potential misinterpretation, we have rephrased line 598 to reflect that we observed “a difference in the clinical side effect pattern” rather than implying efficacy.

      - Furthermore, we have updated the summary title for Figure 8 (line 704) to clearly state: “MO-propagated virome modestly exacerbates gastric injury and fails to improve NEC.”

      - Also, we have added the following section to the discussion (lines 793-797): “However, we acknowledge that the absence of demonstrated NEC prevention by the native donor virome is a significant limitation to conclusions regarding efficacy. Without a protective baseline, we cannot assess whether the virome efficacy was lost during chemostat propagation. Consequently, we cannot confirm or dismiss the hypothesis that chemostats can preserve a phage community capable of preventing NEC.”

      - Lastly, we have updated the conclusion (lines 817-818): “However, as neither the chemostat-propagated viromes nor the native donor virome demonstrated NEC prevention, the efficacy of the chemostat approach remains inconclusive.”

      - These changes should clarify that while the study demonstrates improved safety via reduced diarrhea, NEC efficacy was not obtained.

      (b) More convincing pathology images would be helpful.

      Since we did not observe a protective effect against NEC with either of the treatments, we opted not to include pathology images. However, extensive examples can be found in the cited paper (reference 37), which describes our NEC scoring methodology in the Methods section (lines 268-271): https://doi.org/10.1016/j.yexmp.2024.104936.

      (c) For the safety, such as body weight development, FVT showed no statistically significant difference from control, CVT, and CVT-MO, so how can you draw the conclusion that chemostat propagation can avoid potential side effects?

      We appreciate the reviewer’s observation. To clarify, we do not claim that chemostat propagation completely avoids all potential side effects, but rather that it mitigates them. As shown in Fig. 5G, FVT recipients exhibited significantly reduced body weight gain compared to controls, CVT, and CVT-MO specifically on day 4, but not on day 5. This transient effect suggests that side effects such as reduced growth and early-onset diarrhea are delayed, not entirely prevented, by chemostat propagation. This is stated in the results section in lines 593-595. We also believe that this is consistent with the paper title and the conclusion that the chemostat process minimizes the adverse effects associated with native FVT (line 813).

      (d) There is lack of evidence to convince the reader that there is a decrease of eukaryotic viruses. More quantitative data here would be useful.

      Apart from the fact that it is impossible for eukaryotic viruses to shed in a system devoid of eukaryotic cells, and that the chemostat continuously exchanges the culture, thereby diluting any substance incapable of propagation, we agree that quantitative data to demonstrate a reduction of eukaryotic virus load are lacking.

      However, in this case we believe the relative viral abundance data are almost as convincing. To make this even clearer, we have produced new graphs showing 1) the eukaryotic viral abundance relative to total viral abundance and 2) observed eukaryotic viral species, both after medium subtraction. Eukaryotic viral relative abundances decrease from around 0.4% to approach zero already in the batch phase, and similarly the number of eukaryotic viral species decreases from around 10 in the fecal inoculum to zero midway through the chemostat phase. These new graphs are now part of Supplementary figure S3 B-C. Moreover, we have corrected an error in the eukaryotic viral heatmaps presented in Figure 3F, so that the relative abundance of each sample (column) now sums to 100%. Please also notice from the lower heatmap (where the virome signature of the medium is subtracted) that no eukaryotic viruses are identified from the sequencing data of the chemostat samples from 50 hours onwards.

      However, for future experiments we will consider adding a known quantity of a marker virus to the inoculum and monitoring its concentration (e.g., by qPCR) throughout the culture process. Importantly, if the resulting virome is meant for in vivo testing, this marker virus should be inert to the receiving organism.
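      The dilution argument and the proposed marker-virus experiment can be illustrated with a minimal washout sketch. It assumes an ideal, well-mixed, constant-volume chemostat in which an inert particle neither replicates, decays, nor adsorbs, so that its concentration follows C(t) = C0 · exp(−D·t), with D the dilution rate. The function names and the dilution rate used below are hypothetical and are not values from the study.

```python
import math


def inert_fraction_remaining(dilution_rate_per_h, hours):
    """Fraction of a non-replicating marker left after `hours` of continuous dilution."""
    if dilution_rate_per_h <= 0 or hours < 0:
        raise ValueError("dilution rate must be positive and time non-negative")
    return math.exp(-dilution_rate_per_h * hours)


def hours_to_reach_fraction(dilution_rate_per_h, fraction):
    """Time needed for an inert marker to fall to `fraction` of its starting level."""
    if dilution_rate_per_h <= 0 or not 0 < fraction <= 1:
        raise ValueError("need a positive dilution rate and 0 < fraction <= 1")
    return -math.log(fraction) / dilution_rate_per_h


# Hypothetical dilution rate (per hour), chosen only for illustration.
assumed_dilution_rate = 0.1
for t in (0, 10, 25, 50):
    remaining = inert_fraction_remaining(assumed_dilution_rate, t)
    print(f"t = {t:>2} h: {remaining:.4f} of the initial marker remains")
```

      Comparing such a predicted curve with qPCR measurements of a spiked marker virus would indicate whether the observed depletion matches pure dilution or whether additional loss (for example, decay or adsorption) contributes.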

      (2) Questions regarding Fig 3F,

      (a) How can the medium have 'the baseline viral content' ?

      As we have previously seen persistent eukaryotic viral signals in metagenomics sequencing data from chemostat experiments, we sampled and sequenced the culture medium. As is seen from Figure 3F, this only concerns Dicistroviridae, as the patterns of the remaining eukaryotic viral signals before and after medium subtraction are virtually identical. For some reason, a component of the culture medium contains a genetic signal from this entity. Since all culture components are sterilized, it is most likely genomic traces that are then continuously supplied with the medium and appear in all culture samples. As the signal is unlikely to derive from intact viruses, the in vivo implications are deemed minimal.

      (b) What is the statistical significance of relative abundance of specific eukaryotic viruses?

      The same as for any statistical comparison at the single-OTU level in a nucleotide sequencing dataset. As commented above, it does not prove a quantitative depletion of eukaryotic viruses throughout the chemostat process, but given the context, a reduction in relative abundance supports the notion that eukaryotic viruses are indeed depleted when the culture medium is exchanged. The relevant question to us is the magnitude of depletion, which is particularly relevant since the clinical data indicate a delay and not a prevention of side effects after transplantation. Hence, as proposed above, the use of a marker virus would provide us with that answer.

      (c) The hosts for some of the listed eukaryotic viruses are neither pigs nor humans; as such, the significance of a decrease in these viruses to humans is unclear.

      Dicistroviridae is not present in the inoculum and shows up only when medium is added. Picobirnavirus and Astrovirus are relevant mammalian intestinal viruses, whereas Smacoviridae is less well described (dois: 10.3389/fvets.2020.615293 and 10.3390/v8020042). Genomoviridae, as a fungal virus, indeed appears to be less relevant in the case of the mammalian intestine. Indeed, at any given time point, any given individual, be it a pig or a human, carries several viral species that are incapable of infecting it, most likely in transit after being ingested with food or, in the case of pigs, through rummaging. It is no secret that we have been searching for a causative agent responsible for the clinical side effect patterns associated with FVT, but there seems to be no consistent viral agent that is overabundant in diarrheal piglets. Hence, in this study, we are mostly interested in the proof-of-concept for overall eukaryotic virus reduction through chemostat propagation, and we believe we have presented data in support of this.

      (3) In this study, pH 6.5 was selected as the pH value for chemostat cultivation, but considering the different adaptability of different bacteria to pH, it is recommended to further explore the effect of pH on bacterial and viral groups. In particular, the pH should be optimized to maintain the growth of beneficial bacteria such as Lactobacillaceae and Bacteroides in order to improve the effect of chemostat cultivation.

      We agree that pH is a key parameter in shaping microbial communities during chemostat cultivation. As noted, we selected pH 6.5 to balance physiological relevance and bacterial viability, but we acknowledge that this pH may not be optimal for supporting the growth of certain potentially beneficial taxa such as Lactobacillaceae. We explicitly address this in the discussion (lines 736–741), where we state that the selected pH may have limited engraftment and that future studies should investigate pH optimization to better support bacterial groups and improve the overall effectiveness of the cultivation system.

      (4) Please improve the quality of the images, charts, error bars and statistical significance markers throughout, and mark the n's used in each experiment.

      We have carefully reviewed all figures and could not identify any general image quality issues. If some specific images or panels appear unclear or problematic, we would appreciate it if the reviewer could point them out so we can address them directly.

      Regarding sample sizes, the number of animals (n) is indicated in Fig. 5A and its legend, as well as in Fig. 8A. We have now also added this information to the legend of Fig. 8 for clarity.

      To improve the clarity of statistical findings, we have added asterisks to denote significance in panels 6A, 6F, and 7A, as requested.

      To improve the clarity of Fig. 3B, we have added a dashed line to separate LAC and LAC-MO.

      Reviewer #3 (Public review):

      Major revisions

      This study investigated the in vitro amplification of donor fecal virus using chemostat culturing technology, aiming to reduce eukaryotic virus load while preserving bacteriophage community diversity, thereby optimizing the safety and efficacy of FVT. The research employed a preterm pig model to evaluate the effects of chemostat-propagated viromes (CVT) in preventing necrotizing enterocolitis (NEC) and mitigating adverse effects such as diarrhea.

      Strengths:

      Enhanced Safety Profile: Chemostat cultivation effectively reduced eukaryotic virus load, thereby minimizing the potential infection risks associated with virome transplantation and offering a safer virome preparation method for clinical applications.

      Process Reproducibility: The chemostat system achieved stable amplification of bacteriophage communities (Bray-Curtis similarity >70%), mitigating the impact of donor fecal variability on therapeutic efficacy.

      Weaknesses:

      Loss of Phage Functionality: The chemostat cultivation resulted in a reduction in phage diversity (e.g., the loss of Lactobacillaceae phages), which may compromise their protective effects against NEC (potentially linked to the immunomodulatory functions of Lactobacilli). The authors should explicitly address this limitation in the discussion section, particularly if additional experiments cannot be conducted to resolve it within the current study.

      We appreciate the reviewer’s concern and agree that the loss of phage diversity during chemostat cultivation, especially phages targeting Lactobacillaceae, is an important limitation with potential implications for NEC protection.

      We already described the depletion of Lactobacillaceae in the chemostat and its implications in the discussion (lines 742-751 + 787-793), along with our plans to address this in future work by adjusting culture pH. However, we acknowledge that the significance of losing phage diversity deserves more explicit attention. Accordingly, we have expanded the discussion to highlight the possible consequences of this loss and its impact on phage functionality (see lines 758–762), as suggested by the reviewer.

      Limitations in Experimental Design: The low incidence of NEC lesions in the control group reduced the statistical power of the study. This limitation undermines the ability to conclusively evaluate the efficacy and safety of the chemostat-propagated virome as a novel intervention for NEC. Future studies should optimize experimental conditions (e.g., using a more NEC-susceptible model or diet) to ensure adequate disease incidence for robust statistical comparisons.

      We agree that the low NEC incidence in Experiment 1 limited the statistical power to evaluate efficacy. To address this, we designed Experiment 2 using a more NEC-inducing diet (formula 2), which resulted in a higher level of baseline lesions. This allowed for a more conclusive assessment, demonstrating that the MO-propagated chemostat virome did not provide NEC protection when using the donor feces and culture conditions applied in this experiment.
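      The link between baseline incidence and statistical power can be illustrated with a minimal sketch. It uses the normal approximation for a two-sided two-proportion z-test with equal group sizes. The incidences, the group size, and the function names are hypothetical, chosen only to contrast a low-incidence setting with a higher-incidence one, and are not taken from the study.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_power(p_control, p_treated, n_per_group, z_alpha=1.959964):
    """Approximate power of a two-sided two-proportion z-test (alpha = 0.05 by default)."""
    p_bar = (p_control + p_treated) / 2.0
    # Standard error under the null hypothesis (pooled proportion).
    se_null = math.sqrt(2.0 * p_bar * (1.0 - p_bar) / n_per_group)
    # Standard error under the alternative hypothesis (separate proportions).
    se_alt = math.sqrt(
        (p_control * (1.0 - p_control) + p_treated * (1.0 - p_treated)) / n_per_group
    )
    effect = abs(p_control - p_treated)
    # The contribution of the opposite tail is negligible and is ignored here.
    return normal_cdf((effect - z_alpha * se_null) / se_alt)


# Hypothetical scenario: the treatment halves incidence, 15 animals per group.
power_low = two_proportion_power(0.20, 0.10, n_per_group=15)
power_high = two_proportion_power(0.60, 0.30, n_per_group=15)
print(f"low baseline incidence:  power = {power_low:.2f}")
print(f"high baseline incidence: power = {power_high:.2f}")
```

      For the same relative reduction and group size, the approximation gives higher power at the higher baseline incidence, which is the rationale for using a more NEC-inducing diet.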

      We acknowledge that this was too unclear in the original manuscript. Please see the response to the first comment by Reviewer 2, where we have highlighted several revisions to improve clarity.

      However, we do believe the data are robust enough to conclude that the level of diarrhea — and thereby safety — was improved in the piglet model, which is why we chose to focus on this aspect in the paper’s title.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The manuscript presents a well-structured study investigating the feasibility of using chemostat-based culturing of the fecal virome to reduce the transfer of eukaryotic viruses during fecal virome transfer (FVT). Utilizing both in vitro fermentation systems and a preterm piglet model, the authors explore whether this method could be a safer and equally effective alternative to raw FVT for treating neonatal intestinal diseases, such as necrotizing enterocolitis (NEC). This study introduces a novel mitigation strategy for FVT through chemostat fermentation. However, a significant revision is recommended before the manuscript can be considered for publication.

      Major Changes:

      - A central aim of the study was to assess whether chemostat-cultured viromes maintain protective effects against NEC. However, this key outcome remains "unresolved" due to the low incidence of NEC in the control group. The discussion should address this limitation.

      We fully acknowledge this limitation and agree that our study cannot conclude whether the NEC effect of FVT was maintained without demonstrating an effect of this native virome. Please see our response to a similar concern raised by Reviewer 1, where we describe the revisions made to the discussion (lines 793-797) and conclusion (lines 817-818).

      - The section on viral particle enrichment should be expanded and discussed in more detail. It would be beneficial to examine its efficiency in separating bacteria from viral-like particles (VLPs) compared to findings from previously reported studies. The authors should clarify the rationale behind the selected dose of VLPs used in the experiments and their role in virus engraftment results.

      We selected the virome isolation method based on previous experiments within our lab, demonstrating efficient separation of bacteria and virus particles using a 0.45 µm syringe filter. Filtrates were quality assessed by fluorescence microscopy, showing absence of intact bacteria. Using a diverse mock virus community, we also showed a high degree of preservation of infective viruses in the FVT following the isolation procedures. We have now expanded the description of the separation method in the results section with a reference to this work (lines 188-190). We did, however, choose to increase the molecular weight cut-off (MWCO) to enhance the exclusion of non-viral components.

      We acknowledge that the rationale and importance of the VLP dose was lacking in the discussion. This has now been added (line 758-762).

      - The viral richness of chemostat viromes was significantly lower than that of native feces. The authors should discuss how this may impact microbiome and virome outcomes.

      We have included this point in the new section about VLP dose in the discussion. Please see lines 758-762.

      - The immune response was assessed through THP-1 cells and a limited piglet cytokine panel. These may not fully represent the intestinal epithelial or mucosal immune responses. Thus, authors should acknowledge these limitations in the discussion section.

      Thank you for the comment. The limitation of using THP-1 cells as an in vitro model is already acknowledged in the results section (line 545): “Since fecal-derived eukaryotic viruses mainly infect intestinal cells, an in vivo stimulation may reveal a different response pattern.”

      The limited panel of porcine cytokines was not intended as a comprehensive assessment of the mucosal immune response, but rather as supportive data for NEC-associated inflammation, as we have previously demonstrated (reference 37: https://doi.org/10.1016/j.yexmp.2024.104936). To obtain a comprehensive view of the immune response, a few days after diarrhoea onset, we additionally performed RNA-Seq analyses of the intestinal lymph node.

      - While the manuscript is comprehensive, it is also lengthy and text-heavy. Some sections could be condensed for clarity.

      The manuscript has been through multiple revisions by authors. While it is indeed lengthy, we have removed non-essential information and redundancies and now feel that the balance between data, text, figures, and supplementary information is acceptable.

      - Several figures (e.g., Figs. 1-5) contain significant data but need clearer summaries in their captions.

      We appreciate the suggestion and have revised the captions for Figs. 1-8 to provide clearer, more informative summaries of the data they present.

    1. eLife Assessment

      This important study combines microfluidic experiments with mathematical modeling to elucidate the reciprocal interplay between flow dynamics and biofilm growth and detachment. Using Pseudomonas aeruginosa as a model organism, the authors identify several key regimes and stages of biofilm development. Overall, the comparison between experimental observations of biofilm behavior under varying flow conditions and corresponding theoretical predictions forms a compelling understanding of the processes involved in biofilm dynamics. The results will be of interest to researchers studying biofilms and their technological and biological applications.

    2. Reviewer #1 (Public review):

      Summary:

      The paper investigates the interplay between fluid flow and biofilm development using Pseudomonas aeruginosa PAO1 in microfluidic channels. By combining experimental observations with mathematical modeling, the study identifies the significant impact of nutrient limitation and hydrodynamic forces on biofilm growth and detachment. The authors demonstrate that nutrient limitation drives the longitudinal distribution of biomass, while flow-induced detachment influences the maximum clogging and temporal dynamics. The study highlights that pressure buildup plays a critical role in biofilm detachment, leading to cyclic episodes of sloughing and regrowth. A stochastic model is used to describe the detachment process, capturing the apparent randomness of sloughing events. The findings offer insights into biofilm behavior during clogging and fouling, potentially relevant to infections, environmental processes, and engineering applications.

      Strengths:

      This paper demonstrates a strong integration of experimental work and mathematical modeling, providing a comprehensive understanding of biofilm dynamics in a straight microfluidic channel. The simplicity of the microchannel geometry allows for accurate modeling, and the findings have the potential to be applied to more complex geometries. The detailed analysis of nutrient limitation and its impact on biofilm growth offers valuable insights into the conditions that drive biofilm formation. The model effectively describes biofilm development across different stages, capturing both initial growth and cyclic detachment processes. While cyclic pressure buildup has been studied previously, the incorporation of a stochastic model to describe detachment events is a novel and significant contribution, capturing the complexity and randomness of biofilm behavior. Finally, the investigation of pressure buildup and its role in cyclic detachment and regrowth enhances our understanding of the mechanical forces at play, making the findings applicable to a wide range of technological and clinical contexts.

      Weaknesses:

      The study achieves its primary objective of combining experiments and modeling to elucidate the coupling between flow, biofilm growth, and detachment in a confined microfluidic channel. In the revised manuscript, the authors have clarified several methodological choices and underlying assumptions. The points below are best viewed not as weaknesses, but as aspects that define the scope of the approach.

      • Biofilm porosity and permeability. The authors now discuss biofilm porosity and provide a clear rationale for neglecting permeability effects in their system, arguing that flow around dense biofilm structures dominates over flow through the matrix. While this assumption appears reasonable for the conditions explored, permeability effects are not explicitly modeled and could become relevant in less compact or more heterogeneous biofilms.

      • Characterization of the EPS matrix. The role of the extracellular matrix is convincingly addressed using polysaccharide‑deficient mutants, which provides a strong and causal link between EPS composition and mechanical stability. At the same time, the absence of complementary biochemical or imaging‑based characterization means that spatial or temporal variations in EPS distribution are not directly resolved, limiting the level of structural detail.

      • Three‑dimensional interpretation of biofilm development. The authors clarify that three‑dimensional information is primarily obtained from pressure‑based measurements, with two‑dimensional imaging serving as a validation tool. This approach is coherent and supported by scaling arguments and reproducibility across experiments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewer for the thorough and constructive evaluation of our manuscript. We greatly appreciate the recognition of our work's strengths, particularly the integration of experiments and mathematical modeling, the stochastic framework for describing sloughing events, and the insights into pressure-driven detachment dynamics.

      We have carefully considered each point raised and provide detailed responses below. In response to the reviewer's comments, we have revised the Methods section to better clarify our approach to three-dimensional assessment. We believe these revisions have improved the clarity of the manuscript.

      Below, we address each of the specific concerns raised by the reviewer:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The study achieves its primary goal of integrating experiments and modeling to understand the coupling between flow and biofilm growth and detachment in a microfluidic channel, but it should have highlighted the weaknesses of the methods. I list the ones that, in my opinion, are the main ones:

      The study does not consider biofilm porosity, which could significantly affect the flow and forces exerted on the biofilm. Porosity could impact the boundary conditions, such as the no-slip condition, which should be validated experimentally.

      Porosity is indeed a key component of biofilm structures, resulting from the polymeric nature of the EPS matrix, mechanical forces, and biological processes such as cell death or predation. When considering flow-biofilm interactions, this porosity may allow fluid flow through the biofilm, with reported permeability values spanning an extremely broad range from 10⁻¹⁵ to 10⁻⁷ m² (Kurz et al., 2023).

      However, we argue that biofilm permeability is not the primary driver in our system:

      (1) In microscopy visualization, our biofilms form dense structures where flow around the biofilm through narrow channels dominates over flow through the porous biofilm matrix.

      (2) We performed microrheology experiments in these biofilms by imaging the Brownian motion of nanoparticles in the biofilm. Their trajectories indicate that, in our conditions, the viscoelastic flow of the biofilm itself largely dominates over the flow of culture medium through the biofilm matrix.

      (3) We argue that the extreme variability in reported permeability values (spanning several orders of magnitude, Kurz et al., 2023) reflects not only differences in experimental systems, but also fundamental challenges in defining and measuring permeability for viscoelastoplastic biofilms (the biofilm itself is actually flowing). Given this uncertainty, incorporating permeability into our model would introduce parameters that cannot be reliably constrained from literature or independently measured in our setup. Our approach (i.e. treating the biofilm as impermeable and focusing on flow obstruction) avoids this parametrization complexity while successfully capturing the observed dynamics.

      (4) Our model successfully predicts the observed scaling laws (φ_max ∝ Q^(1/2), Fig. 7f) and hydraulic resistance dynamics (Fig. 3) without invoking permeability, suggesting that flow obstruction rather than flow penetration is the dominant mechanism.

      Reference: Kurz, D. L.; Secchi, E.; Stocker, R.; Jimenez-Martinez, J. Morphogenesis of biofilms in porous media and control on hydrodynamics. Environ. Sci. Technol. 2023, 57 (14), 5666−5677.

      The research suggests EPS development as a stage in biofilm growth but does not probe it using lectin staining. This makes it impossible to accurately assess the role of EPS in biofilm development and detachment processes.

      We respectfully disagree that lectin staining is necessary to assess the role of EPS in our system, and we argue that our approach using genetic mutants is superior for the following reasons. Lectin staining has significant limitations. While widely used, lectin staining (e.g., concanavalin A) is non-specific (binding not only to EPS polysaccharides but also to bacterial cell surfaces) and is non-quantitative. It can confirm the presence of polysaccharides but cannot establish causal relationships between specific EPS components and mechanical properties or detachment dynamics. We performed preliminary experiments with ConA-rhodamine (data not shown), which showed widespread presence of polysaccharides. However, this provided limited insight beyond confirming EPS production, which is well-established for P. aeruginosa PAO1 biofilms. We employed a more rigorous genetic approach to directly assess the role of EPS composition. We used Δpel and Δpsl mutants (strains lacking key exopolysaccharides that are the primary structural components of the PAO1 matrix). Our results demonstrate that both mutants show significantly reduced maximum clogging compared to wild-type. The Δpsl mutant is particularly affected, with near-complete detachment at certain flow rates. These differences directly link EPS composition to mechanical stability and detachment dynamics. This genetic approach provides causal, quantitative evidence for the role of specific EPS components in biofilm development and detachment, information that lectin staining cannot provide. We believe this addresses the reviewer's concern more rigorously than lectin staining would.

      While the force and flow are three-dimensional, the images are taken in two dimensions. The paper does not clearly explain how the 2D images are extrapolated to make 3D assessments, which could lead to inaccuracies.

      We thank the reviewer for this important observation. We would like to clarify our methodological approach. Our primary three-dimensional measurement is the hydraulic resistance R(t), obtained from pressure drop measurements across the biofilm-containing channel section. This pressure-based measurement inherently captures the three-dimensional flow obstruction caused by the biofilm. We then employ a geometric model (uniform biofilm layer on all channel walls) to convert R(t) into volume fraction φ(t).

      The two-dimensional fluorescence imaging serves to validate this model-based approach rather than being the basis for three-dimensional extrapolation. The uniform layer assumption is supported by three independent lines of evidence: (i) the excellent quantitative agreement between predicted and measured scaling laws (φmax ∝ Q^(1/2), Fig. 7f), obtained without adjustable parameters; (ii) the high reproducibility of φmax values across different flow rates and replicates; and (iii) the strong correlation between model-derived φ(t) from pressure measurements and integrated fluorescence intensity (Fig. 3b-d).

      We have added clarifying text in the Methods section (subsection "Data analysis for the calculation of the hydraulic resistance and volume fraction") to better explain this approach and emphasize that pressure measurements provide the three-dimensional information, with the geometric model serving as the link to volume fraction.

      Although the findings are tested using polysaccharide-deficient mutants, the results could have been analyzed in greater detail. A more thorough analysis would help to better understand the role of matrix composition on the stochastic model of detachment.

      We thank the reviewer for this suggestion. Our mutant analysis demonstrates that Δpsl and Δpel strains have significantly reduced φmax and altered detachment dynamics compared to wild-type (Fig. 8), directly linking EPS composition to mechanical stability as predicted by our model. A rigorous quantitative connection between matrix composition and the stochastic parameters (interevent times, jump amplitudes) would require: (i) substantially more sloughing events for statistical power, (ii) independent mechanical characterization of each mutant, and (iii) a mechanistic model linking EPS composition to detachment parameters. We are currently developing microrheology approaches to characterize mutant mechanical properties, which could enable such refinement in future work.

      However, this represents a substantial study beyond the scope of the current manuscript, which establishes the self-sustained sloughing-regrowth cycle and its stochastic nature. The mutant results serve their intended purpose: demonstrating that EPS composition affects detachment, consistent with our model's framework.

      Reviewer #2 (Public review):

      This manuscript develops well-controlled microfluidic experiments and mathematical modelling to resolve how the temporal development of P. aeruginosa biofilms is shaped by ambient flow. The experiment considers a simple rectangular channel on which a constant flow rate is applied and UV LEDs are used to confine the biofilm to a relatively small length of the device. While there is often considerable geometrical complexity in confined environments and feedback between biofilm/flow (e.g. in porous media), these simplified conditions are much more amenable to analysis. A non-dimensional mathematical model that considers nutrient transport, biofilm growth and detachment is developed and used to interpret experimental data. Regimes with both gradual detachment and catastrophic sloughing are considered. The concentration of nutrients in the media is altered to resolve the effect of nutrient limitation. In addition, the role of a couple of major polysaccharide EPS components is explored with mutants, which leads to results in line with previous studies.

      There has been a vast amount of experimental and modelling work done on biofilms, but relatively rarely are the two linked together so tightly as in this paper. Predictions on the influence of the non-dimensional Damköhler number on the longitudinal distribution of biofilm and functional dependence of flow on the maximum amount of biofilm (𝜙max) are demonstrated. The study reconfirms a number of previous works that showed the gradual detachment rate of biofilms scales with the square root of the shear stress. More challenging are the rapid biofilm detachment events where a large amount of biofilm is detached at once. These events are identified experimentally using an automated analysis pipeline and are fitted with probability distributions. The time between detachment events was fitted with a Gamma distribution and the amplitude of the detachment events was fitted with a log-normal distribution; however, it is not clear how good these fits are. Experimental data was then used as an input for a stochastic differential equation, but the output of this model is compared only qualitatively to that of the experiments. Overall, this paper does an admirable job of developing well-constrained experiments and a tightly integrated mathematical framework through which to interpret them. However, the new insights this provides into the underlying physical/biological mechanisms are relatively limited.

      We thank the reviewer for the thorough evaluation of our work and for highlighting the tight integration between experiments and modeling. We appreciate the constructive feedback regarding the goodness-of-fit for the probability distributions.

      To address the concern that "it is not clear how good these fits are," we have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes.

      Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.
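
      A Q-Q comparison of the kind described above can be sketched with the Python standard library. The sketch below uses the lognormal case, because normal quantiles are available through statistics.NormalDist; the Gamma case would additionally need a Gamma quantile function (for instance scipy.stats.gamma.ppf). The amplitudes are synthetic, and the parameters (-2.0, 0.8), the sample size, and the function name lognormal_qq are illustrative assumptions, not values or code from the study.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def lognormal_qq(samples):
    """Return (theoretical, observed) quantiles for a lognormal fit.

    The fit is done on log-transformed data: if X is lognormal,
    log(X) is normal with mean mu and standard deviation sigma.
    """
    logs = sorted(math.log(x) for x in samples)
    mu, sigma = fmean(logs), stdev(logs)
    fitted = NormalDist(mu, sigma)
    n = len(logs)
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [math.exp(fitted.inv_cdf((i + 0.5) / n)) for i in range(n)]
    observed = [math.exp(v) for v in logs]
    return theoretical, observed


# Synthetic detachment amplitudes (assumed parameters, for illustration only).
rng = random.Random(0)
amplitudes = [rng.lognormvariate(-2.0, 0.8) for _ in range(200)]

theoretical, observed = lognormal_qq(amplitudes)

# One-number summary of how closely the points follow the 1:1 line,
# measured on the log scale so the heavy right tail does not dominate.
mean_abs_log_dev = fmean(
    abs(math.log(o) - math.log(t)) for o, t in zip(observed, theoretical)
)
print(f"mean |log(observed) - log(theoretical)| = {mean_abs_log_dev:.3f}")
```

      Plotting observed against theoretical gives the Q-Q plot itself; points close to the diagonal indicate that the fitted distribution is a reasonable approximation, which is a weaker statement than passing a formal goodness-of-fit test.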

      We respectfully disagree that “new insights this provides [into] the underlying physical/biological mechanisms are relatively limited.” Beyond confirming previous findings (e.g., scaling for gradual detachment), we believe our work provides several novel mechanistic insights. First, the Pe/Da criterion enables quantitative prediction of nutrient limitation regimes, allowing systematic decoupling of nutrient effects from other phenomena in biofilm studies. Second, we demonstrate that pressure, not shear, drives sloughing detachment events, a mechanism overlooked in previous studies where the notion of “shear-induced detachment” clearly dominates. Third, we show that sloughing-regrowth cycles occur even in single channels, establishing pressure-driven fluctuations as a signature of confined biofilm growth, independent of geometric complexity. Finally, the stochastic description of sloughing demonstrates that, while instantaneous biofilm states are irreproducible, the underlying randomness is predictable, thereby addressing a fundamental challenge in biofilm research.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the abstract, I suggest clarifying the term "bacteria development." It is unclear if it refers to bacterial growth, biofilm formation, or biofilm detachment. The concept is expressed more clearly at the end of the Introduction.

      We have modified the entire abstract to make it clearer. The abstract now explicitly establishes the key processes - growth ('nutrients necessary for growth', 'growing bacteria obstruct flow paths') and detachment ('mechanical stresses that cause detachment', 'flow-induced detachment', 'sloughing') - before using 'bacterial development' as a collective term to refer to these coupled spatiotemporal dynamics. We believe the abstract is now clear as written.

      (2) Findings from Sanfilippo et al. (2019) were slightly questioned by Padron et al. (PNAS, 2023), who discovered that H2O2 transport is responsible for fro operon upregulation.

      Thanks for the clarification, which is indeed significant. The new sentence now reads: Pseudomonas aeruginosa has been found to regulate the fro operon in response to flow-modulated H2O2 concentrations (Sanfilippo et al. 2019, Padron et al. 2023).

      (3) Additionally, Kurz et al. (2022) account for pressure buildup as the mechanism controlling sloughing.

      We respectfully disagree and note that Kurz et al. (2022) identify shear stress, not pressure buildup, as the primary mechanism controlling sloughing. Besides the title, key sentences include “opening was driven by a physical process and specifically by the shear forces associated with flow through the biofilm”, “The opening of the PFPs is driven by flow-induced shear stress, which increases as a PFP becomes narrower due to microbial growth, causing biofilm compression and rupture.” While pressure differences are measured as indicators of system state and do contribute to normal compression stresses, their mechanistic explanation emphasizes that narrowing PFPs experience increased shear rates that eventually exceed the biofilm's yield stress, triggering viscoplastic deformation and detachment. The pressure buildup is a hydraulic consequence of narrowing rather than the direct cause of sloughing. In contrast, our work demonstrates that in confined geometries, pressure differences generate tangential stresses at the biofilm-solid interface that directly drive detachment.

      (4) The flow control strategy represented in Fig. 1 is not explained and should be detailed in the Methods section.

      The Methods section reads as follows. Inoculation and flow experiments: BHI suspensions were adjusted to an optical density of OD640nm = 0.2 (10⁸ CFU/mL) and inoculated inside the microchannels from the outlet, up to approximately ¾ of the channel length in order to keep a clean inlet. The system was left at room temperature (25°C) for 3h under static conditions. Flow experiments were then performed at 0.02, 0.2, 2, 20 and 200 μL/min constant flow rates for 72h in the microchannels at room temperature. For the experiments at 0.2, 2, 20 and 200 μL/min, the fluidic system was based on a sterile culture medium reservoir pressurized by a pressure controller (Fluigent FlowEZ) and connected with a flow rate controller (Fluigent Flow unit). The flow rate was maintained constant by using a controller with a feedback loop adjusting the pressure in the liquid reservoir. The reservoir was connected to the chip using Tygon tubing (Saint Gobain Life Sciences Tygon™ ND 100-80) of 0.52 mm internal diameter and 1.52 mm external diameter, along with PEEK tubing (Cytiva Akta pure) with 0.25 mm inner diameter adapters for the flow rate controller. The waste container was also pressurized by another independent pressure controller to reduce air bubble formation in the inlet part. For the experiments at 0.02 μL/min, we used a Harvard Phd2000 syringe pump for the flow.

      (5) Including images of the actual biofilms formed in a portion of the channel would aid in understanding the analysis presented in Fig. 2.

      Images are introduced later on (e.g., Figure 5). Videos are also provided in the supplementary material.

      (6) The boundary conditions used to calculate the stress in the developed model should be discussed. The authors should specify why biofilm porosity is neglected.

      We have added a detailed discussion in the supplementary (Section I.2).

      (7) In the first section of the Results, the authors hypothesize that heterogeneity in biofilm development could be due to oxygen limitation. However, given the high oxygen permeability of PDMS, this hypothesis is later denied by their data. It would be prudent to avoid this hypothesis initially to streamline the presentation. Additionally, the authors should specify how oxygen levels at the inlet and outlet are measured.

      We appreciate this comment and agree that streamlining would simplify the presentation. However, after careful consideration, we have chosen to retain the oxygen limitation hypothesis for the following reasons: (1) oxygen limitation is a frequently invoked mechanism in biofilm systems and deserves explicit consideration, (2) it is not immediately obvious that oxygen remains non-limiting in larger microchannels where transverse gradients could develop, and (3) systematically eliminating this plausible alternative hypothesis strengthens our mechanistic conclusion that BHI drives the observed heterogeneity. Regarding oxygen measurements: we did not directly measure dissolved oxygen concentrations. Our approach is only indirect.

      (8) What is the standard deviation of the doubling time measured at different flows (page 9)?

      We have indicated the standard deviation in the text. Note that the graph shows the SEM.

      (9) What is the "zone of interest" in the channel mentioned on page 9?

      We have added the following sentence to clarify: To further understand this effect, let us consider the mass balance of biofilm in the zone of interest -- the zone where biofilm grows in between the two UVC irradiation zones -- in the channel.

      (10) Minor and major detachment events should be classified based on a defined threshold or criteria, and their frequency should be measured.

      We appreciate the reviewer's concern about quantitative rigor. However, we respectfully disagree that imposing arbitrary thresholds to classify 'minor' vs. 'major' events would improve our analysis. Detachment events in our system span a continuum of magnitudes, and any threshold would be artificial and potentially misleading. Our quantitative characterization of detachment dynamics is provided through the statistical analysis of interevent times, which we show follow a gamma distribution. This stochastic framework captures the full spectrum of detachment behavior without requiring arbitrary binning. The terms 'minor' and 'major' in our manuscript are used qualitatively to illustrate the range of observed phenomena, not as formal classifications.

      (11) Have the authors identified a reason for the peaks in the volume fraction in the Δpsl mutants at the highest flow rate?

      The biofilm thickness following these sloughing events is below our detection limit, consistent with a residual layer of cells. However, these cells grow, leading to a time window where the fraction is measurable, before a new detachment event occurs. Our understanding is that the psl mutant forms a weaker matrix with a much lower threshold for sloughing.

      (12) The fit of the probability density function for the relative density function does not match the data well. The authors should comment on this.

      We have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes. Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.

      (13) Additionally, the simulated fraction appears very flat, with limited detachments compared to experiments. Why?

      The model captures the essential dynamics of growth-detachment cycles, including the characteristic timescales and volume fraction ranges. Some event-to-event variability in the experimental data likely reflects biological stochasticity not captured by our current approach—for example, variations in local biofilm mechanical properties or matrix composition that affect the precise stress at which sloughing occurs. While incorporating such biological variability as a stochastic parameter would improve detailed agreement, it would require extensive additional characterization beyond the scope of this study. The current model successfully reproduces the key qualitative and semi-quantitative features of the system.

      (14) The methods section should include a more detailed explanation of how the model was validated against experimental data.

      Model validation was performed by comparing predicted biofilm volume fraction time series and sloughing event statistics against experimental observations across multiple flow rates. The model reproduces the characteristic growth-sloughing cycles, timescales, and steady-state volume fractions without additional parameter fitting beyond the experimentally measured distributions.
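
      The structure of such a growth-sloughing model can be illustrated with a minimal jump-process sketch. This is not the authors' stochastic differential equation: the logistic growth law, the residual floor phi0, and every numerical value below are assumptions chosen only to show how Gamma-distributed inter-event times and lognormally distributed jump amplitudes can drive a simulated volume fraction.

```python
import random


def simulate_phi(t_end=72.0, dt=0.05, growth_rate=0.5, phi_cap=0.8,
                 gamma_shape=2.0, gamma_scale=3.0,
                 ln_mu=-1.0, ln_sigma=0.5, phi0=0.01, seed=0):
    """Simulate a biofilm volume fraction with random sloughing events.

    Between events, phi grows logistically toward phi_cap (explicit Euler).
    Inter-event times are Gamma(shape, scale); at each event a lognormal
    fraction of the biofilm (capped at 1) is removed, leaving at least a
    residual layer phi0. All parameters are hypothetical; times in hours.
    """
    rng = random.Random(seed)
    phi = phi0
    next_event = rng.gammavariate(gamma_shape, gamma_scale)
    n_events = 0
    series = []
    n_steps = round(t_end / dt)
    for k in range(n_steps):
        t = (k + 1) * dt
        # Deterministic growth step.
        phi += dt * growth_rate * phi * (1.0 - phi / phi_cap)
        # Stochastic sloughing step.
        if t >= next_event:
            fraction_lost = min(rng.lognormvariate(ln_mu, ln_sigma), 1.0)
            phi = max(phi * (1.0 - fraction_lost), phi0)
            n_events += 1
            next_event = t + rng.gammavariate(gamma_shape, gamma_scale)
        series.append((t, phi))
    return series, n_events


series, n_events = simulate_phi()
peak = max(phi for _, phi in series)
print(f"{n_events} sloughing events in 72 h, peak phi = {peak:.2f}")
```

      Comparing event counts, inter-event time histograms, and the range of phi between such simulated series and the measured ones is one way to move from a qualitative to a semi-quantitative comparison.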

      (15) It would be useful to include information on the reproducibility of the experiments and any variations observed between replicates.

      Experiments were performed in N=3 biological replicates. Individual time series for all replicates are shown in Supplementary Figures, demonstrating consistent behavior across replicates.

      (16) A discussion of the limitations of the study, particularly regarding the assumptions made in the modeling and their potential impact on the results, would strengthen the paper.

      We have added a discussion on why we chose to neglect the porosity of the biofilm, and strengthened parts on the uniform biofilm layer assumption.

      Reviewer #2 (Recommendations For The Authors):

      Page 2: "A vast" —> "The vast"

      Changed.

      The text and line widths on many of the figures are far too small. I printed it out at normal size, but had to look at a PDF and magnify to actually see what the graphs are showing. Fig. 9c is particularly illegible.

      Changed.

      Fig. 1 caption "photonic" —> "optical"?

      Changed

      Can you spell out the actual mathematical definition of 𝜙 on page 5 when it is introduced? Currently it just says the "cross section volume fraction of the biofilm", but that seems potentially ambiguous. Is it valid to say that this is the "fraction of the cross section occupied by the biofilm"?

      Changed

      Bottom of page 5: can you state the physical interpretation of the assumption that M is bounded between 0 and 1. i.e. that growth is larger than detachment?

      There is a comment on that in the paper. It reads “In assuming that M ∈ ]0, 1] and eliminating cases where M > 1, we have not considered situations of systematic detachment 𝜙equ = 0 for any value of the concentration, since this is not a situation that we encountered experimentally.” This comes just after presenting the expression for the only non-trivial steady state, as it becomes easier to explain the consequences of the initial choice at this point.

      Currently the choice of detachment initially used in the model is a bit confusing. You say that you are going to assume a (1-𝜙)^(-1) model for simplicity (bottom of page 5), but then later you find that the (1-𝜙)^(3/4) model is more accurate (page 16). Since the latter has already been confirmed in numerous other studies, why not start with that one from the beginning?

      We thank the reviewer for this important question, which highlights an area where our presentation could be clearer. We did not find that the (1-φ)^(-3/4) model is "more accurate." Rather, we deliberately chose the (1-φ)^(-1) scaling because it captures pressure-induced detachment, which we hypothesized would dominate in confined flows where biofilms clog a large portion of the channel. The (1-φ)^(-3/4) scaling, widely used in previous studies, describes shear stress at the biofilm/fluid interface and was developed primarily for reactor systems where pressure effects are negligible. Our analysis on page 16 validates this choice by demonstrating that pressure stress indeed exceeds shear stress when volume fraction is large, which corresponds to late Stage I and all of Stage II, precisely where our model is applied. The excellent quantitative agreement between predicted and measured φmax values across flow rates (Fig. 7f, Table 1) further supports the (1-φ)^(-1) scaling. We recognize that our initial presentation may have suggested the (1-φ)^(-1) choice was merely for "simplicity." We have revised this section to emphasize that this scaling was chosen specifically to capture pressure-driven detachment in confined geometries, with the physical justification provided by the stress analysis that follows. We have also clarified our ideas on page 16 to express clearly that (1-φ)^(-3/4) is never used. We could alternatively use a multi-modal detachment function combining both scalings, but the data do not require this additional complexity.
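
      The difference between the two scalings can be made concrete with a short numerical sketch. It compares only the geometric amplification factors (1-φ)^(-1) and (1-φ)^(-3/4); the dimensional prefactors of the pressure and shear stresses are deliberately omitted (an assumption of this sketch), so the output shows how the relative weight of the two terms changes with clogging, not which stress is larger in absolute terms. The function names are hypothetical.

```python
def pressure_factor(phi):
    """Amplification factor (1 - phi)^(-1) of the pressure-type scaling."""
    return (1.0 - phi) ** -1.0


def shear_factor(phi):
    """Amplification factor (1 - phi)^(-3/4) of the shear-type scaling."""
    return (1.0 - phi) ** -0.75


def stress_ratio(phi):
    """Ratio of the two factors, equal to (1 - phi)^(-1/4)."""
    return pressure_factor(phi) / shear_factor(phi)


for phi in (0.0, 0.5, 0.9, 0.99):
    print(f"phi = {phi:4.2f}  pressure = {pressure_factor(phi):7.2f}  "
          f"shear = {shear_factor(phi):6.2f}  ratio = {stress_ratio(phi):4.2f}")
```

      The ratio grows as (1-φ)^(-1/4), so the pressure-type term gains weight relative to the shear-type term as the channel clogs, although slowly; the dominance argued in the response therefore also depends on the prefactors given by the stress analysis.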

      In general, the models you derived in this study could be better contrasted with that from previous works. e.g. can you compare your Eqn (4) with the steady-state solutions obtained by other previous studies? Is this consistent with previous works or different? (aside from framing the biofilm thickness in terms of 𝜙)

      We are currently working on a paper dedicated to modeling biofilm development in confined flows, which will do a better job at comparing approaches.

      Top of page 6 - you assume K* = 0.1 - Does this assume that cells grow at half the rate in 0.1X BHI as they do in 1X BHI? Has this been confirmed experimentally or is this just a guess?

      This was estimated rather than measured directly. Model predictions were much more sensitive to the Damköhler number than to the value of K.

      "radial" is used widely in this paper, but you are using a square geometry. Is "transverse" a better choice?

      Yes it clearly is. It’s been changed.

      Fig 3. Are panels (a) and (b) showing different bioreps of the same condition? If so, please spell that out in the caption.

      There was an error here in the caption of fig a. This has been changed. The correspondence is between a and c, and these are exactly the same, not bioreps.

      In multiple places it noted that the change in hydraulic resistance is correlated with the "change in biofilm colonization." Why not demonstrate this directly using a cross correlation analysis? How is the latter connected to the 𝜙 parameter? (e.g. is this d(𝜙)/dt?)

      We thank the reviewer for this suggestion. To clarify: φ(t) represents the volume fraction of biofilm in the channel. We measure this in two independent ways: (1) φ(t) from hydraulic resistance (black line in Fig. 3), i.e. calculated from pressure measurements using φ = 1 - √(R₀/R(t)), assuming uniform layer growth (see Methods section "Data analysis for the calculation of hydraulic resistance and volume fraction"), and (2) φ(t) from fluorescence (green squares in Fig. 3), i.e. estimated from integrated GFP intensity or image segmentation of the glass/liquid interface. The reviewer is correct that we should quantify this relationship directly. We have now added a correlation analysis between these two independent measurements of φ (new Supplementary Figure S21). The analysis shows a strong positive correlation, with r-values ranging from 0.68 to 0.77 across all flow rates. This validates two key aspects of our approach: (1) the uniform layer assumption used to convert R(t) to φ(t) is reasonable, and (2) the pressure-based measurements accurately capture the dynamics visible in fluorescence imaging, including both growth phases and sloughing events. The strong agreement is particularly notable given that these measurements probe different aspects of the biofilm: hydraulic resistance is sensitive to the three-dimensional obstruction of flow, while fluorescence captures primarily the biofilm attached to the glass surface within our focal plane. Their correlation supports the model assumptions. We have revised the manuscript to clarify this relationship and present the correlation analysis.
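
      The conversion and the correlation check described above can be sketched as follows. The relation φ = 1 - √(R₀/R(t)) is taken from the response; its inverse, R = R₀/(1-φ)², follows algebraically. The resistance values and the fluorescence-based estimates are made-up numbers for illustration, and the function names are hypothetical.

```python
import math


def phi_from_resistance(r_t, r_0):
    """Volume fraction from hydraulic resistance: phi = 1 - sqrt(R0 / R(t))."""
    if r_t < r_0:
        raise ValueError("R(t) below the clean-channel resistance R0")
    return 1.0 - math.sqrt(r_0 / r_t)


def resistance_from_phi(phi, r_0):
    """Inverse relation: R = R0 / (1 - phi)^2."""
    return r_0 / (1.0 - phi) ** 2


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


R0 = 1.0                                   # clean-channel resistance (arbitrary units)
resistances = [1.0, 4.0, 16.0, 100.0]      # hypothetical R(t) samples
phi_pressure = [phi_from_resistance(r, R0) for r in resistances]

# Hypothetical fluorescence-based estimates at the same time points.
phi_fluorescence = [0.05, 0.45, 0.80, 0.85]

r_value = pearson_r(phi_pressure, phi_fluorescence)
print("phi from pressure:", [round(p, 3) for p in phi_pressure])
print(f"Pearson r between the two estimates: {r_value:.2f}")
```

      Because R scales as (1-φ)^(-2), the pressure-based estimate becomes increasingly sensitive as the channel clogs: a hundredfold increase in resistance corresponds to φ = 0.9.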

      Top of page 9 - a doubling time of 110 mins is reported in liquid culture - is this in shaken or static conditions? Can you provide some data on how this was calculated? (e.g. on a plate reader?) Do you think your measurements in the microfluidics could be affected by attachment/detachment of cells, rather than being solely driven by division? It is curious that your apparent growth rate varies by a factor of two across the different flow rates and there is not a monotonic dependency. Both attachment and detachment would depend on the flow rate (with some non-trivial dependencies), e.g. https://www.pnas.org/doi/10.1073/pnas.2307718120 https://doi.org/10.1016/j.bpj.2010.11.078

      Given that your doubling time in the microfluidics is solely based on changes in cell number (rather than directly tracking cell divisions), it seems possible your results here are measuring the combined effect of growth, attachment and detachment, rather than just growth.

      We agree with those comments regarding the doubling time measurement. We have added a description of how we performed the doubling time measurement in the Methods section.

      Page 9 - you discuss the role of EPS here, but the effect of EPS is not demonstrated here and this is muddled with a discussion about the non-linearity of the putative dependency. Maybe this would be on a firmer footing if you save the discussion of EPS for the section on the Psl and Pel mutants?

      Changed.

      Middle of page 9: Please define what "smooth detachment" means and contrast it with catastrophic sloughing. Also, please define what you mean by "flow, seeding, and erosion" detachment are and how these three things differ from one another.

      We have clearly defined each term in the revised version.

      The results from wavelet scalograms seem to be underutilised and not well described. Can you clearly say in the caption what time series this analysis has been calculated on? e.g. hydraulic resistance? Other than simply pointing out the "blue stripes", what can be gained from this analysis that could not be obtained with another method? It would be great if the basic features of this plot could be more fully discussed (e.g. is the curved envelope at the bottom caused by edge effects?)

      We have improved the text, captions and method section following the reviewer’s comment.

      Fig. 5 a and b - please list the time at which each of these images were taken. Do these have the same dt between the two sets of images?

      Yes the dt is the same (30 minutes). It’s been indicated in the caption.

      Fig. 6: you have significant 2D variation in the biofilm width along the length of the channel. The relative contribution of pressure and shear based detachment will be different at different positions along the length. However, this variation is ignored in your model. Can you please comment on this in your manuscript and how it might affect the interpretation of your results? e.g. would the longitudinally averaged description yield the same result as one that takes the geometry into account (on average)?

      Our model indeed assumes longitudinally averaged properties. A more detailed spatially resolved model would be valuable for capturing heterogeneities and will be explored in future work.

      Bottom of page 11: you say standard deviations are in the range of 10⁻³. How does this jibe with the error bars on the middle flow rate in Fig. 7e?

      This extremely low standard deviation only applies to the maximum value of 𝜙 and is a completely different measurement from the box-and-whisker plots presented in Fig. 7e.

      Fig. 7: You are calculating the "Fraction" here. Is this "𝜙"? If so, can you put that on the y-axis instead? You calculate the volume fraction two different ways e.g. with hydraulic resistance and with imaging. Is only one of these shown in (e)? Is the same powerlaw dependence shown in (f) conserved when the other measurement of the "fraction" is used? Can you include both in Fig. 7e?

      We have modified the axis and indicated 𝜙.

      (e) is calculated only from hydraulic resistance. This is the most precise measurement to evaluate 𝜙 quantitatively.

      Related to the previous comment: Some of the estimates of 𝜙max in Table 1 are obtained by fitting the model to integrated fluorescence data (Fig. 2b), while others are estimated from measurements of the hydraulic resistance. The former yields non-unique sets of parameters. Can the biofilm fraction instead actually be estimated directly from fluorescent imaging by segmenting biofilm and directly calculating how much of the cross section is occupied by cells on average across the length? This seems like a more direct measure of this quantity. Given there are multiple ways of estimating the same parameter, it would be better consistency checking to make sure that different methods actually yield the same result.

      We have now added in Fig S21 a direct comparison of these two measurement methods. These are strongly correlated. Microscopy is more direct but only provides 2D pictures. Hydraulic resistance provides a 3D measurement, but relies on a model of biofilm distribution. Both are imperfect, but correlate well. In particular, we see that the 2D measurement does capture sloughing.

      You cite a large number of supplemental figures (e.g. Fig. S21 on page 12), but the figures in your SI only go up to 11.

      We have revised references to supplementary figures.

      Bottom of page 11: Your data from liquid culture suggests that your psl mutant grows at half the rate of WT cells. Is that consistent with your microfluidic data (e.g. Fig. 8)? If not, might this be a sign that your growth rate analyses from the microfluidics might be affected by attachment/detachment? (see comment above) Psl cells should detach much more easily.

      The approach taken to measure doubling times in the microfluidic system does not rely on the macroscopic measurements presented in figure 8, but rather on the approach presented in fig 4. These measurements require specific imaging (different magnification and time stepping) and we did not perform such experiments for the mutants.

      In analyses of sloughing, you fit the times between the jumps and the relative amplitude. Are these two random variables correlated with one another? Might that influence your results? Your methods say that "jumps were identified through the selection of local maxima" of the derivative. Do you mean to say "minima" here? Did you keep all local maxima/minima or did you have a threshold?

      We treat these as two random variables that are not correlated with one another. This is an assumption, and it would be interesting to analyze whether they are in fact correlated. To perform this analysis, we believe that we would first need to acquire more data and more replicates to improve the statistics.

      Yes, it was minima (in the code we make everything positive, hence the confusion).

      Yes, there is a threshold on the value of the jump itself. This value is extremely low and essentially filters out noise.
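      The jump-detection procedure described in these responses (local minima of the derivative, kept only above a small threshold) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the use of a first difference as the derivative, and the definition of relative amplitude as the drop divided by the value just before the jump are all assumptions made for the example.

```python
def detect_jumps(series, threshold):
    """Return (index, drop) pairs for sudden drops in `series`.

    A jump is taken to be a local minimum of the first difference whose
    magnitude exceeds `threshold` (a hypothetical noise filter).
    """
    # Discrete derivative: diff[i] = series[i + 1] - series[i]
    diff = [b - a for a, b in zip(series, series[1:])]
    jumps = []
    for i in range(1, len(diff) - 1):
        is_local_min = diff[i] < diff[i - 1] and diff[i] <= diff[i + 1]
        # -diff[i] is the size of the drop; rises have negative "drop"
        # and are rejected by any positive threshold.
        if is_local_min and -diff[i] > threshold:
            jumps.append((i + 1, -diff[i]))
    return jumps


def intervals_and_amplitudes(series, times, threshold):
    """Waiting times between jumps and relative amplitudes of each jump."""
    jumps = detect_jumps(series, threshold)
    jump_times = [times[i] for i, _ in jumps]
    waits = [t1 - t0 for t0, t1 in zip(jump_times, jump_times[1:])]
    # Relative amplitude: drop divided by the value just before the jump.
    rel_amps = [drop / series[i - 1] for i, drop in jumps]
    return waits, rel_amps
```

      The two returned lists are the quantities one would then fit with candidate distributions, and plotting them against each other is one simple way to check the independence assumption mentioned above.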

      Fig. 9 - can you make it clearer in the caption what timeseries you are analysing here? I understand from the methods that this is the "volume fraction." The data/fits are difficult to see in Fig. 9 b and impossible to see in Fig. 9c because the green bars get in the way of the other two data sets. Can this visualisation be improved? It is not clear to me how good of a job the Gamma and log-normal fits are actually doing.

      We have clarified that histograms are calculated from all experiments/replicates.

      We have slightly modified the graph to make it clearer. This comparison is intrinsically hard, partly because it compares discrete data with continuous PDFs.

      The results from the stochastic sloughing model are described as 'strikingly similar to experimental data', which seems to be based on a qualitative analysis of the lines in Fig. 7 d, e, and f. However, the experimental data are not plotted in the same graph, nor is the experimental data that we should be comparing this to cited in the text/caption.

      We have added a note in the caption to indicate which figure it can be compared to.

    1. eLife Assessment

      This fundamental study introduces a new biology-informed strategy for deep learning models aiming to predict mutational effects in antibody sequences. It provides convincing evidence that separating selection from the nucleotide-level mutation process improves performance over the objectives of protein language models inspired by natural language processing. This paper should be of interest to computational immunologists, but also to the broader community interested in deep learning for biological sequence data and evolution.

    2. Reviewer #1 (Public review):

      Summary:

      Matsen et al. describe an approach for training an antibody language model that explicitly tries to remove effects of "neutral mutation" from the language model training task, e.g. learning the codon table, which they claim results in biased functional predictions. They do so by modeling empirical sequence-derived likelihoods through a combination of a "mutation" model and a "selection" model; the mutation model is a non-neural Thrifty model previously developed by the authors, and the selection model is a small Transformer that is trained via gradient descent. The sequence likelihoods themselves are obtained from analyzing parent-child relationships in natural SHM datasets. The authors validate their method on several standard benchmark datasets and demonstrate its favorable computational cost. They discuss how deep learning models explicitly designed to capture selection and not mutation, trained on parent-child pairs, could potentially apply to other domains such as viral evolution or protein evolution at large.

      Overall, we think the idea behind this manuscript is really clever and shows promising empirical results. Two aspects of the study are conceptually interesting: the first is factorizing the training likelihood objective to learn properties that are not explained by simple neutral mutation rules, and the second is training not on self-supervised sequence statistics but on the differences between sequences along an antibody evolutionary trajectory. If this approach generalizes to other domains of life, it could offer a new paradigm for training sequence-to-fitness models that is less biased by phylogeny or other aspects of the underlying mutation process.

      Future versions of the work can consider extending the ideas to additional datasets, species, definitions of fitness, or even different proteins entirely.

      Comments on revisions:

      We thank the authors for addressing our points and have no remaining questions.

    3. Reviewer #2 (Public review):

      Summary:

      Endowing protein language models with an ability to predict the function of antibodies would open a world of translational possibilities. However, antibody language models have yet to achieve the breakthrough success that large language models have achieved for the understanding and generation of natural language. This paper elegantly demonstrates how training objectives imported from natural language applications lead antibody language models astray on function prediction tasks. Training models to predict masked amino acids teaches models to exploit biases of nucleotide-level mutational processes, rather than protein biophysics. Taking the underlying biology of antibody diversification and selection seriously allows disentangling these processes, through what the authors call deep amino acid selection models. These models extend previous work by the authors (Matsen MBE 2025) by providing predictions not only for the selection strength at individual sites, but also for individual amino acid substitutions. This represents a practically important advance.

      Strengths:

      The paper is based on a deep conceptual insight, the existence of a multitude of biological processes that affect antibody maturation trajectories. The figures and writing are very clear, which should help make the broader field aware of this important but sometimes overlooked insight. The paper adds to a growing literature proposing biology-informed tweaks for training protein language models, and should thus be of interest to a wide readership interested in the application of machine learning to protein sequence understanding and design.

      Weaknesses:

      Proponents of the state-of-the-art protein language models might counter the claims of the paper by appealing to the ability of fine-tuning to deconvolve selection and mutation-related signatures in their high-dimensional representation spaces. Leaving the exercise of assessing this claim entirely to future work somewhat diminishes the heft of the (otherwise good!) argument. In the context of predicting antibody binding affinity, the modeling strategy only allows prediction of mutations that improve affinity on average but not those which improve binding to specific epitopes.

      Comments on revisions:

      We thank the authors for clarifying the description of the methods and for adding additional discussion of important directions for future work.

    4. Reviewer #3 (Public review):

      Summary:

      This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).

      Strengths:

      The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling and the advantage, in terms of mutational effects prediction as well as computational efficiency, is clearly demonstrated via comparisons to state-of-the-art models.

      Weaknesses:

      While all the main points are well addressed and supported, it could have been interesting to strengthen the claim of gain in interpretability by investigating it explicitly in relation to the functional effects studied in this paper.

      Comments on revisions:

      I thank the authors for clarifying a few points I had flagged up and I appreciate much better that the content of the companion paper was precisely covering model selection and structural interpretability.

      Regarding my first point (references for language models for antibodies), I feel that the parenthetical citation format shouldn't be a problem (but the editors might advise here). Antiberta2 is this paper: https://www.biorxiv.org/content/10.1101/2023.12.12.569610v1.full.pdf (yet, I understand if the authors want to focus on models purely sequence-based). A couple of additional references could be: https://academic.oup.com/bioinformatics/article/40/11/btae659/7888884; https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012646; https://www.pnas.org/doi/10.1073/pnas.2418918121; https://arxiv.org/abs/2506.13006.

      A very minor comment: could one add some p-value (it could be a supplementary table) for the Pearson correlation coefficients? The comparison between methods is rather clear, but for some correlations it's a bit unclear whether they should be considered significant. It would be important to understand the extent to which in different datasets one might expect functional prediction power based on an evolutionary objective function alone.
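      One simple way to attach a p-value to a Pearson correlation, without distributional assumptions, is a permutation test. The sketch below is only an illustration of that idea: the function names, the number of permutations, and the two-sided convention are choices made for the example and are not taken from the manuscript.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation.

    Shuffling y breaks any association with x, so the shuffled
    correlations approximate the null distribution.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

      With many model/dataset pairs, the resulting p-values would typically also need a multiple-testing correction before being tabulated.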

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Matsen et al. describe an approach for training an antibody language model that explicitly tries to remove effects of "neutral mutation" from the language model training task, e.g. learning the codon table, which they claim results in biased functional predictions. They do so by modeling empirical sequence-derived likelihoods through a combination of a "mutation" model and a "selection" model; the mutation model is a non-neural Thrifty model previously developed by the authors, and the selection model is a small Transformer that is trained via gradient descent. The sequence likelihoods themselves are obtained from analyzing parent-child relationships in natural SHM datasets. The authors validate their method on several standard benchmark datasets and demonstrate its favorable computational cost.

      They discuss how deep learning models explicitly designed to capture selection and not mutation, trained on parent-child pairs, could potentially apply to other domains such as viral evolution or protein evolution at large.

      Strengths:

      Overall, we think the idea behind this manuscript is really clever and shows promising empirical results. Two aspects of the study are conceptually interesting: the first is factorizing the training likelihood objective to learn properties that are not explained by simple neutral mutation rules, and the second is training not on self-supervised sequence statistics but on the differences between sequences along an antibody evolutionary trajectory. If this approach generalizes to other domains of life, it could offer a new paradigm for training sequence-to-fitness models that is less biased by phylogeny or other aspects of the underlying mutation process.

      Thank you for your kind words.

      Weaknesses:

      Some claims made in the paper are weakly or indirectly supported by the data. In particular, the claim that learning the codon table contributes to biased functional effect predictions may be true, but requires more justification.

      Thank you for this comment, which made us realize that we had not adequately explained the key insight of Figure S3. We have expanded the caption of Figure S3 to clarify:

      “DASM selection factors match the pattern seen in experimental measurements, while masked language models show artifacts from the codon table.

      The experimental data (left two panels) show a slight decrease in median scores for amino acids requiring multiple nucleotide mutations (“multiple”) versus single mutations (“single”).

      DASM captures this pattern, showing similar distributions for both categories.

      In contrast, AbLang and ESM assign radically lower scores to multinucleotide amino acid substitutions, consistent with the masked language modeling objective learning codon-level mutation probabilities as described in the main text (Figure 1a).”

      This figure directly supports our claim: the experimental fitness data show similar distributions for single-mutation vs multiple-mutation amino acids, yet AbLang2 and ESM assign dramatically different scores to these groups, while DASM does not.
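      The "single" versus "multiple" grouping used in this argument follows directly from the standard genetic code. The sketch below shows one way to compute it for a given parent codon; the function names are hypothetical and this is not the authors' analysis code, only an illustration of the codon-table constraint under discussion.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_nt_changes(parent_codon):
    """Map each amino acid to the fewest nucleotide changes that reach it."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # ignore stop codons
        dist = sum(a != b for a, b in zip(parent_codon, codon))
        if aa not in best or dist < best[aa]:
            best[aa] = dist
    return best


def classify_substitutions(parent_codon):
    """Split non-parent amino acids into single- and multi-nucleotide sets."""
    parent_aa = CODON_TABLE[parent_codon]
    single, multiple = [], []
    for aa, dist in sorted(min_nt_changes(parent_codon).items()):
        if aa == parent_aa:
            continue
        (single if dist == 1 else multiple).append(aa)
    return single, multiple
```

      For example, from the tryptophan codon TGG only five amino acids are reachable by one nucleotide change, so a model trained on observed sequences can score the remaining fourteen as unlikely for reasons unrelated to function.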

      Additionally, the paper could benefit from additional benchmarking and comparison to enhanced versions of existing methods, such as AbLang plus a multi-hit correction.

      It's an interesting idea to consider enhancing existing models. However, this approach faces some challenges. Most fundamentally, it is difficult to recast AbLang and other such models in an evolutionary framework: the masked language objective is simply not an evolutionary one. We have written a whole paper working to do this (https://doi.org/10.1371/journal.pcbi.1013758) and the results were middling despite our best efforts. Specifically regarding multihit, the effects of multihit are minor compared to the codon table effects, and those require the structure of a codon-based evolutionary model.

      Further descriptions of model components and validation metrics could help make the manuscript more readable.

      We have clarified several aspects of the model in the revision: we now describe the Thrifty neutral model in the introduction, clarify the transformer architecture and wiggle activation function in the Methods, and explain the joint branch-length optimization procedure.

      In the introduction we now describe Thrifty:

      “This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F 5-mer model.”

      In the Methods we clarify the architecture:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component of this architecture is a custom “wiggle” activation function applied to the output layer that prevents extreme selection factors as previously described.

      This function asymptotes to zero for highly deleterious mutations and grows sub-linearly for beneficial ones.”
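      To make the stated behaviour of such an output activation concrete, the sketch below defines a function with the two properties named in the quoted text: it tends to zero for very negative inputs and grows sub-linearly for positive ones. The functional form and the `beta` parameter are assumptions chosen for illustration; the actual "wiggle" function is defined in the authors' earlier work and may differ.

```python
import math


def wiggle(x, beta=0.3):
    """Hypothetical bounded-growth activation for selection factors.

    For x <= 0 the output is exp(x), which tends to 0 as x decreases.
    For x > 0 the output is (1 + x) ** beta with 0 < beta < 1, which
    grows more slowly than any linear function. Both branches equal 1
    at x = 0, so the function is continuous.
    """
    if x <= 0:
        return math.exp(x)
    return (1.0 + x) ** beta
```

      Under this form, a raw network output of 0 corresponds to a selection factor of 1 (neutral), strongly negative outputs give factors near 0, and large positive outputs are compressed so that no single substitution receives an extreme factor.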

      And the joint optimization:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”

      Reviewer #2 (Public review):

      Summary:

      Endowing protein language models with the ability to predict the function of antibodies would open a world of translational possibilities. However, antibody language models have yet to achieve breakthrough success, which large language models have achieved for the understanding and generation of natural language. This paper elegantly demonstrates how training objectives imported from natural language applications lead antibody language models astray on function prediction tasks. Training models to predict masked amino acids teaches models to exploit biases of nucleotide-level mutational processes, rather than protein biophysics. Taking the underlying biology of antibody diversification and selection seriously allows for disentangling these processes through what the authors call deep amino acid selection models. These models extend previous work by the authors (Matsen MBE 2025) by providing predictions not only for the selection strength at individual sites, but also for individual amino acid substitutions. This represents a practically important advance.

      Strengths:

      The paper is based on a deep conceptual insight, the existence of a multitude of biological processes that affect antibody maturation trajectories. The figures and writing are very clear, which should help make the broader field aware of this important but sometimes overlooked insight. The paper adds to a growing literature proposing biology-informed tweaks for training protein language models, and should thus be of interest to a wide readership interested in the application of machine learning to protein sequence understanding and design.

      Thank you for your kind words.

      Weaknesses:

      Proponents of the state-of-the-art protein language models might counter the claims of the paper by appealing to the ability of fine-tuning to deconvolve selection and mutation-related signatures in their high-dimensional representation spaces. Leaving the exercise of assessing this claim entirely to future work somewhat diminishes the heft of the (otherwise good!) argument.

      This is an interesting idea! However, it seems to us that this approach has some fundamental limitations. Existing models operate on amino acid sequences with no nucleotide representation, so while they can be implicitly biased by the codon table, they have no signal to separate selection from effects related to the codon table and SHM rates.

      We interpret this comment as proposing that we could use fine-tuning on functional data to pull out the selection components (that would only affect the functional data) versus the mutation component. That sounds like an interesting research project. We would be concerned that there are correlations between mutability and selective effects (e.g., CDRs are both more mutable and under different selection), creating identifiability problems unless separate data sources are used as we do here.

      Additionally, the fine-tuning approaches we are aware of are task-specific: they require labeled data from a specific assay (binding to antigen X, expression in system Y) that may or may not relate to the general evolutionary selection signal. Also, such approaches are limited to the specific data used and may not do a good job of guiding the model to a signal that is not present in the training data.

      By structuring the model as we do, we obtain the evolutionary interpretation directly from phylogenetic signal without requiring task-specific supervision.

      In the context of predicting antibody binding affinity, the modeling strategy only allows prediction of mutations that improve affinity on average, but not those which improve binding to specific epitopes.

      We agree, and this is fundamental to any general-purpose model. Predictions of binding patterns for a specific target require information about that target to be specified in the training data. We look forward to developing such task-specific models in the future.

      We have added a paragraph to the Discussion clarifying this limitation:

      “The current generation of DASM model does not use any antigen-labeled training data.

      The signal that it leverages to infer some limited ability to predict binding comes from natural affinity maturation.

      This affinity maturation comes through natural repertoires and so represents a mix of all of the antigens to which the sampled individuals have been exposed.”

      Reviewer #3 (Public review):

      Summary:

      This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).

      Strengths:

      The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling, and the advantage, in terms of mutational effects prediction, is clearly demonstrated via comparisons to state-of-the-art models.

      Thank you.

      Weaknesses:

      The gain in interpretability is only mentioned but not really elaborated upon or leveraged for gaining insight.

      We are also excited about the ability of these models to provide interpretable predictions. We have dedicated an entire paper to this direction: “A Sitewise Model of Natural Selection on Individual Antibodies via a Transformer-Encoder” in MBE (https://doi.org/10.1093/molbev/msaf186). The interpretations offered by that paper overturn some of the oversimplified dogma about how natural selection works in antibodies (purifying in FWK and diversifying in CDR), giving a more nuanced sitewise perspective. The paper also highlights the importance of specific structural features of the antibodies.

      This eLife paper, on the other hand, is focused on comparison to antibody language models and benchmarking zero-shot prediction on functional tasks.

      We have better highlighted this new paper in our revision with:

      “We have dedicated a companion paper to leveraging this interpretability to provide new perspectives on the operating rules of affinity maturation (Matsen et al., MBE 2025): that work provides a nuanced sitewise perspective on natural selection in antibodies that challenges classical oversimplified views of selection patterns.”

      The following aspects could have been better documented: the hyperparametric search to establish the optimal model; the predictive performance of baseline approaches, to fully showcase the gain yielded by DASM.

      We appreciate the concern and the desire to reveal all the factors that lead to a strong performance result. For this particular paper, we feel that this is less of a concern because we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. We now describe how other than model size, hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      Regarding baseline approaches, our previous paper includes comparisons to simpler models for the evolutionary objective. Here we focus on comparison to antibody language models for functional prediction. Comparing between state-of-the-art models is the standard practice for papers in this field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      We recommend modest amounts of revision, discussed below:

      Major comments:

      (1) In the first section of the results, there is extensive discussion on shortcomings of existing antibody language models like AbLang2 that seems to associate all of the performance gap with the inability to separate non-synonymous mutations separated by 1 or 2+ substitutions.

      In reality, some of the lower likelihoods in the 2+ substitution case could actually reflect real fitness deficits (while others could indeed be rarer occurrences in the training data). The authors should either moderate these claims or do an analysis that leverages antibody deep mutational scanning data to show that, conditioned on the fitness of the antibody (probably expression) being the same (either all high or all low), AbLang2 still artefactually considers rarer-training/less-codon-accessible variants to be less fit.

      As described above, we believe that this is addressed by Figure S3, but if not please correct us.

      (2) Some in the machine learning for antibody community might view the set of benchmarked datasets to be incomplete and somewhat arbitrarily selected, though we do think this is a good start, and the results are promising. A dataset commonly used in this field that is missing from this paper is from Shehata et al. (https://pubmed.ncbi.nlm.nih.gov/31553901/). A binding affinity experiment that is also commonly used in the field is from Phillips et al. (https://elifesciences.org/articles/71393) - this dataset measures combinatorial changes of framework regions on binding, which may be especially relevant here.

      We're glad to have the opportunity to clarify this, thanks.

      We based our evaluations on the April 2024 version of the FLAb benchmarking project (https://doi.org/10.1101/2024.01.13.575504) which preceded our work and thus was not subject to selection bias by us. We took the largest data sets in that repository. After this we became aware of the rich data sets offered by the Whitehead lab that provided binding measurements for many variants for a number of antigens, and added that to the evaluation set.

      We have clarified this in the manuscript:

      “We based our evaluations on the April 2024 version of the FLAb benchmarking project, which preceded our work and thus was not subject to selection bias by us.

      We also benchmarked high-throughput binding data (more recent than FLAb) from the Whitehead lab that provided affinity measurements across many variants and antigens.”

      The Shehata dataset is interesting but doesn't fit so much in the DASM mold: it is a survey of biophysical properties across many independent antibodies rather than a deep investigation of point mutants of a smaller collection of focal antibodies.

      FLAb has grown to include the Phillips dataset. We are working full-tilt on the next version of DASM and will be including many other datasets in our paper on DASM2. Thanks for the tip!

      (3) Similar to the above comment, we were also extremely curious as to why the authors did not test data from DeWitt et al. (https://pubmed.ncbi.nlm.nih.gov/40661619/). Instead, the authors only make a cryptic reference to this study on lines 201-6, but we could not even find a figure describing the results discussed on these lines. It would be great to actually include this data.

      We agree; however, our model is trained on human rather than mouse data. We would like to train a mouse model in the future but have not yet lined up the appropriate data.

      (4) The authors should comment on potential data leakage if the SHM trajectories used in training have a similar sequence or antigen similarity to the benchmark expression/binding datasets.

      This is a good question that we should clarify. Our model is trained only on evolutionary trajectories and not functional data. Evaluation is then done on functional data without fine-tuning. These evaluation data are categorically different from the training data, and thus data leakage is not a problem. Recall that our model is zero-shot: it only considers evolutionary trajectories and not functional data as such. In a similar way, other self-supervised models such as MLMs do not exclude seeing an antibody in the training data when they are doing functional prediction.

      We have clarified this in the manuscript with

      “Because the DASM is trained exclusively on evolutionary trajectories rather than functional measurements, evaluation on expression and binding benchmarks is strictly zero-shot with no risk of data leakage.”

      Relatedly, what happens if this approach is applied to completely de novo antibodies?

      We direct this reviewer to the Shanehsazzadeh dataset that involves antibodies that were suggested by an AI algorithm rather than observed in nature.

      If the reviewer is referring to completely synthetic antibody molecules, such as those generated by inverse folding, we have not attempted this.

      (5) It makes sense that you included the multihit correction as a response to your earlier instantiation (without this correction) underestimating the probabilities of multiple mutations in a codon associated with a single amino acid substitution (lines 476-477).

      However, this could potentially make for a somewhat unfair comparison to existing methods: if, say, we took AbLang (or another comparator) and also applied a multi-hit correction (even in some naive way at inference time), how would that compare to DASM? If this comparison favors DASM, it would show that models need more than just such a correction on top of existing methods to do good sequence scoring--which would only amplify the impact of the results.

      Thank you for this suggestion. We believe that we have addressed it in the response to the public reviews, but please let us know if not.

      Minor comments:

      (1) It would be worth explicitly defining/summarizing the mutation model used in the study, e.g. giving an overview of Thrifty in the introduction or where it first appears.

      Thanks, we have done this:

      “Our approach separates mutation and selection processes by encoding functional effects in a Deep Amino acid Selection Model (DASM) while explicitly modeling mutation using a separate fixed model trained on neutrally evolving data.

      This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F (Yaari et al., 2013) 5-mer model.”

      (2) Paragraph starting on line 58: it sounds like you're suggesting that masked deep learning models will learn certain features of genomes in a certain order. We suggest that you weaken the language, giving examples of various things the model could learn, not implying that such models will necessarily learn the most useful features after the less useful ones.

      We have fixed this by removing the "First... Second... Third... Finally" ordering:

      “It could memorize the germline genes and learn about the probabilities of V(D)J recombination.

      It could learn the codon table, as according to this table some amino-acid mutations are much more likely than others. It could learn rates of somatic hypermutation...

      It could also learn about the impact of amino acid mutations on antibody function through natural selection in the course of affinity maturation, which is the desired signal.

      However, this desired signal is confounded by the preceding factors.”

      (3) Line 72: You make a strong claim that existing models conflate mutation and selection without knowing for sure that they didn't successfully learn these components separately (it seems this would require a lot of mechanistic interpretability). The language could be softened here.

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (4) Line 79: Say a bit more about the separate fixed mutation model here. Why shouldn't we worry about this choice (especially the word "fixed") biasing your results? Does the empirical performance of your method suggest this doesn't really matter?

      We have added to the description of the fixed mutation model, as described above.

      As described in the public response, training SHM models on out-of-frame sequences is an established methodology for characterizing mutation in the absence of selection. In principle one could jointly train a model of SHM and selection, but one could have identifiability problems, as there is a correlation between more mutable sites (e.g. in the CDRs) and those under relaxed selection. Using out-of-frame sequences gives a clean and independent description of the SHM process.

      (5) Line 81: on what benchmarks does it outperform? State briefly.

      Great suggestion. Done:

      “The DASM, trained on substantially less data, outperforms AbLang2 and general protein language models including ESM2 and ProGen2-small. This outperformance holds on the largest benchmark datasets of the FLAb collection and on recent high-throughput binding assays.”

      (6) Paragraph starting on line 90: The topic sentence reads a bit vague to us. Do you mean that you want to learn the extent to which models are regurgitating nucleotide similarity of AAs in determining the scores associated with AAs at masked sites?

      Thank you. We have updated to

      "We first sought to understand the extent to which processes such as neutral mutation rate and the codon table influence antibody language model prediction at masked sites."

      (7) Paragraph starting on line 108: feels speculative and maybe better for the discussion...

      We appreciate this comment, but we have decided to keep the content where it is. Although this would make sense as a Discussion item we feel like it fits well here right next to the evidence, and the structure of our Discussion doesn't really have a place for it.

      (8) Paragraph starting on line 116: don't say "sequences from [12]" or "method of [15]." Explain what these are before giving the citation.

      Whoops! Thanks. We have fixed these.

      (9) Line 134: Consider giving a brief definition of perplexity?

      Thanks. We added our favorite definition:

      “Perplexity (as defined in the Methods) is the standard way of evaluating the plausibility of a sequence according to a model: it is the across-site geometric mean of the inverse probability of the observed amino acid.”
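
      As a minimal sketch of the quoted definition (not code from the paper), the function below computes the across-site geometric mean of inverse probabilities. The function name and the per-site probabilities are hypothetical; a real model would supply the probability it assigns to the observed amino acid at each site.

```python
import math

def perplexity(probs):
    """Across-site geometric mean of 1/p for the observed residues.

    probs: probability the model assigns to the observed amino acid
    at each site (hypothetical inputs, one value per site).
    """
    if not probs:
        raise ValueError("need at least one site")
    # Average the negative log-probabilities, then exponentiate;
    # working in log space avoids underflow on long sequences.
    mean_neg_log = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(mean_neg_log)

site_probs = [0.5, 0.25, 0.5, 0.25]  # made-up values for illustration
print(perplexity(site_probs))        # geometric mean of [2, 4, 2, 4] = 2**1.5
```

      A model that assigns probability 1 to every observed residue has perplexity 1; lower probabilities push the value up.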

      (10) Line 154: A citation here could be useful to support the claim that these models are learning phylogeny.

      We have replaced with the more clearly established "codon table":

      “We implemented a model to learn amino-acid preferences of antibodies without being influenced by germline genes, the codon table, or SHM biases.”

      (11) Lines 161-162: Given that phylogenetic inference methods can be tough to scale, we're curious how you managed to get 2 million PCPs from the data? Did you construct a bunch of different phylogenies (in parallel)?

      Indeed! We now clarify in the methods section that these trees were run in parallel across clonal families:

      “As in our previous work, tree inference and ancestral sequence reconstruction were performed per clonal family with the K80 substitution model...

      Because these clonal families are independent these phylogenetic inferences were run in parallel.”

      (12) Line 173-174: Can you say more about the joint optimization of the branch lengths? Are you conditioning on a phylogenetic tree topology only, and leaving the branch lengths unknown? Do you account for the fact that these branch lengths in the same phylogenetic tree aren't independent?

      Thanks for pointing out the need to clarify these points. We have done so in the methods section and provided a pointer to the methods section in the main text.

      In the main text we now say:

      “We trained DASMs of several sizes (~1M, ~4M, ~7M) using joint optimization of branch length t and parameters of the DASM (see Methods for details).”

      And in the Methods:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”
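
      The cyclic scheme quoted above can be illustrated with a deliberately simple stand-in. The sketch below is an assumption-laden toy, not the DASM: it replaces the neural network with a single shared selection factor `f` and uses a Poisson count model so that both update steps have closed forms. All names (`cyclic_fit`, the tuple layout) are hypothetical.

```python
def cyclic_fit(pairs, n_cycles=50):
    """Alternate shared-parameter and per-pair branch-length updates.

    pairs: list of (syn_obs, syn_opp, non_obs, non_opp), one tuple per
    parent-child pair: observed counts and mutation opportunities.

    Toy model (an assumption, not the DASM likelihood):
      synonymous count    ~ Poisson(t_k * syn_opp)
      nonsynonymous count ~ Poisson(t_k * f * non_opp)
    with one shared selection factor f and one branch length t_k per pair.
    """
    t = [1.0] * len(pairs)
    f = 1.0
    for _ in range(n_cycles):
        # Step 1: maximize over the shared parameter, branch lengths fixed.
        f = sum(p[2] for p in pairs) / sum(tk * p[3] for tk, p in zip(t, pairs))
        # Step 2: maximize over each branch length with f fixed. Each update
        # uses only its own pair, so these could be computed in parallel.
        t = [(s + n) / (so + f * no) for (s, so, n, no) in pairs]
    return f, t

# Made-up counts consistent with f = 0.5 and t = (0.1, 0.2).
f, t = cyclic_fit([(10, 100, 15, 300), (20, 100, 30, 300)])
print(f, t)
```

      Each step maximizes the toy likelihood over one block of parameters with the other held fixed, which is the same alternating structure the quoted Methods text describes for network weights and branch lengths.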

      (13) Line 358: Yes, in a trivial sense, separating mutation and selection means that we know exactly how each of those two components has been learned. We would be curious if you could say anything about mechanistic interpretability within the deep learning selection model. If not, could this be a future research direction?

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (14) Lines 384-386--indeed. Do you have any proposals for how a phylogeny could be constructed at this scale?

      As above this is not one big phylogeny but many, which invites parallelization.

      Reviewer #2 (Recommendations for the authors):

      (1) I agree that a full study of fine-tuning strategies for all possible alternative models is beyond the scope of the paper. However, a little bit of fine-tuning would go a long way to demonstrate how easy (or hard) it is to extract the relevant signal from a general protein language model embedding.

      As described in our response to the public reviews, we appreciate this point but have decided to focus on the core novelty of the paper and leave fine-tuning experiments to future work.

      (2) The authors might want to add some discussion about what signals their models capture with regard to binding affinity (averages), and how this limitation might be addressed in future work.

      As described in our response to the public reviews, we have added a paragraph to the Discussion clarifying this limitation.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I think more references have to be provided re: Antibody "foundation" language models, e.g. adding AntiBERTy and the two versions of AntiBERTa.

      We have added citations to those two models, although we weren't sure what the second version of AntiBERTa was. There are very many antibody language models. If we could use number ranges we would cite a dozen or more, but I hesitate to add many of them in the eLife format, which has parenthetical citations. If there are others that you consider essential don't hesitate to suggest them.

      (2) A key point of the approach is the disentanglement of “mutation” and “selection”, as mentioned in the introduction. However, the explanation of what the authors mean by mutation and selection comes only later. I would anticipate it in the introduction for clarity.

      This is a great point. The revised intro has this in the second sentence:

      “Natural antibodies are generated through V(D)J recombination, and refined by somatic hypermutation and affinity-based selection in germinal centers.”

      and the "While the masked..." paragraph now more clearly calls out selection.

      (3) Line 133: expression of what? Could the authors also explain mechanistically why expression should be impacted by a mutation? In what conditions do these data sample expression?

      We have clarified that it is expression in a phage display library:

      “To do so, we used the largest dataset of the FLAb collection of benchmarks, which measures the effect of single mutations on expression in a phage display library.”

      (4) Line 142: Clarify that 0.49 and 0.3 are correlation coefficients. Also, what type of correlation coefficient is this?

      Thanks for the catch! They are Pearson correlations as we now describe.

      (5) Line 173: The hyperparametric search should have been more documented (with a description of how it was carried out and plots).

      As described in our response to the public reviews, we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. Other than model size, hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      (6) Line 358: The authors say that 'DASMs provide direct interpretability'. However, this is not really inspected. A valuable addition would be to show how such interpretability is made possible, how it can recapitulate existing biological knowledge or provide hints for antibody engineering.

      As described above, this is addressed in detail in our previous paper.

      (7) Line 398: 'Inferred insertions or deletions were reversed, so that all sequences align to the naive sequence without gaps.' Could the authors comment on whether this is a limitation of the approach, why it wasn't dealt with and whether it could be the direction of future work?

      Funny you should mention this! We have been planning out such an extension in detail recently. We have added a sentence in the discussion:

      “We also have plans to extend the DASM framework to estimate the effect of natural selection on insertion and deletion events.”

      (8) Line 430-431: Could the authors clarify 'shared' over what? Also, I believe these two lines really describe the DASM architecture. This should be spelt out more clearly and tied to the description provided in lines 173-175. A diagram of the architecture would be a valuable addition to provide a full picture of the model (this could be added to the general diagram of the modelling approach of Figure S8).

      We have clarified in the text that this is indeed a description of the DASM architecture -- thanks for the catch:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component to this architecture is a custom “wiggle” activation function to the output layer that prevents extreme selection factors as previously described.”

      The architecture is very “stock” - just the default torch TransformerEncoder, so I don't think that it merits a diagram. We have expanded our discussion of the simple architecture in the revision. This sits in contrast to the setup for the loss function, which is quite custom and is the subject of Figure 2 and Figure S8.

      (9) Another general remark is that, to fully showcase the predictive advantage offered by DASM with all the modelling choices entailed, one could show the performance of simpler models, like the mutation model alone (with no selection factors), or models where selection factors are just learnt independently for each site, or are learnt with a simple linear layer instead of a transformer (these are just ideas for simpler approaches that can set baselines over which DASM improvement can be shown).

      This is a great suggestion. The primary focus of this paper is in comparing to alternate antibody language models in terms of functional prediction.

      These simpler models could be used for comparing the evolutionary objective, which we did in our previous paper (https://doi.org/10.1093/molbev/msaf186). We note that a sitewise model with fixed sites cannot really be appropriately formulated due to sequences being of different lengths.

      Additional changes

      In addition to the reviewer-requested changes, we added a comparison of ESM2 model sizes (650M vs 3B parameters) on the Koenig benchmark. We found that scaling ESM2 from 650M to 3B parameters did not improve performance. Indeed, the larger model showed slightly degraded correlations, particularly for light chain predictions. This is consistent with recent observations that medium-sized protein language models can outperform larger ones on transfer learning tasks (Vieira et al., Sci. Rep. 2025). We added Table S2 documenting these results and cite this finding in the main text to justify our use of the 650M model throughout the analyses. After doing this, we realized for the Shanehsazzadeh evaluation we had accidentally used ESM2-3B instead of ESM2-650M. The corrected ESM2-650M values are slightly lower (0.191 and 0.308 for sequence lengths 119 and 120, respectively, compared to the previous values of 0.248 and 0.337). This correction does not affect our conclusions, as DASM substantially outperforms ESM2 on this benchmark before and after the change.

      We also realized in the course of revision that we had been scoring AbLang2 using the masked-marginals pseudo-perplexity approach for the single-mutant Koenig dataset (Figure 1c), rather than the standard per-sequence pseudo-perplexity used elsewhere in the paper. For masked-marginals, probabilities are computed using only wild-type context, whereas standard pseudo-perplexity uses each variant's own context.

      The masked-marginals approach has a simple interpretation: for single-mutation variants, it is a linear transformation of the log ratio of the variant amino acid probability to the wild-type amino acid probability, both evaluated under wild-type context. This log-odds ratio directly measures how much the model prefers the mutation over the original residue.

      We found that masked-marginals performed better for AbLang2 on this dataset, so we continued using it for Figure 1c. However, for the benchmarking table (Table 1), we switched to per-sequence pseudo-perplexity as for the other comparisons in the paper, following the standard benchmarking protocol defined in FLAb (Chungyoun et al., 2024). We document both approaches in the Methods section:

      “An alternative “masked-marginals” approach scores variants using only wild-type context.

      For a wild-type sequence w, masked-marginals computes $p(a \mid w_{\setminus i})$, where $w_{\setminus i}$ denotes w with position i masked, for all amino acids a at each position i once, then uses these wild-type-derived probabilities to compute pseudo-perplexity for any variant x...

      For a single-mutation variant x that differs from wild-type w only at position j, all terms except position j cancel when comparing to wild-type, giving $\log \mathrm{PPL}(x) - \log \mathrm{PPL}(w) = -\frac{1}{n}\left[\log p(x_j \mid w_{\setminus j}) - \log p(w_j \mid w_{\setminus j})\right]$, where n is the sequence length. Thus, the log-probability difference between variant and wild-type amino acids equals, up to an additive constant that depends only on the wild-type sequence, negative n times the log pseudo-perplexity of the variant.

      For Figure 1c on the single-mutant Koenig dataset, we found that this approach gave a higher correlation for AbLang2 and so used it in that figure.

      For benchmarking comparisons (Table 1), we followed standard practice and used per-sequence pseudo-perplexity.”
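
      The relationship between masked-marginals pseudo-perplexity and the log-odds ratio can be checked numerically. The following is a minimal sketch under stated assumptions: the probability table is made up, the two-letter alphabet is a simplification, and `mm_log_pseudo_perplexity` is a hypothetical helper rather than anything from AbLang2 or FLAb.

```python
import math

def mm_log_pseudo_perplexity(variant, wt_probs):
    """Log pseudo-perplexity of `variant` under wild-type context.

    wt_probs[i][a] is the (hypothetical) probability of amino acid `a`
    at site i when site i is masked in the wild-type sequence.
    """
    n = len(variant)
    return -sum(math.log(wt_probs[i][aa]) for i, aa in enumerate(variant)) / n

# Toy three-site table over a two-letter alphabet; all numbers are made up.
wt_probs = [
    {"A": 0.7, "G": 0.3},
    {"A": 0.2, "G": 0.8},
    {"A": 0.6, "G": 0.4},
]
wild_type = "AGA"
variant = "AAA"  # single mutation at position j = 1 (0-based)
j, n = 1, len(wild_type)

# Log-odds of the variant residue versus the wild-type residue at site j.
log_odds = math.log(wt_probs[j][variant[j]]) - math.log(wt_probs[j][wild_type[j]])

# Difference in log pseudo-perplexity between variant and wild type.
ppl_diff = (mm_log_pseudo_perplexity(variant, wt_probs)
            - mm_log_pseudo_perplexity(wild_type, wt_probs))

# Every site except j contributes the same term to both sequences,
# so the two quantities agree up to the factor -n.
assert math.isclose(log_odds, -n * ppl_diff)
```

      Because the wild-type term is the same for every variant of a given wild type, ranking single mutants by masked-marginals pseudo-perplexity is equivalent to ranking them by this log-odds ratio.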

    1. eLife Assessment

      This valuable study identifies a novel regulator of stress-induced gene quiescence in C. elegans: the multi-Zinc-finger protein ZNF-236. The work provides evidence for an active mechanism that maintains the repressed state of inducible genes under basal conditions in the absence of stress. The claims for discovery made in the title and abstract are supported by solid experimental data. However, a deeper investigation into the mechanisms of ZNF-236 action could substantially enhance the manuscript's impact and value.

    2. Reviewer #1 (Public review):

      Summary:

      The paper by Ilbay et al. describes a screen in C. elegans for loss-of-function of factors that are presumed to constitutively downregulate heat shock or stress genes regulated by HSF-1. The hypothesis posits an active mechanism of downregulation of these genes under non-stressed conditions. The screen robustly identified ZNF-236, a multi-zinc-finger-containing protein, whose loss upregulates heat-shock and stress-induced prion-like protein genes, but which does not appear to act in cis at the relevant promoters. The authors speculate that ZNF-236 acts indirectly on chromatin or chromatin domains to repress heat shock genes under non-stressed conditions.

      Strengths:

      The screen is clever, well-controlled and quite straightforward. I am convinced that ZNF-236 has something to do with keeping heat shock and other stress transcripts low. The mapping of potential binding sites of ZNF-236 is negative, despite the development of a new method to monitor binding sites. I am not sure whether this assay has a detection/sensitivity threshold limit, as it is not widely used. Up to this point, the data are solid, and the logic is easy to follow.

      Weaknesses:

      While the primary observations are well-documented, the mode of action of ZNF-236 is inadequately explored. Multi-Zn-finger proteins often bind RNA (TFIIIA is a classic example), and the following paper addresses multivalent functions of Zn finger proteins in RNA stability and processing: Mol Cell 2024 Oct 3;84(19):3826-3842.e8 (doi: 10.1016/j.molcel.2024.08.010). I see no evidence that would point to a role for ZNF-236 in nuclear organization, yet this is the authors' favorite hypothesis. In my opinion, this proposed mechanism is poorly justified, and certainly should not be posited without first testing whether ZNF-236 acts post-transcriptionally, directly down-regulating the relevant mRNAs in some way. It could regulate RNA stability, splicing, export or translation of the relevant RNAs rather than their transcription rates. This can be tested by monitoring whether ZNF-236 alters run-on transcription rates or not. If nascent RNA synthesis rates are not altered, but co- and/or post-transcriptional events are, and if ZNF-236 is shown to bind RNA (which is likely), the authors could still postulate that the protein plays a role in downregulating stress and heat shock proteins. However, they could then rule out that it acts at the promoter by altering RNA Pol II engagement. Another option that should be tested is that ZNF-236 acts by nucleating an H3K9me domain that might shift the affected genes to the nuclear envelope, sequestering them in a zone of low-level transcription. That is also easily tested by tracking the position of an affected gene in the presence and absence of ZNF-236. This latter mechanism is also right in line with known modes of action for Zn finger proteins (in mammals, acting through KAP1 and SETDB1). A role for nucleating H3K9me could be easily tested in worms by screening MET-2 or SET-25 knockouts for heat shock or stress mRNA levels. These data sets are already published.

      Without testing these two obvious pathways of action (through RNA or through H3K9me deposition), this paper is too preliminary.

      Appraisal:

      The authors achieved their initial aim with the screen, and the paper is of interest to the field. However, they do not adequately address the likely modes of action. Indeed, I think their results fail to support the conclusion or speculation that ZNF-236 acts on long-range chromatin organization. No solid evidence is presented to support this claim.

      Impact:

      If the paper were to address and/or rule out likely modes of action, the paper would be of major value to the field of heat shock and stress mRNA control.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports the identification of ZNF-236 as a key regulator that maintains quiescence of heat shock inducible genes in C. elegans. Using a forward genetic screen for constitutive activation of an endogenous hsp-16.41 reporter, the authors show that loss of znf-236 leads to widespread, HSF-1-dependent expression of inducible heat shock proteins (iHSPs) and a subset of prion-like stress-responsive genes, in the absence of proteotoxic stress. Transcriptomic analysis reveals that znf-236 mutants partially overlap with the canonical heat shock response, selectively activating highly inducible iHSPs rather than the full HSR program. iHSP transgenes integrated throughout the genome generally become de-repressed in znf-236 mutants, whereas the same constructs on extrachromosomal arrays or inserted into the rDNA locus are insensitive to znf-236 loss. Using a newly developed method, Transcription Factor Deaminase Sequencing (TFD-seq), the authors show that ZNF-236 binds sparsely across the genome and does not associate with iHSP promoters, supporting an indirect mode of regulation. Physiologically, znf-236 mutants exhibit increased thermotolerance and maintain iHSP expression during aging.

      Strengths:

      This is a carefully executed and internally consistent study that identifies a new regulator of stress-induced gene quiescence in C. elegans. The genetics are clean and the phenotypes are robust.

      Weaknesses:

      The manuscript is largely descriptive. It would be substantially strengthened by deeper mechanistic insight into what ZNF-236 does beyond being required for default silencing.

    4. Reviewer #3 (Public review):

      Summary:

      The researchers performed a genetic screen to identify a protein, ZNF-236, which belongs to the zinc finger family, and is required for repression of heat shock inducible genes. The researchers applied a new method to map the binding sites of ZNF-236, and based on the data, suggested that the protein does not repress genes by directly binding to their regulatory regions targeted by HSF1. Insertion of a reporter in multiple genomic regions indicates that repression is not needed in repetitive genomic contexts. Together, this work identifies ZNF-236, a protein that is important to repress heat-shock-responsive genes in the absence of heat shock.

      Strengths:

      A hit from a productive genetic screen was validated, and followed up by a series of well-designed experiments to characterize how the repression occurs. The evidence that the identified protein is required for the repression of heat shock response genes is strong.

      Weaknesses:

      The researchers propose and discuss one model of repression based on protein binding data, which depends on a new technique and data that are not fully characterized.

      Major Comments:

      (1) The phrase "results from a shift in genome organization" in the abstract lacks strong evidence. This interpretation heavily relies on the protein binding technique, using ELT-2 as a positive and an imperfect negative control. If we assume that the binding is a red herring, the interpretation would require some other indirect regulation mechanism. Is it possible that ZNF-236 binds to the RNA of a protein that is required to limit HSF-1 and potentially other transcription factors' activation function? In the extrachromosomal array/rDNA context, perhaps other repressive mechanisms are redundant, and thus active repression by ZNF-236 is not required. This possibility is mentioned in one sentence in the discussion, but most of the other interpretations rely on the ZNF-236 binding data being correct. Given that there is other evidence for a transcriptional role for ZNF-236, and no negative control (e.g. deletion of the zinc fingers, or a control akin to those done for ChIP-seq, like a null mutant or knockdown), a stronger foundation is needed for the presented model for genome organization.

      (2) Continuing along the same line, the study assumes that ZNF-236 function is transcriptional. Is it possible to tag a protein and look at localization? If it is in the nucleus, it could be additional evidence that this is true.

      (3) I suggest that the authors analyze the genomic data further. A MEME analysis for ZNF-236 can be done to test if the motif occurrences are enriched at the binding sites. Binding site locations in the genome with respect to genes (exon, intron, promoter, enhancer?) can be analyzed and compared to existing data, such as ATAC-seq. The authors also propose that this protein could be similar to CTCF. There are numerous high-quality and high-resolution Hi-C data in C. elegans larvae, and so the authors can readily compare their binding peak locations to the insulation scores to test their hypothesis.

      (4) The researchers suggest that ZNF-236 is important for some genomic context. Based on the transcriptomic data, can they find a clue for what that context may be? Are the ZNF-236 repressed genes enriched for not expressed genes in regions surrounded by highly expressed genes?

    5. Author response:

      Updated Response, March 3, 2026

      In the midst of considering the thoughtful and insightful reviews of our manuscript and updating our work accordingly, we wanted to provide an interim update.

      In the reviews of our paper, each of the reviewers brought up questions about the specificity and sensitivity of a new "TFD-Seq" assay for protein-DNA specificity in vivo that we had developed for this work and applied here for the first time with a complex eukaryote (Figure 4). While we remain strong proponents of developing in vivo assays for protein-DNA interaction, we took to heart the concerns that the reviewers had expressed. We have therefore, in the past few weeks, done a rather "deep dive" into both the technical aspects of the TFD-Seq data and the conceptual and statistical aspects of how TFD mutation data can be interpreted. From this analysis, we find ourselves in agreement with the concerns. In particular, our "deep dive" has suggested that conclusions from TFD data (particularly negative conclusions on the presence of binding sites) will require a better understanding of signal and noise in the kind of assay used in Figure 4.

      As the work is currently in the submitted/preprint stage, we look forward to spending some time working (as appropriate) on both improvements to current protocols and alternative experiments to support the novel assay. An updated preprint, which (for now) conveys the body of work and conclusions (which are not substantially altered) while avoiding the complexities of the TFD-seq assay, is available at bioRxiv, and we look forward to sending a version-of-record over the next few months once we have had a chance to provide robust tests for the macromolecular targets/interactors of the ZNF-236 factor that was identified in this study.

      We again thank the reviewers (peer review is indeed really a good thing) and look forward to updating everyone soon.

      Updated bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2025.10.22.683740v3

      Original Response, January 5, 2026

      We thank the reviewers for their insights and suggestions. We appreciate that the reviewers were engaged by both the observations and their interpretation, and consider their interest in further analysis and clarified discussion to be the best possible compliment to this work.

      As noted by the reviewers, the working hypothesis of a nuclear organization role for ZNF-236 is just one model. Clarifying this model and potential alternatives will certainly add to the manuscript and this will be a key part of the revision. Beyond this, several suggested analyses should explore extant models, while providing context for considering alternatives. We look forward to carrying out such analyses as feasible and will report them in the revised manuscript.

    1. eLife Assessment

      This important work by Qin et al. delineates layered neuropeptidergic mechanisms that regulate sugar intake in a hunger state-dependent manner. Using a combination of genetic, physiological, and behavioral experiments, the authors convincingly show that Hugin- and Allatostatin A-releasing neurons are selectively active in sated flies and suppress sugar feeding by reducing the sensitivity of Gr5a-expressing gustatory neurons. They further demonstrate that Neuromedin U neurons share key physiological properties with fly Hugin neurons, highlighting conserved peptide functions across animal phyla.

    2. Reviewer #1 (Public review):

      In this revised manuscript, Qin and colleagues aim to delineate a neural mechanism that is engaged specifically in sated flies to suppress the intake of sugar solution (the "brake" mechanism for sugar consumption). They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release the neuropeptide Hugin (an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation depends on the cell-autonomous function of Hugin-releasing neurons that sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of the Hugin receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses the sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of the sugar response in Gr5a-expressing neurons reduces the fly's motivation for sugar intake. They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity downstream of NMU-expressing neurons in the rNST. These data suggest that the function of Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      The shift of the narrative, which focuses specifically on the hugin-AstA axis as the "brake" on the satiety signal and feeding behavior, clarified the central message of the presented work. The authors have provided multiple lines of compelling evidence generated through rigorous experiments. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers.

      While I deeply appreciate the authors' efforts to substantially restructure the manuscript, I have a few suggestions for further improvements. First, there remains room for discussion whether the "brake" function of the hugin-AstA axis is truly satiety state-dependent. The fact that neural activation (Fig. Supp. 8), peptide injection (Fig. 3A, 4A), receptor knockdown (Fig. 3C,G, 4E), and receptor mutants (Fig. Supp. 10, 12) all robustly modulate PER irrespective of the feeding status suggests that the hugin-AstA axis influences feeding behaviors both in sated and hungry flies. Additionally, their new data (Fig. Supp. 13B, C) now shows that synaptic transmission from hugin-releasing neurons is necessary for completely suppressing feeding even in sated flies. If the hugin-AstA axis engages specifically in sated (high glucose) state, disruption of this neuromodulatory system is expected to have relatively little effect in starved flies (in which the "brake" is already disengaged).

      In this context, it is intriguing that the knockdown of PK2-R2 hugin receptor modestly but consistently decreases proboscis extension reflex specifically in starved flies (Fig. 3D, H). The manuscript does not discuss this interesting phenotype at all. Given the heterogeneity of hugin-releasing neurons (Fig. Supp. 7), there remains a possibility that a subset of hugin-releasing neurons and/or downstream neurons can provide a complementary (or even opposing) effect on the feeding behavior.

      Given these intriguing yet unresolved issues, it is important to acknowledge that whether this system is "selectively engaged in fed states to dampen sweet sensation" (in Discussion) requires further functional investigation. Consistent effects of manipulating the hugin-AstA system across multiple experimental approaches underscore the importance of this molecular circuitry axis for controlling feeding behaviors. Moderating the conclusions to accommodate alternative interpretations of the data will help the field determine the precise mechanism that controls feeding behaviors in future studies.

    3. Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations, and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest in both fed and starved flies, suggesting that glucose intake through Glut1 may only be part of the mechanism. Additionally, many of the manipulations testing the "brake" circuitry throughout the study show similar effects in both fed and starved flies. This suggests that the focus of the discussion and Supplemental Figure 16 on a satiety-specific "brake" mechanism may not be fully supported by the data.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Qin and colleagues aim to delineate a neural mechanism by which the internal satiety levels modulate the intake of sugar solution. They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release a neuropeptide Hugin (which is an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation does not require synaptic inputs, suggesting that Hugin-releasing neurons sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of Hugin's receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through AstA-R1 receptor. Suppression of sugar response in Gr5a-expressing neurons reduces the fly's sugar intake motivation (measured by proboscis extension reflex). They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose, independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity in neurons downstream of NMU-expressing neurons in the rNST. These data suggest that the function of Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      Generally, their central conclusions are well-supported by multiple independent approaches. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers. It is easier said than done: the rigor of this study, which effectively combined pharmacological and genetic approaches to provide multiple lines of behavioral and physiological evidence, deserves recognition and praise.

      A perceived weakness is that the behavioral effects of the manipulations of Hugin and AstA systems are modest compared to a dramatic shift of sugar solution-induced PER (the behavioral proxy of sugar sensitivity) induced by hunger, as presented in Figure 1B and E. It is true that the mutation of tyrosine hydroxylase (TH), which synthesizes dopamine, does not completely abolish the hunger-induced PER change, but the remaining effect is small. Moreover, the behavioral effect of the silencing of the Hugin/AstA system (Figure Supplement 13B, C) is difficult to interpret, leaving a possibility that this system may not be necessary for shifting PER in starved flies. These suggest that the Hugin-AstA system accounts for only a minor part of the behavioral adaptation induced by the decreased sugar levels. Their aim to "dissect out a complete neural pathway that directly senses internal energy state and modulates food-related behavioral output in the fly brain" is likely only partially achieved. While this outcome is not a shortcoming of a study per se, the depth of discussion on the mechanism of interactions between the Hugin/AstA system and the other previously characterized molecular circuit mechanisms mediating hunger-induced behavioral modulation is insufficient for readers to appreciate the novelty of this study and future challenges in the field.

      We thank the reviewer for the thoughtful comment. We agree that the behavioral effects of manipulating the Hugin–AstA system alone were considerably weaker than the pronounced PER shifts induced by starvation. We have revised our Discussion to address it by positioning our findings within the broader context of energy regulation.

      More specifically, we discuss that feeding behavior is controlled by two distinct, yet synergistic, types of mechanisms:

      (1) Hunger-driven 'accelerators': as the reviewer notes, pathways involving dopamine and NPF are powerful drivers of sweet sensitivity. These systems are strongly activated by hunger to promote food-seeking and consumption.

      (2) Satiety-driven 'brakes': our study identifies the counterpart to the systems above, namely a satiety-driven 'brake'. The Hugin–AstA pathway acts as a direct sensor of high internal energy (glucose), which is specifically engaged during satiety to actively suppress sweet sensation and prevent overconsumption.

      This framework explains the seeming discrepancy in effect size. The dramatic PER shift seen upon starvation is a combined result of engaging the 'accelerators' (hunger pathways like TH/NPF) while simultaneously releasing the 'brake' (our Hugin–AstA pathway being inactive).

      Our manipulations, which specifically target only the 'brake' system, are therefore expected to have a more modest effect than this combined physiological state. Thus, rather than being a "minor part," the Hugin–AstA pathway is a mechanistically defined, satiety-specific circuit that is essential for the precise "braking" required for energy homeostasis. We will update our Discussion to emphasize how these 'accelerator' and 'brake' circuits must work in concert to ensure precise energy regulation.

      In this context, authors are encouraged to confront a limitation of the study due to the lack of subtype-level circuit characterization, despite their intriguing finding that only a subtype of Hugin- and AstA-releasing neurons are responsive to the elevated level of bath-applied glucose.

      We thank the reviewer for highlighting the critical issue of subtype-level specialization within the Hugin and AstA populations.

      We fully agree that the Hugin system is known for its functional heterogeneity (pleiotropy), with different Hugin neuron subclusters implicated in regulating a variety of behaviors, including feeding, aversion, and locomotion (e.g., Anna N King, Curr Biol, 2017; Andreas, PLoS Biol; Sebastian et al., 2016, Nat Comm). Our finding that only a specific subcluster of Hugin neurons is responsive to glucose elevation provides a crucial first step in functionally dissecting this complexity.

      We have added a dedicated paragraph to elaborate on this functional partitioning in the Discussion. We propose that this subtype-level specialization allows the Hugin system to precisely link specific physiological states (like high circulating glucose) to appropriate behavioral outputs (like the suppression of sweet taste), demonstrating an elegant solution to coordinating multiple survival behaviors. Future work using high-resolution tools such as split-GAL4 and single-cell sequencing will be invaluable in fully mapping the specific functional roles corresponding to each Hugin and AstA subcluster.

      Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest, and does not show a clear difference between fed and starved flies as might be expected if this mechanism acts as a sensor of internal energy state. This could suggest that glucose intake through Glut1 may only be part of the mechanism.

      We thank the reviewer for this insightful comment and agree that the modest behavioral effect of Glut1 knockdown is a critical finding that warrants further clarification. This observation strongly supports the idea that internal energy state is monitored by a sophisticated and robust network, not a single, fragile component. We believe the effect size is modest for two main reasons, which we have addressed in the revised Discussion.

      Firstly, the effect size is likely attenuated by technical and molecular redundancy. Specifically, the RNAi-mediated knockdown of Glut1 may be incomplete, leaving residual transporter function. Furthermore, Glut1 is likely only one part of the Hugin neuron's intrinsic sensing mechanism; other components, such as alternative glucose transporters or downstream K<sub>ATP</sub> channel signaling, may provide molecular redundancy, meaning that the full energy-sensing function is not easily abolished by a single manipulation.

      Secondly, and more importantly, the final feeding decision is an integrated output of competing circuits. While hunger-sensing pathways like the dopamine and NPF circuits act as powerful "accelerators" to drive sweet consumption, the Hugin–AstA pathway serves as a satiety-specific "brake." The modest effect of partially inhibiting just one component of this 'brake' system is the hallmark of a precisely regulated, multi-layered homeostatic system. We have clarified in the Discussion that the Hugin pathway represents one essential inhibitory circuit within this cooperative network that works together with the hunger-promoting systems to ensure precise control over energy intake.

      Reviewer #3 (Public review):

      Summary:

      This study identifies a novel energy-sensing circuit in Drosophila and mice that directly regulates sweet taste perception. In flies, hugin+ neurons function as a glucose sensor, activated through Glut1 transport and ATP-sensitive potassium channels. Once activated, hugin neurons release hugin peptide, which stimulates downstream Allatostatin A (AstA)+ neurons via PK2-R1 receptors. AstA+ neurons then inhibit sweet-sensing Gr5a+ gustatory neurons through AstA peptide and its receptor AstA-R1, reducing sweet sensitivity after feeding. Disrupting this pathway enhances sweet taste and increases food intake, while activating the pathway suppresses feeding.

      The mammalian homolog, neuromedin U (NMU), was shown to play an analogous role in mice. NMU knockout mice displayed heightened sweet preference, while NMU administration suppressed it. In addition, VMH NMU+ neurons directly sense glucose and project to rNST Calb2+ neurons, dampening sweet taste responses. The authors suggested a conserved hugin/NMU-AstA pathway that couples energy state to taste perception.

      Strengths:

      Interesting findings that extend from insects to mammals. Very comprehensive.

      Weaknesses:

      Coupling energy status to taste sensitivity is not a new story. Many pathways appear to be involved, and therefore, it raises a question as to how this hugin-AstA pathway is unique.

      The reviewer is correct that several energy-sensing pathways are known. However, we now clarify that these previously established mechanisms, such as the dopaminergic and NPF pathways, primarily function as hunger-driven "accelerators." They are activated by low-energy states to promote sweet sensitivity and drive consumption.

      The crucial, missing piece of the puzzle—which our study provides—is the satiety-specific "brake" mechanism. We identify the Hugin–AstA circuit as one of the “brakes”: a dedicated, central sensor that responds directly to high circulating glucose (satiety) to suppress sweet sensation and prevent overconsumption.

      Thus, our work is unique because it defines the essential counterpart to the hunger pathways. In the revised Discussion, we have explained how these 'accelerator' (hunger) and 'brake' (satiety) systems work in concert to allow for the precise, bidirectional regulation of energy intake. Furthermore, by demonstrating that this Hugin/NMU 'brake' circuit is evolutionarily conserved in mice, our findings reveal a fundamental energy-sensing strategy and suggest that this pathway could represent a promising new therapeutic target for managing conditions of excessive food intake.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Considering the comments from all three reviewers, new experiments are not necessary, but the authors are welcome to provide new pieces of evidence that would strengthen their conclusions. To assist the authors with their revisions, the comments have been categorized from the highest to lowest priority based on the concerns raised by reviewers 1, 2, and 3.

      High priority:

      (1) Acknowledgement of partial phenotypes by the genetic manipulations, especially relative to other neuromodulators that are involved in the adjustment of sugar sensitivity after starvation (1, 2).

      Please see our responses to the Public Review 1 for details.

      (2) Detailed discussion on the novelty of the present work, also in light of previous studies both in flies and mammals (known Drosophila modulators, as well as NMU-rNST circuit on sugar sensation) (1, 2, 3).

      Please see our responses to the Public Review 3 for details.

      Medium priority:

      • Discussions on the subtype-specific function of hugin neurons (1).

      Please see our responses to the Public Review 1 for details.

      • Discussions on the pleiotropic effect of changes in the level of circulating sugar (including release of other sugar types) (2, 3).

      We agree that circulating sugars represent a complex, systemic signal with broad, pleiotropic effects, and we have expanded our Discussion to address this.

      We will discuss the functional distinction between key hemolymph sugars, such as trehalose (the main circulating sugar, critical for stress/flight) and glucose (the primary, rapidly mobilized energy currency). While various sugars collectively influence metabolic status, our study’s unique focus is on the direct neural link between internal energy and sweet taste modulation. We clarify that our work precisely identifies glucose as the direct, key ligand for the Hugin satiety circuit, thus providing a concrete, mechanistically defined link from systemic energy complexity to the specific regulation of sweet sensation.

      • Illustration or clear explanations of sugar application methods in mouse experiments (ex. Figure 5F vs Figure 5M), as well as discussion on the concentration of sugar solutions used (3).

      We have added the relevant details to the figure legends and explained the rationale for using this concentration of sugar in the Results.

      • Less saturated image for Figure 5K (3).

      We have adjusted Figure 5K to reduce image saturation for clarity.

      • Discussions on the modest effect of NMU on rNST neurons (Figure 5M) (3).

      In the revised results, we have discussed that the modest suppression of rNST activity likely reflects partial peptide diffusion and the heterogeneous composition of sweet-responsive rNST neurons.

      Low priority:

      • Systematic quantification of multiple types of sugars after starvation (3).

      We agree that circulating sugars represent a complex metabolic milieu, and a fully systematic biochemical quantification of individual hemolymph sugars after starvation would be informative. While such analyses are beyond the scope of the present study, we have addressed this point at the functional level by systematically pre-feeding flies with different types of dietary sugars prior to PER assays.

      We find that multiple sugars are capable of suppressing PER, indicating that satiety-related behavioral inhibition is not unique to a single carbohydrate source. Notably, sucrose produces the strongest suppression, consistent with its rapid metabolic conversion and effectiveness in elevating internal glucose levels. These results support the notion that diverse dietary sugars converge on a common satiety-signaling mechanism, while our mechanistic analyses specifically identify glucose as the key ligand engaging the Hugin satiety circuit.

      We now clarify this distinction in the revised Discussion.

      • Testing Gr64f neurons or mutants (3).

      Our results indicate that energy sensing in the CNS suppresses sweet-sensing neuron activity (e.g., via hyperpolarization) rather than directly blocking sugar binding to receptors. Thus, sweet perception—not sugar detection—is inhibited. As evidence, in Figure supplementary 4 we measured the PER to fructose and trehalose. Although Gr5a and Gr64a differ in their sensitivity to these sugars, the CNS energy state consistently suppresses sweet perception for both. As Reviewer 3 noted, Gr5a and Gr64f are co-expressed in sweet neurons; while they respond to different sugars, their labeling of the neurons is largely equivalent.

      • Testing sugar preference (glucose vs. other sugars) (3).

      Since our primary goal was to identify a direct satiety-sensing and sensory-modulating circuit—the "brake" mechanism—PER served as the most suitable and mechanistically specific readout. While manipulation of the Hugin–AstA circuit influences internal state, and therefore likely alters long-term sugar preference, investigating the integration of this pathway with reward and post-ingestive signaling is a critical question that lies beyond the scope of the current study.

      • Cell type-specific knockout of NMU (3).

      Achieving a cell type-specific knockout of NMU using the Cre approach is not feasible in the short term. While previous studies have reported the role of NMU in the VMH region in regulating feeding, our contribution lies in revealing how these neurons sense energy. We also show that these neurons project to the vicinity of Calb2 neurons and that the neuropeptide can suppress Calb2 neuronal activity. This essentially demonstrates that the hugin–Gr5a pathway in Drosophila is conserved in mice. We believe that a detailed dissection of the precise circuitry in mice is more appropriate to address in a subsequent study.

      • Explanation of NMU detection in Figure 5K (3): this is GFP expressed by the Cre-dependent virus.

      We have revised the Figure 5K legend to clarify that NMU<sup>+</sup> neurons are labeled by GFP expression from a Cre-dependent AAV2/1-DIO-GFP, which undergoes anterograde trans-synaptic transfer. We further explain that GFP expression in rNST neurons requires local AAV-Cre injection, enabling identification of postsynaptic Calb2<sup>+</sup> target neurons.

      • Neuronal manipulation of NMU neurons by optogenetics or DREADD.

      Please see our responses to the question “Cell type-specific knockout of NMU.”

      Reviewer #1 (Recommendations for the authors):

      A major concern about the study is that the effect of genetic manipulations of the Hugin/AstA system appears to account for only a small part of the dramatic shift of PER probability toward smaller concentrations of sucrose solutions among starved flies. In Figure 1B and E, PER probability is significantly higher among starved flies in response to 10-200 mM of sucrose solutions than fed flies. Compared to this, RNAi knockdown of glucose transporter in hugin neurons (Figure 2C), PK2-R1 pan-neuronally (Figure 3C) or in AstA-releasing neurons (Figure 3G), AstA-R1 in Gr5a neurons (Figure 4E), systemic mutation of PK-R2 (Figure Supplement 10) and AstA-R1 (Figure Supplement 12) all produce relatively minor behavioral changes. Consistent with previous works, the mutation of TH causes a robust decrease of PER across the entire range of sucrose concentration tested (Figure Supplement 1).

      These discrepancies can be caused by many technical limitations that cannot be readily addressed. For instance, the large effect of TH can be confounded by the pleiotropic behavioral effect of the lack of dopamine. RNAi can suffer from incomplete elimination of targeted genes. However, the relatively small behavioral effect size of these manipulations cannot be entirely ignored in light of previous publications, which point to the importance of other neuromodulators such as dopamine, serotonin, Akh, and NPF, on sugar sensitivity (Marella et al., 2012; Inagaki et al., 2014; Yao et al., 2022), as well as other potentially parallel glucose-sensing systems, including Gr43a-expressing cells (Miyamoto et al., 2012) and sNPF-expressing CN neurons (Oh et al., 2019). While the neuropeptides initially tested (Figure 1) are not poor choices, it is a missed opportunity that so many other neuromodulators were excluded from the initial search.

      We appreciate the reviewer’s detailed analysis and agree that the magnitude of behavioral effects produced by manipulating the hugin–AstA pathway is smaller than the dramatic shift in PER observed under starvation conditions. This comparison is important and highlights a central conceptual point of our study.

      Starvation represents a compound physiological state that simultaneously engages multiple hunger-promoting neuromodulatory systems—most prominently dopaminergic and NPF pathways—while also releasing satiety-associated inhibitory signals. As shown previously and confirmed here (Figure supplementary 1), manipulation of dopamine synthesis produces a broad and robust reduction in PER across sucrose concentrations, consistent with its role as a powerful hunger-driven modulator.

      By contrast, our genetic manipulations specifically target a satiety-associated inhibitory circuit—the hugin–AstA pathway—that is selectively engaged by high internal glucose levels. Manipulating this pathway alone therefore isolates a single “brake” component of feeding regulation, rather than recapitulating the full physiological state of starvation, which combines both accelerator activation and brake release. Accordingly, the more modest behavioral effects we observe are an expected consequence of dissecting one defined regulatory module from a larger, cooperative network.

      We agree that multiple neuromodulators, including dopamine, serotonin, Akh, NPF, and others, as well as parallel glucose-sensing systems such as Gr43a-expressing cells and sNPF-expressing CN neurons, contribute to the regulation of sugar sensitivity. Rather than aiming to exhaustively screen all neuromodulators, our study was designed to identify and mechanistically define a central, glucose-responsive satiety sensor that directly links internal energy state to sweet taste modulation. In the revised discussion, we now explicitly position the hugin–AstA circuit as one essential, satiety-specific component within this broader regulatory landscape and discuss how it functionally complements previously characterized hunger-driven pathways.

      I am also confused by the results of Shibire<sup>ts1</sup>-mediated silencing of Hugin and AstA neurons (Figure Supplement 13B, C). It is unclear to me why a feeding assay was used instead of PER, like the activation experiments. Feeding (ingestion) and PER are qualitatively different types of behavior, which cannot be directly compared. Moreover, the definition of "fold change" is not provided either in the figure legend or in the Materials and Methods section, making it difficult to understand what the figure means.

      We thank the reviewer for pointing out this important issue regarding the interpretation of the Shibire<sup>ts1</sup>-mediated silencing experiments. We agree that proboscis extension reflex (PER) and feeding/ingestion assays reflect qualitatively different behavioral processes and should not be directly compared.

      In the original submission, feeding assays were used to assess the effect of neuronal silencing, which led to ambiguity when comparing these results with PER-based activation experiments. To directly address this concern and ensure consistency across behavioral readouts, we have now performed additional PER experiments under the same Shibire<sup>ts1</sup>-mediated silencing conditions.

      These new data demonstrate that acute silencing of hugin neurons significantly enhances PER responses to sucrose (Figure supplementary 13B), indicating increased sweet sensitivity. This result is fully consistent with our activation experiments and supports the conclusion that the hugin–AstA pathway suppresses sweet taste perception under satiety conditions.

      In addition, we have revised the figure legend to explicitly define the “fold change” metric used in the behavioral analysis, clarifying how the values were calculated and normalized. Together, these changes resolve the ambiguity raised by the reviewer and strengthen the behavioral consistency of our conclusions.

      Of note, Marella et al. (2012) reported that silencing of Hugin-releasing neurons did not affect PER. It is therefore possible that the Hugin system is sufficient, but not necessary, for modulating PER under food deprivation.

      We agree that their observation—that silencing Hugin-releasing neurons does not alter PER in starved flies—is consistent with a state-dependent role of the Hugin system in feeding regulation.

      In starved animals, dopaminergic TH<sup>+</sup> neurons are strongly activated and promote high PER responsiveness, while circulating glucose levels are low, placing Hugin neurons in a relatively inactive state. Under such conditions, further silencing of Hugin neurons would be expected to produce minimal additional effects on PER, which likely explains the results reported by Marella et al.

      Importantly, our data show that preventing the starvation-associated reduction in Hugin neuronal activity—by thermogenetic activation of Hugin<sup>+</sup> neurons (Hugin–TrpA1; Figure 1D)—significantly suppresses the hunger-induced enhancement of PER. These results indicate that dynamic downregulation of Hugin neuronal activity is a critical component of the normal behavioral shift in sweet sensitivity in response to food deprivation. Thus, while Hugin neurons may not be required to further modulate PER once animals are already in a strongly starved state, their regulated activity change is essential for mediating state-dependent modulation of sweet taste behavior. We have added discussion in the revised manuscript.

      While no new experiments are requested, it is important for authors to acknowledge the limited effect size of Hugin/AstA manipulation. In the current manuscript, the authors briefly mention the previous works (lines 460-462, 472-474), which is insufficient. Discussions must include how the Hugin/AstA system may "complement these established mechanisms (line 460)" (described in the references listed above), under what situations this novel Hugin/AstA system can be relevant for controlling PER, and why the fly is equipped with seemingly redundant systems for sensing internal glucose levels and controlling feeding behavior. Without these discussions, it is difficult to recognize the novelty of the presented work. The data appear largely to represent minor and incremental progress in an already mature field.

      In the revised manuscript, we have substantially expanded the Discussion to explicitly acknowledge this limited effect size and to clarify the functional role of the Hugin–AstA pathway within the broader energy-regulatory network. We now emphasize that this circuit represents a satiety-specific inhibitory branch that complements, rather than replaces, previously described hunger-promoting systems such as dopaminergic, NPF, and AKH circuits.

      Importantly, we discuss the specific physiological conditions under which the Hugin–AstA system is most relevant—namely, post-feeding and high-glucose states. Unlike hunger circuits that amplify sweet sensitivity during starvation, the Hugin–AstA pathway directly senses circulating glucose and rapidly suppresses sweet taste perception when energy is sufficient, thereby acting as a brake to prevent overconsumption.

      We further address the apparent redundancy among internal sugar-sensing systems. Rather than being redundant, these pathways form a coordinated and layered network with distinct sugar specificities, temporal dynamics, and functional roles. For example, Gr43a<sup>+</sup> neurons primarily detect fructose, whereas hemolymph glucose represents the principal energetic currency in Drosophila. The use of multiple internal sugar sensors allows flies to fine-tune feeding decisions across different nutritional contexts and timescales.

      Finally, we expand the Discussion to highlight that although the Hugin–AstA circuit constitutes only one branch of the energy-sensing network, its disruption leads to excessive energy intake (Figure supplementary 13C-E, G) and increased fat accumulation (Figure supplementary 13F), underscoring its physiological relevance. We also discuss how this pathway likely interacts with other neuromodulatory systems, including TH<sup>+</sup> dopaminergic and NPF<sup>+</sup> neurons, to collectively orchestrate adaptive feeding behavior and energy homeostasis.

      Together, these additions clarify that our work does not simply add another neuromodulator to an already mature field, but instead identifies a distinct glucose-sensing, satiety-linked mechanism that fills a conceptual gap between internal energy state detection and sensory modulation.

      Another perceived weakness is the lack of subtype-level dissection among Hugin- and AstA-releasing neurons. I do not make the unjustified request to narrow down the behaviorally relevant neuron to one (or one type), which would rest on a widespread but unreasonable and dangerous assumption that every behavior must be controlled by one neuron. However, the authors present very interesting data showing that only a subset of Hugin- and AstA-releasing neurons responds to higher levels of sucrose (Figure 1H, Figure Supplement 7A, B), which leads to a hypothesis that a specific subtype within each peptidergic neuronal group is responsible for starvation-induced behavioral change. The authors only briefly touch upon this (lines 217-218), but this is an important hypothesis that requires further discussion.

      We thank the reviewer for highlighting the importance of neuronal heterogeneity within the Hugin- and AstA-releasing populations. We fully agree that the observation that only a subset of Hugin<sup>+</sup> and AstA<sup>+</sup> neurons responds to elevated sucrose levels (Figure 1H; Figure Supplement 7A, B) strongly suggests functional specialization within these peptidergic groups.

      In the revised Discussion, we now explicitly propose that distinct subtypes of Hugin and AstA neurons differentially contribute to energy sensing and feeding modulation. We suggest that glucose-responsive subpopulations may be specifically engaged in satiety signaling, whereas other neurons within the same genetic classes may participate in additional physiological or behavioral processes. This heterogeneity provides a plausible explanation for the partial behavioral effects observed following population-level manipulations. Although we did not perform subtype-specific perturbations in this study, our findings provide a foundation for identifying these subtypes in future work using split-GAL4 lines and connectomic datasets.

      These issues are more important than the sprawling and unfocused review of various hunger and satiety-controlling systems across species in the Introduction. Lines 53-108 contain only tangential information to the main conclusion of the paper. Both the Introduction and Discussion sections must be completely restructured so that readers understand what is already known about hunger-induced changes in feeding-related behavior, what is a missing gap of knowledge in neural mechanisms controlling behavioral adaptation under starvation, and why Hugin/NMU is an interesting target in this context.

      We thank the reviewer for this important structural critique. We agree that, in the original manuscript, the Introduction placed disproportionate emphasis on a broad survey of hunger- and satiety-regulating systems across species, which may have obscured the central conceptual advance of this study.

      In the revised manuscript, we have substantially restructured both the Introduction and the Discussion to sharpen the narrative focus and clarify the specific knowledge gap addressed by our work.

      First, the Introduction has been streamlined to focus on what is already known about hunger-induced modulation of feeding-related behaviors, particularly sweet taste sensitivity and PER in Drosophila. We now emphasize that prior studies have predominantly characterized hunger-activated, feeding-promoting pathways (e.g., dopaminergic, NPF, AKH systems) that act as accelerators of food-seeking behavior.

      Second, we explicitly define the missing gap in knowledge: while hunger-driven mechanisms are well studied, it remains unclear how satiety states—specifically elevated internal glucose levels—are directly sensed by central neurons and translated into suppression of sensory gain and feeding behavior.

      Third, we reposition Hugin/NMU as an attractive and conceptually distinct target because of its peptidergic nature, evolutionary conservation, and previously reported but mechanistically unresolved links to feeding regulation. This framing motivates our central question: whether Hugin/NMU neurons function as a direct internal energy sensor that actively implements a satiety-specific inhibitory control over taste perception.

      In parallel, the Discussion has been reorganized to avoid an unfocused review of feeding circuits across species and instead to interpret our findings within a clear conceptual framework. We now emphasize that the Hugin–AstA (and NMU) pathway represents a satiety-driven “brake” that complements, rather than duplicates, established hunger-driven “accelerator” circuits. This restructuring clarifies both the novelty of our findings and their relevance within the existing literature.

      Reviewer #2 (Recommendations for the authors):

      When discussing the results of Figure 1, such as lines 203-204, "These results demonstrate that sugar intake inhibits sweet sensation, probably via increasing circulating sugar levels" it may be worth discussing the known impact of sweet sensation experience on future sweet taste responses. With the data shown here, it is difficult to conclusively separate blood glucose levels from the sweet sensation that happens during the re-feeding. The "normal diet minus sucrose" does not blunt the starved PER effect, but that could potentially be impacted by either/both sugar intake or sweet taste.

      We thank the reviewer for this thoughtful and important point. We agree that sweet taste experience itself can influence subsequent sweet sensitivity, and that separating the contribution of sensory experience from nutrient-derived internal energy is non-trivial.

      In the revised manuscript, we have clarified the experimental timing by explicitly stating that PER was assessed 15 minutes after refeeding. At this time point, hemolymph glucose levels have returned to baseline (Figure supplementary 5), supporting the physiological relevance of glucose-dependent activation of Hugin neurons under our experimental conditions.

      We also acknowledge that sweet taste exposure can induce sensory adaptation and modulate future taste responses. To directly address this potential confound, we performed additional control experiments during revision (Figure supplementary 4B) in which starved flies were refed with sorbitol (caloric but not sweet) or arabinose (sweet but non-nutritive). We found that both manipulations partially reduced PER, but neither recapitulated the full suppressive effect of sucrose refeeding.

      These results indicate that sweet taste experience and metabolic energy contribute in parallel to the regulation of sweet sensitivity. Importantly, the incomplete effects of sorbitol or arabinose alone suggest that neither sensory adaptation nor caloric value is sufficient by itself to fully account for the observed PER suppression.

      Accordingly, we have revised the Discussion to clarify that the Hugin–AstA pathway likely operates within a broader, multi-layered regulatory framework, integrating internal metabolic state with sensory experience, rather than acting as a sole determinant of post-feeding sweet sensitivity. This clarification avoids over-attribution of the behavioral effect to circulating glucose alone while preserving the central conclusion that internal energy state is a key modulator of sweet perception.

      Blocking cellular sugar intake or metabolism could be impacting the ability of neurons to function, distinct from any specific intracellular regulatory mechanism that glucose or its derivatives might be involved with. That may be a caveat worth mentioning in the results or discussion.

      We thank the reviewer for raising this important caveat. We agree that blocking cellular sugar uptake or metabolism could, in principle, impair neuronal function in a nonspecific manner, independent of any dedicated intracellular glucose-sensing mechanism.

      In the revised manuscript, we now explicitly acknowledge this possibility and clarify the scope of our interpretation. Several features of our data argue against a generalized loss of neuronal function as the primary explanation. First, the behavioral and physiological effects observed upon manipulation of glucose transport or K<sub>ATP</sub> channel activity are rapid and reversible, consistent with state-dependent modulation rather than chronic metabolic failure. Second, these manipulations selectively affect sweet sensitivity and feeding-related behaviors, without causing gross deficits in proboscis extension or neuronal responsiveness.

      Accordingly, we have revised the Results to emphasize that while intracellular glucose metabolism is required for normal neuronal activity, our findings specifically support a role for glucose-dependent modulation of neuronal excitability in satiety signaling, rather than a nonspecific energetic impairment.

      Minor suggestions:

      (1) Figure 2G: "Pryuvate" -> "Pyruvate."

      We have corrected “Pryuvate” to “Pyruvate.”

      (2) "Fly" methods section: it says that flies were kept on 2% agar for 12 hours for starvation, but in the Figure 1A description, it says 24 hours.

      We have corrected the description in Figure 1A.

      Reviewer #3 (Recommendations for the authors):

      (1) SEZ Hugin+ and AstA+ neurons were activated by glucose (Figures 1G, 1I), yet hemolymph also contains trehalose and fructose. For instance, DH44 neurons respond broadly to all hemolymph sugars (Dus et al., 2015), while Gr43a neurons specifically detect fructose (Miyamoto et al., 2012). The present study does not clarify whether Hugin+ or AstA+ neurons are similarly sugar-specific or more broadly tuned. A systematic analysis is needed to determine whether these circuits are selective for glucose.

      We thank the reviewer for raising this important question regarding sugar specificity. We agree that hemolymph contains multiple sugars, including trehalose and fructose, and that distinct neural systems have been shown to differ in their tuning breadth. To address this issue, we performed additional experiments during revision in which starved wild-type flies were refed with different sugars (sucrose, fructose, trehalose, and sorbitol), followed by PER measurements. We found that sucrose refeeding produced the strongest suppression of PER, whereas fructose, trehalose, and sorbitol induced weaker effects (Figure supplementary 4A).

      We interpret these results as suggesting a preferential sensitivity of the Hugin/AstA pathway to glucose availability rather than a broad responsiveness to all circulating sugars. One plausible explanation is that fructose, trehalose, and sorbitol require peripheral metabolic conversion before contributing to intracellular glucose levels in neurons, whereas sucrose feeding rapidly restores hemolymph glucose within the 15-minute time window used in our experiments (Figure supplementary 5).

      Importantly, we now clarify in the revised Results and Discussion that our data support a functional preference for glucose under physiological conditions, rather than excluding the possibility that other sugars may influence this circuit indirectly or on longer timescales.

      (2) The authors state that SEZ, but not VNC, Hugin+ neurons regulate AstA activity (lines 318-319). However, comparison of Figure Supplement 8B with the severing sample in Figure Supplement 11B shows a more pronounced reduction of sweet sensation under hug>TrpA1 activation. Although the absolute response in Figure 3F (in vivo) is higher than that in the cut-off preparation (Figure S11), comparison of Figure S11C with Figure 3F indicates that hug+ neurons drive an AstA+ calcium transient more than fourfold greater in the presence of VNC neurons. Thus, the contribution of Hugin+ VNC neurons cannot be dismissed, and the conclusion should be revised accordingly.

      We thank the reviewer for this careful and quantitative comparison. We agree that our original wording overstated the exclusivity of SEZ Hugin<sup>+</sup> neurons in regulating AstA activity.

      Upon closer examination of the data, we now acknowledge that VNC Hugin<sup>+</sup> neurons likely contribute to AstA activation. As the reviewer points out, the AstA<sup>+</sup> calcium response evoked by Hugin activation is substantially larger when VNC neurons are intact (Figure 3F) than in the cut preparation (Figure supplementary 11C), indicating that inputs from the VNC can potentiate AstA neuronal activity.

      Accordingly, we have revised the manuscript to state that SEZ Hugin<sup>+</sup> neurons play a predominant role in driving AstA responses relevant to sweet sensation, while VNC Hugin<sup>+</sup> neurons provide additional modulatory input that enhances the overall magnitude of Hugin signaling. These revisions have been made in the Results to more accurately reflect the contributions of distinct Hugin subpopulations.

      (3) In Figure 4D, you show AstA-R1 co-localized with Gr5a-expressing cells. However, Gr5a-expressing cells also co-express Gr64f in the labellum (Fujii et al., 2015, Current Biology). Are the authors sure that the sweet sensation they described is Gr5a-specific? Testing Gr64f is essential. Moreover, Fujii et al. demonstrated that a Gr5a loss-of-function mutation impairs not only sucrose but also maltose, fructose, and trehalose sensation. This raises the question of whether the Hug+ and AstA+ neurons identified in the current study contribute to sensing sugars beyond sucrose. Additional experiments are required to clarify this point.

      Please see our responses to the Reviewing Editor Comments (4).

      (4) While nutritive sugar sensors such as Dh44 neurons have been directly implicated in sugar preference (Dus et al., 2015, Neuron), this study examines the hug+, AstA+, Gr5a neuronal circuit only in the context of PER responses. Why is sugar preference not assessed here, especially given that in mice, the comparison was made using preference tests?

      We thank the reviewer for this insightful question. We agree that sugar preference assays provide important information about feeding decisions and reward-based behavior. In the present study, however, we deliberately focused on the proboscis extension reflex (PER) because it offers a direct, quantitative, and temporally precise readout of sweet sensory sensitivity at the sensory–motor level.

      PER allows us to isolate changes in taste perception itself, largely independent of post-ingestive reinforcement, learning, or motivational state, all of which strongly influence preference-based assays. This distinction is particularly important given our central goal of identifying a circuit that directly links internal energy sensing to modulation of peripheral sweet-sensing neurons.

      By contrast, sugar preference reflects an integrated behavioral outcome combining sensory input, internal state, and post-ingestive reward signals, including those mediated by DH44 neurons and other nutritive sensing pathways. We therefore chose PER as the most mechanistically specific assay to dissect the Hugin–AstA–Gr5a pathway. We now explicitly acknowledge in the revised Discussion that determining how this satiety-linked sensory modulation interacts with reward and post-ingestive circuits to shape long-term sugar preference will be an important direction for future studies.

      Several other concerns:

      (5) The intraperitoneal injection of NMU is interpreted as reflecting a brain-specific NMU effect, but such systemic delivery cannot exclude peripheral actions. In Figure 5D, the use of whole-body KO mice is insufficient; targeted manipulations (e.g., NMU-Cre-driven inactivation) are required to establish circuit-specific behavioral roles.

      Please see our responses to the Reviewing Editor Comments (Low priority).

      (6) In Figure 5F and 5M, neural activity is measured under different conditions: gastric glucose infusion in 5F versus glucose licking in 5M. To establish that NMU VMH neurons and Calb2 rNST neurons belong to the same circuit, this discrepancy in stimulation timing must be resolved to support the conclusions.

      We thank the reviewer for pointing out this important issue regarding stimulation paradigms in Figures 5F and 5M. We agree that the difference between gastric glucose infusion and glucose licking requires explicit clarification.

      In the revised manuscript, we now clearly state that these two paradigms were intentionally designed to probe complementary levels of the same NMU–Calb2 circuit. In Figure 5F, gastric glucose infusion was used to isolate the internal energy-sensing property of VMH NMU<sup>+</sup> neurons, independent of oral sensory input, motor behavior, or reward expectation. This experiment establishes that NMU<sup>+</sup> neurons are directly activated by elevated circulating glucose.

      By contrast, Figure 5M examined how activation of this NMU pathway modulates downstream Calb2<sup>+</sup> rNST neurons under physiologically relevant feeding conditions, in which sweet taste signals are naturally evoked by licking. This design allows us to test the functional consequence of NMU signaling on sweet-responsive rNST neurons during normal sensory processing.

      Although the route and timing of glucose delivery differ, both paradigms converge on a unified circuit model: internal glucose elevation activates VMH NMU<sup>+</sup> neurons, and NMU signaling suppresses sweet-driven activity in Calb2<sup>+</sup> rNST neurons. We have revised the Results and figure legends to explicitly describe this layered experimental logic and to clarify that Figures 5F and 5M together establish distinct but connected nodes of the same circuit.

      (7) Figure 5I-J. The glucose concentration used appears excessively high. In mammals, blood glucose in the sated state is ~7-8 mM. It is unclear whether the observed responses represent physiological effects or artifacts of supraphysiological stimulation. Additional experiments with lower glucose concentrations would strengthen the study.

      We thank the reviewer for raising this important concern regarding the glucose concentration used in Figure 5I–J. We agree that the concentration applied in ex vivo slice experiments exceeds the typical physiological range of circulating glucose.

      This higher concentration was intentionally chosen to ensure reliable neuronal activation in acute brain slices, where glucose diffusion, uptake, and metabolic access are substantially slower than in vivo. Similar approaches have been widely used in studies of glucose-sensitive hypothalamic neurons to overcome these technical limitations (e.g., Kim et al., 2025, Neuron).

      Importantly, the physiological relevance of our findings is supported by in vivo fiber photometry experiments, which demonstrate that VMH NMU⁺ neurons are robustly activated following normal sugar ingestion under physiological conditions. Thus, while supraphysiological glucose was used to establish glucose responsiveness ex vivo, our in vivo data confirm that NMU⁺ neurons respond to glucose elevations within the normal physiological range.

      (8) Figure 5K. The VMH images are inconsistently oriented compared with Figure 5E, lacking a 3v landmark. The NMU detection method (IHC or FISH) is not specified in the legend. The GFP-Calb2 signal is heavily saturated, making it difficult to distinguish true signals from artifacts. These issues undermine interpretability.

      We thank the reviewer for pointing out these issues. In the revised manuscript, VMH images in Figure 5K have been reoriented to match Figure 5E, and the third ventricle (3v) is now indicated as an anatomical landmark. The figure legend has been revised to clarify that NMU<sup>+</sup> neurons are identified by GFP expression from a Cre-dependent AAV2/1-DIO-GFP injected into NMU-Cre mice, rather than by NMU immunohistochemistry or FISH. In addition, GFP–Calb2 images have been reprocessed to clearly distinguish true signals from background and imaging artifacts.

      (9) Figure 5L-M. Details of the NMU injection method are absent (route, dose, delivery parameters). The number of animals (n) is also not reported. Furthermore, AUC reduction alone is not sufficient evidence of robust inhibition. To convincingly demonstrate causality, NMU-IRES-Cre mice should be combined with DREADD or optogenetic approaches to directly inhibit NMU neurons and test whether rNST Calb2 activity is reduced.

      We thank the reviewer for these helpful comments. We have revised the manuscript to include all missing methodological details. These details are now clearly described in the Methods section and figure legend.

      We fully acknowledge that cell-type–specific manipulations, such as DREADD or optogenetic inhibition of NMU neurons, would provide more definitive causal evidence. However, our main goal in the mouse experiments was to demonstrate that NMU<sup>+</sup> neurons can directly sense glucose and modulate sweet sensitivity, thereby supporting the evolutionary conservation of the Hugin mechanism identified in Drosophila. Detailed dissection of the downstream circuit architecture and behavioral consequences in mammals is indeed an important direction for future research, but it lies beyond the current study’s primary focus on cross-species conservation.

      (10) In Drosophila, hugin neurons respond selectively to nutritive glucose (Fig. 2H), but whether NMU neurons share this property is unknown. Notably, Calb2 neurons in the rNST respond to the artificial sweetener AceK (Hao Jin et al., 2021, Cell), leaving open whether the NMU-rNST circuit is calorie-dependent or calorie-independent.

      We have added a statement in the Discussion acknowledging this limitation and emphasizing that future work will be needed to test whether the NMU–Calb2 circuit is selectively engaged by metabolically active sugars or also by sweet taste signals independent of caloric value.

      Minor comments

      (11) All bar graphs should include individual data points.

      We have added individual data points to all bar graphs.

      (12) In Figures 3E, 4C, and 4D, it appears that a combination of GAL4 and LexA was used, but the information about the fly lines is missing.

      We have now included the complete list of fly lines used for these experiments, including their genotypes and sources.

      (13) The source for PK2-R1 KO, AstA-R1 KO fly lines and NMU-IRES-Cre, Calb2-IRES-Cre mice is missing.

      We have added the complete source information for all genetic lines mentioned.

      (14) Figure 5B-D. This is a sucrose preference test, so why is the y-axis labeled as glucose? Is this an error, or were the values converted to glucose equivalents?

      We thank the reviewer for catching this mistake. The assay shown in Figure 5B–D measured sucrose preference, not glucose preference. The inconsistency resulted from a typographical error in the Methods description. In the revised manuscript, we have corrected this error to clearly state that sucrose was used in the preference test.

      (15) Supplementary Figure 15. The NMU images are of poor quality and should be improved.

      The punctate appearance of NMU signals in Supplementary Figure 15 is not due to poor image quality but rather reflects the physiological distribution of the NMU neuropeptide. As NMU is stored in secretory vesicles within neuronal terminals and somata, its immunostaining typically appears as discrete puncta rather than diffuse cytoplasmic labeling.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript. Readers would also benefit from a note that the mice were male and from a discussion of the exclusion of females.

      In the revised manuscript, we have included full statistical reporting for all key experiments in the resource data. Regarding animal sex, we confirm that all mouse experiments were conducted using male mice. This choice was made to minimize variability caused by hormonal cycles in females, which can influence feeding behavior and glucose metabolism. We have now explicitly stated this information in the Methods section and included a brief discussion noting that sex-specific differences in NMU–Calb2 circuitry and feeding regulation represent an important question for future investigation.

    1. eLife Assessment

      This study introduces a novel method for estimating spatial spectra from irregularly sampled intracranial EEG data, revealing cortical activity across all spatial frequencies, which supports the global and integrated nature of cortical dynamics. It showcases important technical innovations and rigorous analyses, including tests to rule out potential confounds. However, further direct evaluation of the model, for example by using simulated cortical activity with a known spatial spectrum (e.g., an iEEG volume-conductor model that describes the mapping from cortical current source density to iEEG signals, and that incorporates the reference electrodes and the particular montage used), would even further strengthen the solid evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The paper uses rigorous methods to determine phase dynamics from human cortical stereotactic EEGs. It finds that the power of the phase is higher at the lowest spatial phase. The application to data illustrates the solidity of the method and its potential for discovery.

      Comments on revisions:

      The authors have provided responses to the previous recommendations. The paper does not seem to contain further significant improvements. I am thus not inclined to change my judgement.

    3. Reviewer #3 (Public review):

      Summary:

      The authors propose a method for estimating the spatial power spectrum of cortical activity from irregularly sampled data and apply it to iEEG data from human patients during a delayed free recall task. The main findings are that the spatial spectra of cortical activity peak at low spatial frequencies and decrease with increasing spatial frequency. This is observed over a broad range of temporal frequencies (2-100 Hz).

      Strengths:

      A strength of the study is the type of data that is used. As pointed out by the authors, spatial spectra of cortical activity are difficult to estimate from non-invasive measurements (EEG and MEG) and from commonly used intracranial measurements (i.e. electrocorticography or Utah arrays) due to their limited spatial extent. In contrast, iEEG measurements are easier to interpret than EEG/MEG measurements and typically have larger spatial coverage than Utah arrays. However, iEEG is irregularly sampled within the three-dimensional brain volume and this poses a methodological problem that the proposed method aims to address.

      Weaknesses:

      Although the proposed method is evaluated in several indirect ways, a direct evaluation is lacking. This would entail simulating cortical current source density (CSD) with known spatial spectrum and using a realistic iEEG volume-conductor model to generate iEEG signals.

      Comments on revisions:

      I would like to clarify two points:

      (1) In their response, the authors frame the role of simulations primarily as a means of assessing the effects of volume conduction. However, the purpose of evaluating a proposed estimation method through simulations extends beyond this specific issue. More generally, simulations are essential for establishing that the proposed method, particularly given the multiple non-trivial transformations applied to the observed data, produces accurate and reliable estimates under controlled conditions.

      (2) The authors seem to interpret my use of the term current source density as referring to the current source density (CSD) method, which is an approach to mitigating volume conduction by inverting Poisson's equation. This was not my intention: current source density refers to the physical quantity (i.e., the spatial density of current sources) underlying macroscopic brain activity, and is independent of any specific estimation or inversion technique.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper uses rigorous methods to determine phase dynamics from human cortical stereotactic EEGs. It finds that the power of the phase is higher at the lowest spatial phase. The application to data illustrates the solidity of the method and its potential for discovery.

      Comments on revised submission:

      The authors have provided responses to the previous recommendations.

      We thank the reviewer for reviewing our manuscript again, and for their positive evaluation.

      Reviewer #3 (Public review):

      Summary:

      The authors propose a method for estimating the spatial power spectrum of cortical activity from irregularly sampled data and apply it to iEEG data from human patients during a delayed free recall task. The main findings are that the spatial spectra of cortical activity peak at low spatial frequencies and decrease with increasing spatial frequency. This is observed over a broad range of temporal frequencies (2-100 Hz).

      Strengths:

      A strength of the study is the type of data that is used. As pointed out by the authors, spatial spectra of cortical activity are difficult to estimate from non-invasive measurements (EEG and MEG) and from commonly used intracranial measurements (i.e. electrocorticography or Utah arrays) due to their limited spatial extent. In contrast, iEEG measurements are easier to interpret than EEG/MEG measurements and typically have larger spatial coverage than Utah arrays. However, iEEG is irregularly sampled within the three-dimensional brain volume and this poses a methodological problem that the proposed method aims to address.

      Weaknesses:

      Although the proposed method is evaluated in several indirect ways, a direct evaluation is lacking. This would entail simulating cortical current source density (CSD) with known spatial spectrum and using a realistic iEEG volume-conductor model to generate iEEG signals.

      Comments on revised version:

      In my original review, I raised the following issue:

      "The proposed method of estimating wavelength from irregularly sampled three-dimensional iEEG data involves several steps (phase extraction, singular value decomposition, triangle definition, dimension reduction, etc.) and it is not at all clear that the concatenation of all these steps actually yields accurate estimates. Did the authors use more realistic simulations of cortical activity (i.e. on the convoluted cortical sheet) to verify that the method indeed yields accurate estimates of phase spectra?"

      And the authors' response was:

      "We now included detailed surrogate testing, in which varying combinations of sEEG phase data and veridical surrogate wavelengths are added together. See our reply from the public reviewer comments. We assess that real neurophysiological data (here, sEEG plus surrogate and MEG manipulated in various ways) is a more accurate way to address these issues. In our experience, large scale TWs appear spontaneously in realistic cortical simulations, and we now cite the relevant papers in the manuscript (line 53)."

      The point that I wanted to make is not that traveling waves appear in computational models of cortical activity, as the authors seem to think. My point was that the only direct way to evaluate the proposed method for estimating spatial spectra is to use simulated cortical activity with known spatial spectrum. In particular, with "realistic simulations" I refer to the iEEG volume-conductor model that describes the mapping from cortical current source density (CSD) to iEEG signals, and that incorporates the reference electrodes and the particular montage used.

      Although in the revised manuscript the authors have provided indirect evidence for the soundness of the proposed estimation method, the lack of a direct evaluation using realistic simulations with ground truth, as described above, means that I remain sceptical about the soundness of the method.

      We thank the reviewer for reviewing our manuscript again.

      We have reviewed the literature again on volume conduction effects in LFP measures of cortical activity. In all publications we reviewed, the conclusion is that the range of the effect is <1cm. We now mention the range of volume conduction in the Methods section dealing with the surrogate models (lines 1054-9) as well as added emphasis in the Discussion (lines 594-9).

      The highest spatial frequency we consider in the present research is 50c/m, which corresponds to a cortical distance of 2cm. This is well outside the range of volume conduction effects in LFPs. Mathematically speaking, blurring (e.g. Gaussian) acts as a low-pass filter, attenuating higher spatial frequency components, but only for components within the spatial range of the Gaussian blurring, i.e. for LFPs, higher than 100c/m. There will therefore be negligible effects (mathematically speaking, zero effect) of volume conduction in the results reported by us. If the veracity of these studies on volume conduction with LFPs is accepted, then the reviewer’s requested simulation reduces to “estimating spatial spectra [using] simulated cortical activity with known spatial spectrum.” This is what we have done, in a direct and simple manner.
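      The two quantities in this paragraph, the wavelength that corresponds to a spatial frequency and the gain of a Gaussian blur at that frequency, can be written down in a few lines. The following is only a sketch under stated assumptions: volume conduction is idealised as a one-dimensional Gaussian kernel, and the kernel widths are placeholder values chosen for illustration, not measurements from the manuscript or the cited literature. The function names are hypothetical.

```python
import math


def wavelength_cm(cycles_per_metre: float) -> float:
    """Wavelength in centimetres for a spatial frequency in cycles per metre."""
    if cycles_per_metre <= 0:
        raise ValueError("spatial frequency must be positive")
    return 100.0 / cycles_per_metre


def gaussian_gain(cycles_per_metre: float, sigma_m: float) -> float:
    """Amplitude gain of a Gaussian blur with standard deviation sigma_m (metres).

    The Fourier transform of a unit-area Gaussian kernel is
    exp(-2 * pi^2 * sigma^2 * f^2), with f in cycles per metre.
    """
    return math.exp(-2.0 * math.pi ** 2 * sigma_m ** 2 * cycles_per_metre ** 2)


if __name__ == "__main__":
    # Placeholder blur widths (metres); these are assumptions, not measured values.
    assumed_sigmas_m = (0.001, 0.0025, 0.005)
    for f in (10.0, 50.0, 100.0):
        gains = ", ".join(
            f"sigma={s * 100:.2f} cm: {gaussian_gain(f, s):.3f}" for s in assumed_sigmas_m
        )
        print(f"{f:5.0f} c/m (wavelength {wavelength_cm(f):.1f} cm) -> {gains}")
```

      The gain at any given spatial frequency depends strongly on the assumed kernel width, so the sketch is best read as a way to check how a particular volume-conduction range translates into attenuation at the frequencies of interest.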

      If the ubiquity and importance of spatio-temporal dynamics in cortex is accepted, then it is insufficient to describe “the mapping from cortical current source density (CSD) to iEEG signals”, since this presumes a model of cortical activity that does not capture the correlations in space and time that we assume are critical to cortical function. We are aware the CSD approach has a long and successful history of unravelling brain mechanisms. However, an emphasis on traveling waves (and spatio-temporal dynamics in general) is in part a challenge to this approach (and the idea of localized sources in general). CSD approaches carry similar assumptions (but at a smaller scale, <1cm) as those elaborated in Zhigalov and Jensen (2023) for extra-cranial measures. In both cases, removal of volume conduction effects emphasizes standing wave activity (localized static, oscillatory sources) over traveling wave activity. In this manner, these methods tend to confirm their starting assumptions (as does our own approach, of course). What is required is external empirical validation to break any circular confirmation of initial theoretical choice of basis. All this is a way of saying that CSD approaches are not the unproblematic, direct methods that the reviewer asserts.

      We did understand the reviewer’s request to model the effects of volume conduction. Our own view of realistic cortical simulations differs from the reviewer’s, setting aside the final step in the forward modeling pipeline which would add the effects of volume conduction in the grey matter. By simulating real-time dynamics, it should be possible to untangle the effects of volume conduction from true spatio-temporal correlations. This is because the volume conduction effects are essentially instantaneous, compared to the relatively slow motion of traveling waves. So, the measurement of purely spatial phase vectors is prone to smearing artefact, but following the trajectory of a wave over one cycle can more accurately determine the range of true interactions. One could, for example, compare the usual CSD forward modelling with TWs in simulations, see which is the best predictor of future activity, and compare these to empirical measurements. Here, the CSD analysis would remove the volume conduction effects but also emphasize standing activity over motion, even where the motion was veridical in the simulation.

      Even so, these tests are only relevant in the <1cm range.

      Another issue is ephaptic coupling, which we mention in the discussion. This means that some of the local volume conduction effects are not merely artefacts from the point of view of cortical function, but have a real causal effect. The strength of the word ‘some’ has yet to be completely resolved in the literature, and it would be technically challenging to include these effects in any simulation.

      Finally, simulation should be an adjunct to empirical studies, or used when empirical studies are not possible. We do not think, in this case, that it is the ‘only direct’ way to evaluate our method. Rather, we rely on the converging evidence from empirical studies of volume conduction in LFPs, which show this effect is outside the range of our reported results.

    1. eLife Assessment

      In this important work, the authors present a new transformer-based neural network designed to isolate and quantify higher-order epistasis in protein sequences. They provide solid evidence that higher-order epistasis can play key roles in protein function. This work will be of interest to the communities interested in modeling biological sequence data and understanding mutational effects.

    2. Reviewer #1 (Public review):

      The authors present an approach that uses the transformer architecture to model epistasis in deep mutational scanning datasets. This is an original and very interesting idea. Applying the approach to 10 datasets, they quantify the contribution of higher-order epistasis, showing that it varies quite extensively.

      Comments on revisions:

      The authors have addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare baseline methods such as "DANGO: Predicting higher-order genetic interactions". There are many works related to pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled non-additive interaction discovery in machine learning models".

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should clarify terminology consistently. Including a visual mapping or table linking a number of layers to the maximum modeled interaction order could be helpful.
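      The mapping requested in point (4) can be sketched directly from the 2^M scaling mentioned in point (2). The snippet below assumes that M multi-head attention layers cap the specific interaction order at exactly 2^M; the function names are hypothetical and are not taken from the authors' code.

```python
def max_interaction_order(num_mha_layers: int) -> int:
    """Highest interaction order under the assumed 2^M scaling."""
    if num_mha_layers < 1:
        raise ValueError("need at least one attention layer")
    return 2 ** num_mha_layers


def newly_reachable_orders(num_mha_layers: int) -> range:
    """Orders that become available only when the M-th layer is added."""
    lowest = max_interaction_order(num_mha_layers - 1) + 1 if num_mha_layers > 1 else 2
    return range(lowest, max_interaction_order(num_mha_layers) + 1)


if __name__ == "__main__":
    print("layers  max order  newly reachable orders")
    for m in (1, 2, 3):
        orders = list(newly_reachable_orders(m))
        print(f"{m:6d}  {max_interaction_order(m):9d}  {orders}")
```

      Under this assumption, one layer corresponds to the "pairwise" grouping, a second layer adds the "3-4-way" terms, and a third layer adds orders 5 through 8, which is one way to read the ">4-way" grouping.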

      Comments for the revision:

      I want to thank the authors for their efforts in revising the manuscript. Most of the concerns raised in the initial review have been adequately addressed.

      However, one important issue remains. I previously asked the authors to benchmark their method against stronger baselines. The authors declined, arguing that these alternatives are "not directly applicable to the types of analyses." I am not persuaded by this rationale. In my view, these baseline methods target essentially the same underlying problem, and at least some, if not all, should be included in a comparative evaluation (or the manuscript should provide a clearer, more technically grounded explanation of why such comparisons are not feasible or not meaningful).

    4. Reviewer #3 (Public review):

      Summary:

      Sethi and Zou present a new neural network to study the importance of epistatic interactions in pairs and groups of amino acids to the function of proteins. Their new model is validated on a small simulated data set, and then applied to 10 empirical data sets. Results show that epistatic interactions in groups of amino acids can be important to predict the phenotype of a protein, especially for sequences that are not very similar to the training data.

      Strengths:

      The manuscript relies on a novel neural network architecture that makes it easy to study specifically the contribution of interactions between 2, 3, 4 or more amino acids. The novel network architecture achieves such a level of interpretability without noticeable performance penalty. The study of 10 different protein families shows that there is variation among protein families in the importance of these interactions, and that higher order interactions are particularly important to predict the phenotypes of distant proteins.

      Weaknesses:

      The GitHub repository provides a README file to run a standard pipeline, but a user will need to go through the code to actually know what that pipeline is doing.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present an approach that uses the transformer architecture to model epistasis in deep mutational scanning datasets. This is an original and very interesting idea. Applying the approach to 10 datasets, they quantify the contribution of higher-order epistasis, showing that it varies quite extensively.

      Suggestions:

      (1) The approach taken is very interesting, but it is not particularly well placed in the context of recent related work. MAVE-NN, LANTERN, and MoCHI are all approaches that different labs have developed for inferring and fitting global epistasis functions to DMS datasets. MoCHI can also be used to infer multidimensional global epistasis (for example, folding and binding energies) and also pairwise (and higher order) specific interaction terms (see 10.1186/s13059-024-03444-y and 10.1371/journal.pcbi.1012132). It doesn't distract from the current work to better introduce these recent approaches in the introduction. A comparison of the different capabilities of the methods may also be helpful. It may also be interesting to compare the contributions to variance of 1st, 2nd, and higher-order interaction terms estimated by the Epistatic transformer and MoCHI.

      We thank the reviewer for the very thoughtful suggestion.

      Although these methods are conceptually related to our method, none of them can be realistically used to perform the type of inference we have done in the paper on most of the datasets we used, as they all require explicitly enumerating the large number of interaction terms.

      We have included new text (Line 65-74) in the introduction to discuss the advantages and disadvantages of these models. We believe this has made our contribution better placed in the broader context of the field.

      (2) https://doi.org/10.1371/journal.pcbi.1004771 is another useful reference that relates different metrics of epistasis, including the useful distinction between biochemical/background-relative and background-averaged epistasis.

      We have included this very relevant reference in the introduction. We also pointed out that a limitation of this class of methods is that they typically require near combinatorially complete datasets and often have to rely on regularized regression to infer the parameters, making the inferred model parameters disconnected from their theoretical expectations (Line 49-56).

      (3) Which higher-order interactions are more important? Are there any mechanistic/structural insights?

      We thank the reviewer for pointing out this potential improvement. We have now included a detailed analysis of the GRB2-SH3 abundance landscape in the final section of the results. In particular, we estimated the contribution of individual amino acid sites to different orders (pairwise, 3-4th order, 4-8th order) of epistasis and discussed our findings in the context of the 3D structure of this domain. We also analyzed the sparsity of specific interactions among subsets of sites.

      Please see Results section “Architecture of specific epistasis for GRB2-SH3 abundance.”

      Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      We thank the reviewer for the positive feedback.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare baseline methods such as "DANGO: Predicting higher-order genetic interactions." There are many works related to pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled non-additive interaction discovery in machine learning models."

      We thank the reviewer for this very helpful comment. These references are indeed conceptually quite similar to our framework. Although they are not directly applicable to the types of analyses we performed in this paper (partitioning contribution of epistasis into different interaction orders in terms of variance components), we have included a discussion of these methods in the introduction (Line 70-74). We believe this helps better situate our method within the broader conceptual context of interpreting machine learning models for epistatic interactions.

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      Again, we thank the reviewer for the thoughtful comment. We have addressed this comment together with a related comment by Reviewer #1 by including a detailed analysis of the GRB2-SH3 landscape using a marginal epistasis framework, where we quantified the contribution of individual sites to different orders of epistasis as well as the sparsity of epistatic interactions. We also present these results in the context of the structure of this protein. Please see Results section “Architecture of specific epistasis for GRB2-SH3 abundance.”

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      We agree that the underparameterization of the simple sigmoid function could be potentially confounding. We did compare different choices of functional forms for modeling global epistasis. Overall, we found that there is no difference between a simple sigmoid function with four trainable parameters and the more complex version (sum of multiple sigmoid functions, used by popular methods such as MAVE-NN). Therefore, all results we presented in the paper were based on the model with a single scalable sigmoid function.

      We have added relevant text; Line 153-158. We have also included side-by-side comparisons of the model performance for the GRB-abundance and the AAV2 dataset to corroborate this claim (Supplemental Figure 1).

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should clarify terminology consistently. Including a visual mapping or table linking a number of layers to the maximum modeled interaction order could be helpful.

      We thank the reviewer for the thoughtful suggestion. We have rewritten the description of our metrics for measuring the importance of "pairwise", "3-4-way", and ">4-way" interactions; Line 232-239.

      We have also added a table to improve clarity, as suggested; Table 2.
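      The two functional forms compared in the response to point (3), a single sigmoid with four trainable parameters and a sum of several sigmoids, can be illustrated as follows. This is a minimal sketch: the response does not state how the four parameters are defined, so the parameterisation used here (lower plateau, upper plateau, slope, midpoint) is an assumption, and the function names are hypothetical.

```python
import math


def _logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def single_sigmoid(phi: float, lower: float, upper: float, slope: float, midpoint: float) -> float:
    """Map a latent phenotype phi to a measurement with one scaled sigmoid.

    The four arguments after phi stand in for the four trainable parameters
    (an assumed parameterisation, not necessarily the authors').
    """
    return lower + (upper - lower) * _logistic(slope * (phi - midpoint))


def sum_of_sigmoids(phi: float, offset: float, components: list[tuple[float, float, float]]) -> float:
    """More flexible alternative: offset plus a weighted sum of sigmoids.

    Each component is (weight, slope, midpoint).
    """
    return offset + sum(w * _logistic(s * (phi - m)) for w, s, m in components)


if __name__ == "__main__":
    for phi in (-4.0, 0.0, 4.0):
        print(phi, single_sigmoid(phi, lower=0.0, upper=1.0, slope=1.0, midpoint=0.0))
```

      With a single component, the sum reduces to the four-parameter form, so the comparison described in the response amounts to asking whether additional components improve the fit.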

      Reviewer #3 (Public review):

      Summary:

      Sethi and Zou present a new neural network to study the importance of epistatic interactions in pairs and groups of amino acids to the function of proteins. Their new model is validated on a small simulated data set and then applied to 10 empirical data sets. Results show that epistatic interactions in groups of amino acids can be important to predict the function of a protein, especially for sequences that are not very similar to the training data.

      Strengths:

      The manuscript relies on a novel neural network architecture that makes it easy to study specifically the contribution of interactions between 2, 3, 4, or more amino acids. The study of 10 different protein families shows that there is variation among protein families.

      Weaknesses:

      The manuscript is good overall, but could have gone a bit deeper by comparing the new architecture to standard transformers, and by investigating whether differences between protein families explain some of the differences in the importance of interactions between amino acids. Finally, the GitHub repository needs some more information to be usable.

      We thank the reviewer for the thoughtful comments. We have listed our response below in the “Recommendations for the authors” section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some of the dataset labels are confusing. For example, GRB is actually the protein GRB2 and more specifically just one of the two SH3 domains from GRB2 (called GRB2-SH3 in Faure et al.).

      We thank the reviewer for catching this. Our original naming of the datasets followed the designation of library number in the Faure et al. paper (which constructed 3 variant libraries and performed different assays on them). To avoid confusion (and also save space in the figure titles), we have now renamed the datasets using this mapping:

      Author response table 1.

      Reviewer #3 (Recommendations for the authors):

      (1) What is the cost of the interpretability of the model? It would be interesting to evaluate how a standard transformer, complete with its many non-linearities, performs on the simulated 13-position data, using the r2 metric. This is important as the last sentence of the discussion seems to suggest that the model proposed by the authors could be used in other contexts, where perhaps interpretability would be less important.

      We thank the reviewer for this suggestion. We have run a generic transformer model on the GRB-abundance and AAV2 datasets. Overall, we found minimal difference between the generic model and our interpretable model, suggesting that fitting the interpretable transformer does not incur significant cost in performance.

      We have included a side-by-side comparison of the performance of the generic transformer and our three-layer model in Supplemental Figure 5 and a discussion of this finding in Line 256-259.

      (2) The 10 data sets analyzed by the authors differ in their behaviour. I was wondering whether the proteins have different characteristics, beyond the number and distribution of mutants in the data sets. For instance, do high-order interactions play a bigger role in longer proteins, in proteins with more secondary structures, in more hydrophobic proteins?

      We fully agree that this is a highly relevant question. Unfortunately, the paucity of datasets suitable for the type of analyses we performed in the paper limits our ability to draw general conclusions. Furthermore, the differences in genotype distribution among the 10 datasets may be the main driving factor in the behaviors of the models.

      We included our thoughts on this issue in the discussion (Line 477-481).

      We will definitely revisit this question if this type of high-order combinatorial DMS data becomes more available in the (hopefully) near future.

      (3) Although the code appears to be available in the repository, there is no information about the content of the different folders, about what the different scripts do, or about how to reproduce the article's results. More work should be done to clarify it all.

      Thank you for pointing this out. We have substantially improved our GitHub repository and included many annotations for reproducibility.

      (4) Typos and minor comments:

      (a) p3 "a multi-peak fitness landscapes": landscape.

      (b) p3 "Here instead of directly fitting the the regression coefficients in Eq. 2": remove 'the'.

      (c) p3 "neural network architectures do not allow us to control the highest order of specific epistasis": a word is missing.

      (d) p6 "up to 1,926, 3,014, and 4,102 parameters, respectively-all smaller than the size of the training dataset": it's not very clear what size of the dataset means: number of example sequences?

      (e) p6 "This results confirm": This result confirms.

      (f) p6 "to the convergence of of the variance components of the model landscape to the ground truth.": remove 'of'.

      (g) p7 "to characterize the importance higher-order interactions": the importance of.

      (h) p7 "The improvement varies across datasets and range": and ranges.

      (i) p9 "over the pairwise model is due to the its ability": remove 'the'.

      (j) p13 "This results suggest that pairwise": result suggests.

      (k) p13 "although the role assessed by prediction for randomly sampled genotypes seems moderate": sampled. Also, I'm not sure I understand this part of the sentence: what results are used to support this claim? It's not 6b, which is only based on the mutational model.

      This is in Supplemental Figure 7.

      (l) p13 "potentially by modeling how the these local effects": remove the.

      (m) p13 "We first note that the the higher-order models": remove the.

      (n) p15 "M layers of MHA leads to a models that strictly": lead to a model.

      (o) Supp Figure 1: "Solid lines shows the inverse": show.

      (p) Supp p 10 "on 90% of randomly sample data": sampled.

      (q) Supp p11 "Next, assume that Eq. 5 is true for m > 0. We need to show that Eq. 5 is also true for m + 1.": shouldn't it be m>=0 ? It seems important to start the recursive argument.

      Good catch.

      (r) Supp p11 "Since the sum in line 9 run through subsets": runs.

      (s) Supp p11 "we can further simplify Eq. 11 it to": remove it.

      We have fixed all these problems. We very much appreciate the reviewer’s attention.

    1. eLife Assessment

      This study uses the yeast two-hybrid assay to identify proteins that may interact with yeast Set1 and other subunits of COMPASS/Set1C, the histone H3K4 methyltransferase, providing also some evidence for Set1 sumoylation and a role of SET1C methylating other factors in vitro. The results are valuable, and they should contribute to understanding the functions of the conserved SET1C complex, as they suggest potential functional connections with RNA biogenesis, chromatin remodeling, and non-histone methylation, whose implications would yet need to be explored. Nevertheless, apart from the fact that only a small subset of the Y2H interactions is further examined, the validating experiments are only partial or inconclusive, the strength of evidence being at this point incomplete.

    2. Reviewer #1 (Public review):

      The manuscript by Luciano et al. is a collection of experiments about the yeast histone 3 lysine 4 methyltransferase, Set1, starting with 10 yeast two-hybrid screens (Y2H). Y2H screens were briefly popular 20+ years ago, but the persistently unfavourable false-to-true positive ratios limited their utility, and the conclusion emerged that Y2H is an unreliable approach for gathering protein-protein interaction data. Y2H outcomes are candidate interaction lists at best, strongly contaminated by false positives. Here, the authors employed a company (Hybrigenics) to perform the Y2H screens.

      The primary data is not presented, and the outcomes are summarized using the Hybrigenics in-house quality scoring system in Figure 1A. It is not possible to evaluate these data, and the manuscript presents cartoon summaries that the reader must accept as valuable.

      (1) Based on the extensive knowledge about Set1C/COMPASS acquired from genetics and biochemistry by many labs (including the Geli lab), the results presented here from the 10 Y2H screens are notably patchy. Of the 7 subunits of this complex, only one (Spp1) was identified using Set1 as bait. Conversely, as baits, Swd2, Spp1, Shg1, captured Set1, and the Bre2-Sdc1 interaction was reciprocally identified. These interactions were scored at the highest confidence level, which lends some confidence to the screens. However, the missing interactions, even at the third confidence level, indicate that any Y2H conclusions using these data must be qualified with caution. The authors do not appear to be cautious in their lengthy evaluations of these candidate interactions, which are illustrated with cartoons in Figures 2 and 3, with some support from the literature but almost without additional evidence. Snf2 is a particularly interesting candidate, which the authors support with pull-down experiments after mixing the two proteins in vitro (Figure 4). After Y2H, this is the least convincing evidence for a protein-protein interaction, and no further, more reliable evidence is supplied.

      (2) Figure 5 continues the cartoon summary of extrapolations from the Y2H screens, again without supporting evidence, except that the authors state, "We have refined the interaction region between Set1, Prp8 and Prp22, showing that Prp8 and Prp22 interact strongly with Set1-F4 (n-SET). Prp22 interacts in addition with Set1-F1 (Figure S2)." However, Figure S2 does not show this evidence and is incoherent.

      The figure legends for Figure S2B and C (copied here in bold) do not correspond to the figure.

      B - Expression of the F1-F5 fragments in yeast cells. Fusion proteins were detected with an anti-GAL4 monoclonal antibody. TOTO yeast cells (Hybrigenics) were transformed with the different pB66-Set1-F1 to F5 plasmids and subsequently with either P6, pP6-Snf2 762-968, pP6-Prp8 37-250, or pP6-Prp22 379-763 that were identified in the Y2H screens. Transformed cells were incubated 3 days at 30°C on SD-LEU-TRP and then restreaked on SD-LEU-TRP-HIS with 3AT. Cell growth was monitored after 2 days at 30°C.

      C - Solid and dotted arrows indicate that transformed TOTO cells transformed with pB66-Set1-F1 to F5 and the indicated prey (Snf2, Prp8, and Prp22) are growing in the presence of 20 mM and 5 mM of AT, respectively.

      Figure S2D is two almost featureless dark grey panels accompanied by the figure legend D) Control experiment showing that TOTO cells transformed with p6 and pB66-Set1-F4 are not gowing (sic) in the presence of 5 mM or 20 mM AT.

      Line 343. Interestingly, the two-hybrid screens reveal that Set1 1-754 interacted with Gag capsid-like proteins of Ty1 (Figure S5), raising the possibility that Set1 binding to Ty1 mRNA is linked to the interaction of Set1 1-754 with Gag.

      This is another example of the primary mistake repeatedly made by the authors - Y2H interactions are candidate results and not conclusive evidence. To further illustrate this point, the authors highlight the candidate interaction between Nis1 and 3 Set1C subunits.

      (3) After multiple speculations based on the Y2H candidates, the authors changed to focus on sumoylation of Set1, which has previously been reported to be sumoylated. Evidence identifying two sumoylation sites in Set1, in the N-SET and SET domains, is valuable and adds important progress to the role of sumoylation in the regulation of H3K4 methyltransferase, relevant for all eukaryotes. This illuminating part of the manuscript is only tenuously connected to the preceding Y2H screens and concomitant speculations.

      (4) The manuscript then describes a red herring exercise involving Set1 methylation of Nrm1. In an already speculative and difficult manuscript, it is exasperating to read a paragraph about a failed idea. Apart from panel E, Figure 7 is a distraction, and I believe it should not be shared.

      (5) However, despite the failure with Nrm1, Line 443 - The H3K4-like domain in Nrm1 raised our attention to other yeast proteins that carry such sequences. This line of thinking is even less connected to the Y2H screens than the sumoylation work.

      However, the authors present a reasonable evaluation of the yeast proteome screened for six amino acids similar to the known H3K4 motif ARTKQT (Figure 7e).

      (6) However, this evaluation goes nowhere and has no connection with the next section of the manuscript, which is entirely speculation about the regulation of metabolism and stress responses based on the Y2H results and selected evidence from the literature.

      (7) The manuscript then describes more failed experiments regarding lysine methylation of Snf2 by Set1C, which unexpectedly reports arginine methylation rather than lysine. The manuscript does not currently meet the standard expected for this type of paper - the composition is somewhat incoherent and there are no previous reports of arginine methylation by SET domain proteins.

      The manuscript presents a very experienced grasp of the literature and a sophisticated appreciation of the forefront issues, but a surprising failure to eliminate uninformative failures and peripheral distractions. The overinterpretation of Y2H results is a dominating failure. There are some valuable parts within this manuscript, and hopefully, the authors can reformat to eliminate the defects and appropriately qualify the candidate data.

    3. Reviewer #2 (Public review):

      Summary:

      This paper starts with a large-scale yeast two-hybrid (Y2H) screen using Set1 (full-length and smaller parts) and other Set1C/COMPASS subunits as bait. There are hundreds of possible interactions identified, but only a small number are given any follow-up. While it's useful to document all the possible interactions, the unfocused and preliminary nature of the results makes the paper feel scattered and incomplete.

      Strengths:

      The Y2H screen was very comprehensive, producing lots of interesting possible leads for further experiments.

      Weaknesses:

      The results are useful but incomplete because only a small subset of the Y2H interactions is further examined. Even in the case of those that were further tested, the validating experiments are only partial or inconclusive.

    4. Reviewer #3 (Public review):

      The SET1C/COMPASS complex is the histone H3K4 methyltransferase in Saccharomyces cerevisiae, where it plays pivotal roles in transcriptional regulation, DNA repair, and chromatin dynamics. While its canonical function in histone methylation is well-established, its full interactome remains poorly defined. Moreover, whether SET1C methylates non-histone substrates has been an open question.

      In this study, Luciano et al. employ systematic yeast two-hybrid (Y2H) screening to uncover novel interactors and functions of SET1C. Their findings reveal potential functional connections to RNA biogenesis, chromatin remodeling, and non-histone methylation.

      The authors performed multiple Y2H screens using Set1 (full-length, N-terminal, and C-terminal fragments) and each of its seven subunits as baits. They identified high-confidence interactors that link SET1C to diverse cellular processes, including chromatin regulation (e.g., the SWI/SNF complex via Snf2), DNA replication (e.g., Mcm2, Orc6), RNA biogenesis (e.g., spliceosome components Prp8 and Prp22; polyadenylation factors Pta1 and Ref2), tRNA processing (e.g., Trm1, Trm732), and nuclear import/export (e.g., importins Kap104 and Kap123). Some of these interactions were further validated by immunoprecipitation or in vitro assays.

      Given the interaction of Set1 with Slx5 and Wss1 - proteins involved in SUMO-dependent processes - the authors investigated and convincingly demonstrated that Set1 is sumoylated. This modification may influence the function and regulation of the SET1C complex.

      Finally, the authors provide evidence that SET1C methylates proteins beyond histone H3K4, notably Nrm1, a transcriptional corepressor, and Snf2, the catalytic subunit of the SWI/SNF chromatin remodeling complex. Although Nrm1 contains a domain resembling the H3K4-methylated sequence (H3K4-like domain), this region does not appear to be required for its methylation. The search for other proteins containing similar domains as potential methylation candidates (p.12, first paragraph) seems less justified, given the lack of evidence supporting the requirement for the H3K4-like domain in methylation.

      This study offers valuable insights into the interactome of SET1C, suggesting potential links between the complex and a wide range of cellular processes. However, the functional implications of the Y2H interactions remain to be explored further. Additionally, the study provides intriguing information on the possible regulation of Set1 by sumoylation. The discovery of Nrm1 and Snf2 as methylation substrates could significantly expand the known targets and functions of SET1C.

      The results are supported by high-quality data.

    5. Author response:

      eLife Assessment

      This study uses the yeast two-hybrid assay to identify proteins that may interact with yeast Set1 and other subunits of COMPASS/Set1C, the histone H3K4 methyltransferase, providing also some evidence for Set1 sumoylation and a role of SET1C methylating other factors in vitro. The results are valuable, and they should contribute to understanding the functions of the conserved SET1C complex, as they suggest potential functional connections with RNA biogenesis, chromatin remodeling, and non-histone methylation, whose implications would yet need to be explored. Nevertheless, apart from the fact that only a small subset of the Y2H interactions is further examined, the validating experiments are only partial or inconclusive, the strength of evidence being at this point incomplete.

      We thank the reviewers for their thoughtful comments, which primarily raise three major concerns: the overinterpretation of the Y2H data, issues related to validation, and the manuscript’s structure. At the same time, the reviewers acknowledge that the dataset is extensive and that aspects of the validation work are valuable. Below, we provide point-by-point responses to the public reviews. We will prepare a revised version of the manuscript that carefully addresses the public comments and incorporates the referees’ recommendations.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Luciano et al is a collection of experiments about the yeast histone 3 lysine 4 methyltransferase, Set1, starting with 10 yeast two-hybrid screens (Y2H). Y2H screens were briefly popular 20+ years ago, but the persistently unfavourable false-to-true positive ratios limited their utility, and the conclusion emerged that Y2H is an unreliable approach for gathering protein-protein interaction data. Y2H outcomes are candidate interaction lists at best, strongly contaminated by false positives. Here, the authors employed a company (Hybrigenics) to perform the Y2H screens.

      The primary data is not presented, and the outcomes are summarized using the Hybrigenics in-house quality scoring system in Figure 1A. It is not possible to evaluate these data, and the manuscript presents cartoon summaries that the reader must accept as valuable.

      We agree that false positives contaminate the list of potential interactors. Some interactions may also be indirect through a common interactor and do not reflect a physiological interaction. Nevertheless, some positives reflect real interactions that can occur under specific physiological conditions. This is the case, for example, with the interaction between Spp1 and Mer2 (from this screen), which has led to major discoveries (Acquaviva et al. Science 2013; Sommermeyer et al. Mol Cell 2013). The publication of these 10 screens should be viewed as a valuable resource for the broader community.

      Hybrigenics brings extensive experience from conducting numerous screens, enabling the team to recognize recurring false positives that commonly arise in screening assays.

      (1) Based on the extensive knowledge about Set1C/COMPASS acquired from genetics and biochemistry by many labs (including the Geli lab), the results presented here from the 10 Y2H screens are notably patchy. Of the 7 subunits of this complex, only one (Spp1) was identified using Set1 as bait. Conversely, as baits, Swd2, Spp1, Shg1, captured Set1, and the Bre2-Sdc1 interaction was reciprocally identified. These interactions were scored at the highest confidence level, which lends some confidence to the screens. However, the missing interactions, even at the third confidence level, indicate that any Y2H conclusions using these data must be qualified with caution. The authors do not appear to be cautious in their lengthy evaluations of these candidate interactions, which are illustrated with cartoons in Figures 2 and 3, with some support from the literature but almost without additional evidence. Snf2 is a particularly interesting candidate, which the authors support with pull-down experiments after mixing the two proteins in vitro (Figure 4). After Y2H, this is the least convincing evidence for a protein-protein interaction, and no further, more reliable evidence is supplied.

      We agree with referee 1 that more caution is needed, and we will take this into account in the revised version. We agree that Y2H interaction is an indication of potential interaction and not proof of interaction. We have therefore made a significant effort to compile elements from the literature that may support the interaction. Once again, this study can be considered a resource.

      (2) Figure 5 continues the cartoon summary of extrapolations from the Y2H screens, again without supporting evidence, except that the authors state, "We have refined the interaction region between Set1, Prp8 and Prp22, showing that Prp8 and Prp22 interact strongly with Set1-F4 (n-SET). Prp22 interacts in addition with Set1-F1 (Figure S2)." However, Figure S2 does not show this evidence and is incoherent.

      When we say that we have refined the interaction region between Set1, Prp8, and Prp22, we mean that we have restricted the interaction regions according to Y2H criteria. Indeed, we have not shown the spots illustrating the results. This will be corrected in the revised version.

      The figure legends for Figure S2B and C (copied here in bold) do not correspond to the figure.

      We agree that the legend for Figure S2 is unclear and does not accurately describe the panels shown in the figure. We will revise the legend accordingly in the updated version to ensure it accurately reflects the content of all panels.

      (B) Expression of the F1-F5 fragments in yeast cells. Fusion proteins were detected with an anti-GAL4 monoclonal antibody. TOTO yeast cells (Hybrigenics) were transformed with the different pB66-Set1-F1 to F5 plasmids and subsequently with either P6, pP6-Snf2 762-968, pP6-Prp8 37-250, or pP6-Prp22 379-763 that were identified in the Y2H screens. Transformed cells were incubated 3 days at 30°C on SD-LEU-TRP and then restreaked on SD-LEU-TRP-HIS with 3AT. Cell growth was monitored after 2 days at 30°C.

      (C) Solid and dotted arrows indicate that transformed TOTO cells transformed with pB66-Set1-F1 to F5 and the indicated prey (Snf2, Prp8, and Prp22) are growing in the presence of 20 mM and 5 mM of AT, respectively.

      Figure S2D is two almost featureless dark grey panels accompanied by the figure legend D) Control experiment showing that TOTO cells transformed with p6 and pB66-Set1-F4 are not gowing (sic) in the presence of 5 mM or 20 mM AT.

      Line 343. Interestingly, the two-hybrid screens reveal that Set1 1-754 interacted with Gag capsid-like proteins of Ty1 (Figure S5), raising the possibility that Set1 binding to Ty1 mRNA is linked to the interaction of Set1 1-754 with Gag.

      This is another example of the primary mistake repeatedly made by the authors - Y2H interactions are candidate results and not conclusive evidence.

      This statement is supported by our previous findings demonstrating that Set1 binds Ty1 mRNA independently of its dRRM and represses Ty1 mobility at a post-transcriptional stage (Luciano et al., Cell Discovery, 2017 PMID:29071121). Binding of Set1 to Ty1 mRNA could stem from the interaction between Set1 1-754 and the Gag capsid-like protein.

      To further illustrate this point, the authors highlight the candidate interaction between Nis1 and 3 Set1C subunits.

      While we agree that the Nis1-Set1C interaction has not been demonstrated beyond doubt, we feel that our Y2H and in vitro binding experiments provide reasonable evidence that the interactions may be relevant. It is important to consider that any interaction assay can give false negative (and false positive) results; this includes Y2H, in vitro binding, and mass-spec analysis of purified complexes from cells. We feel that it is not appropriate to trust only protein interactions that are strong and stable enough to be demonstrated via purified complexes. It is clear that some protein interactions occur in a transient and weak manner and are therefore not compatible with biochemical purification approaches. This is indeed the strength of alternative methods such as Y2H and in vitro binding assays: interactions can be identified and tested even if the physiological context of the interaction may be more complex.

      (3) After multiple speculations based on the Y2H candidates, the authors changed to focus on sumoylation of Set1, which has previously been reported to be sumoylated. Evidence identifying two sumoylation sites in Set1, in the N-SET and SET domains, is valuable and adds important progress to the role of sumoylation in the regulation of H3K4 methyltransferase, relevant for all eukaryotes. This illuminating part of the manuscript is only tenuously connected to the preceding Y2H screens and concomitant speculations.

      We thank Referee 1 for their comment. While it is true that there is only a modest connection between Set1 interactors involved in direct or indirect sumoylation and the characterization of Set1 SUMOylation sites, we believe that this does not constitute a weakness of the manuscript.

      (4) The manuscript then describes a red herring exercise involving Set1 methylation of Nrm1. In an already speculative and difficult manuscript, it is exasperating to read a paragraph about a failed idea. Apart from panel E, Figure 7 is a distraction, and I believe it should not be shared.

      In accordance with this comment, we will remove Fig. 7 panels A-D.

      (5) However, despite the failure with Nrm1, Line 443 - The H3K4-like domain in Nrm1 raised our attention to other yeast proteins that carry such sequences.

      This line of thinking is even less connected to the Y2H screens than the sumoylation work.

      However, the authors present a reasonable evaluation of the yeast proteome screened for six amino acids similar to the known H3K4 motif ARTKQT (Figure 7e).

      (6) However, this evaluation goes nowhere and has no connection with the next section of the manuscript, which is entirely speculation about the regulation of metabolism and stress responses based on the Y2H results and selected evidence from the literature.

      We will take these remarks (points 5 and 6) into account in the revised version.

      (7) The manuscript then describes more failed experiments regarding lysine methylation of Snf2 by Set1C, which unexpectedly reports arginine methylation rather than lysine. The manuscript does not currently meet the standard expected for this type of paper - the composition is somewhat incoherent and there are no previous reports of arginine methylation by SET domain proteins.

      We respectfully disagree with referee 1. We have integrated extensive in vitro reconstruction experiments with complementary in vivo studies, all conducted according to the rigorous standards expected by leading journals. These approaches have allowed us to reach the conclusions presented in this manuscript. While some of these findings are unexpected, they are supported by the data. We have carefully discussed the results and their limitations to provide a comprehensive interpretation.

      The manuscript presents a very experienced grasp of the literature and a sophisticated appreciation of the forefront issues, but a surprising failure to eliminate uninformative failures and peripheral distractions. The overinterpretation of Y2H results is a dominating failure. There are some valuable parts within this manuscript, and hopefully, the authors can reformat to eliminate the defects and appropriately qualify the candidate data.

      We thank Referee 1 for these insightful comments. In the revised version, we will follow the advice to remove non-informative failures and peripheral distractions. Additionally, we will exercise greater caution to avoid overinterpreting the Y2H results.

      Reviewer #2 (Public review):

      Summary:

      This paper starts with a large-scale yeast two-hybrid (Y2H) screen using Set1 (full-length and smaller parts) and other Set1C/COMPASS subunits as bait. There are hundreds of possible interactions identified, but only a small number are given any follow-up. While it's useful to document all the possible interactions, the unfocused and preliminary nature of the results makes the paper feel scattered and incomplete.

      Strengths:

      The Y2H screen was very comprehensive, producing lots of interesting possible leads for further experiments.

      Weaknesses:

      The results are useful but incomplete because only a small subset of the Y2H interactions is further examined. Even in the case of those that were further tested, the validating experiments are only partial or inconclusive.

      Referee 2’s comments align in some respects with those of Referee 1. We will follow Referee 2’s detailed suggestions to reduce the scattered nature of the manuscript.

      In particular, we will provide an AlphaFold model of the interaction between the Set1 N-terminal fragment 1-754 and the SID domain of Kap104, which involves the proposed Set1 PY-NLS sequence.

      Reviewer #3 (Public review):

      The SET1C/COMPASS complex is the histone H3K4 methyltransferase in Saccharomyces cerevisiae, where it plays pivotal roles in transcriptional regulation, DNA repair, and chromatin dynamics. While its canonical function in histone methylation is well-established, its full interactome remains poorly defined. Moreover, whether SET1C methylates non-histone substrates has been an open question. In this study, Luciano et al. employ systematic yeast two-hybrid (Y2H) screening to uncover novel interactors and functions of SET1C. Their findings reveal potential functional connections to RNA biogenesis, chromatin remodeling, and non-histone methylation.

      The authors performed multiple Y2H screens using Set1 (full-length, N-terminal, and C-terminal fragments) and each of its seven subunits as baits. They identified high-confidence interactors that link SET1C to diverse cellular processes, including chromatin regulation (e.g., the SWI/SNF complex via Snf2), DNA replication (e.g., Mcm2, Orc6), RNA biogenesis (e.g., spliceosome components Prp8 and Prp22; polyadenylation factors Pta1 and Ref2), tRNA processing (e.g., Trm1, Trm732), and nuclear import/export (e.g., importins Kap104 and Kap123). Some of these interactions were further validated by immunoprecipitation or in vitro assays.

      Given the interaction of Set1 with Slx5 and Wss1 - proteins involved in SUMO-dependent processes - the authors investigated and convincingly demonstrated that Set1 is sumoylated. This modification may influence the function and regulation of the SET1C complex.

      Finally, the authors provide evidence that SET1C methylates proteins beyond histone H3K4, notably Nrm1, a transcriptional corepressor, and Snf2, the catalytic subunit of the SWI/SNF chromatin remodeling complex. Although Nrm1 contains a domain resembling the H3K4-methylated sequence (H3K4-like domain), this region does not appear to be required for its methylation. The search for other proteins containing similar domains as potential methylation candidates (p.12, first paragraph) seems less justified, given the lack of evidence supporting the requirement for the H3K4-like domain in methylation.

      This study offers valuable insights into the interactome of SET1C, suggesting potential links between the complex and a wide range of cellular processes. However, the functional implications of the Y2H interactions remain to be explored further. Additionally, the study provides intriguing information on the possible regulation of Set1 by sumoylation. The discovery of Nrm1 and Snf2 as methylation substrates could significantly expand the known targets and functions of SET1C.

      The results are supported by high-quality data.

      We thank referee 3 for his/her positive comments.

    1. eLife Assessment

      This study presents valuable findings for identifying biotypes of depression patients using white matter measures, which are under-utilised and under-appreciated in current biological and computational psychiatry work. The evidence supporting the claims is solid, although enhanced interpretability of the identified biotypes across both white matter and symptom levels, and better justification of the choice of models would strengthen the paper. Overall, this study will be of interest to the broad community of neuroimagers, clinicians, and biological and computational psychiatry researchers.

    2. Reviewer #1 (Public review):

      Summary:

      This work stratifies depression subgroups based on white matter integrity (Fractional Anisotropy, FA) and evaluates the relationship between white matter (WM) alterations in these subgroups and clinical symptoms. Furthermore, the authors tested these subgroup findings in an independent cohort. This paper provides WM-based depression subtypes that are linked to the clinical symptom profile (anxiety, cognitive, hopelessness, sleep, and psychomotor retardation) and presents the prediction of treatment outcome using these subtypes.

      Strengths:

      Applying a novel NMF (Non-negative Matrix Factorization) biclustering approach to stratify depression subtypes using white matter integrity. Following the recent functional MRI-based depression subtype stratification, this work provides a structural signature for depression heterogeneity. These subtypes were also tested in an independent cohort, with findings regarding clinical symptom profiles.

      Weaknesses:

      Although this novel method successfully subgroups depression patients, it is difficult to understand the spatial patterns of WM alteration and which structural connections, such as DMN, SN, ECN, and Limbic, are affected, because the findings are distributed across multiple WM bundles in each subgroup. Furthermore, these subtypes fail to predict optimal treatment selection within each group, since all subgroups benefit from different treatments.

    3. Reviewer #2 (Public review):

      Summary:

      The authors measure the directional consistency of water diffusion in white matter (fractional anisotropy: FA) to stratify depression subtypes across young adults. These findings are significant in that they highlight white matter as an underappreciated aspect of neural heterogeneity in major depressive disorder. While the evidence for meaningful, lower-dimensional structure in depression heterogeneity within their Nanjing cohorts is strong, claims that their subtypes are characterized by specific clinical symptom profiles and reflect neuroplasticity reserve are not supported by the same strength of evidence.

      Strengths:

      Circumscribing analyses to a simple white matter measure, across a sparse skeleton, with explicit sparsity-promoting algorithms yielded heterogeneity subdivisions that are much more interpretable than most depression heterogeneity clustering papers. Replication of their 3-cluster solution in an external dataset bolsters confidence in the existence of these 3 clusters, although generalizability to more diverse populations remains untested. The authors also tested a wide variety of treatment outcomes, which is difficult data to aggregate but ultimately critical for validating the utility of depression subtypes.

      Weaknesses:

      sCCA and SVR results were less interpretable. In part, this is due to core features of these methods (broad distribution of weights, instability across iterations). However, these inherent components of sCCA and SVR opacity were exacerbated by the opacity surrounding several analytic choices made by the authors and intermediate results associated with them. Without more transparency, it's unclear how these results extend the neuroclinical differentiation established (or not established) by their original NMF analyses.

      To be more specific, a central claim of the paper is that their biotypes are "pathophysiologically distinct" and demonstrate "symptom-specific neurobiological substrates". However, only 3/18 pairwise symptom differences generalize across both datasets (Figures 1 and 2), implying that these biotypes have more symptom overlap than distinction. Brain-based distinctions are real and replicable, but because their NMF approach specifically optimizes for separating clusters on the basis of brain features, this is more of a methodological validation than a scientific finding. While several brain-symptom relationships reported later using sCCA and SVR are interesting, it is not currently possible to evaluate the robustness of these relationships and whether or not these relationships are nested within NMF-derived clusters or exist regardless of subtype.

      To be clear, the heterogeneity problem in depression is extremely difficult to solve and beyond the scope of this manuscript. Despite the scale of this problem, the authors do report tangible progress in this aim, largely through finding an interpretable set of white matter features distinguishing patient clusters. These findings may lead researchers to meaningfully incorporate white matter features into heterogeneity analyses more in the future. However, many of the claims made are not fully supported, particularly surrounding clinical specificity and neuroplasticity reserve.

    4. Author response:

      We sincerely appreciate the constructive comments and valuable suggestions from the editors and reviewers. We highly value the feedback and will carefully address all concerns in our revised manuscript.

      (1) We will supplement more details of the processing steps and key results in the analyses of sCCA and SVR to improve the transparency and reproducibility of our methods.

      (2) According to the reviewers’ suggestions, we will adjust and present a more conventional and cautious conclusion regarding clinical specificity and neuroplasticity reserve.

      (3) We will supplement the results of structural connections (termed “symptom-related network” in the manuscript) across the three subgroups to strengthen the interpretation of subgroup-specific neurobiological characteristics.

      (4) All the suggestions from the reviews will be respected, and we will carefully revise our manuscript to improve its clarity, rigor, and scientific quality.

      We believe these revisions will significantly improve the quality of our work.

    1. eLife Assessment

      This useful study analyzes demographic history and selection using whole-genome sequencing data from 40 Faroese individuals, generating results of value beyond the study region. The analyses are convincing, and revisions have satisfactorily addressed prior concerns, including clarification of selection analyses and expanded discussion of population structure and admixture timing. While a more fine-scale reconstruction of demographic history could still yield more insights, and access restrictions on individual-level data continue to limit broader reuse, the provision of summary statistics partially mitigates this constraint.

    2. Reviewer #1 (Public review):

      Summary:

      The paper reports an analysis of whole-genome sequence data from 40 Faroese. The authors investigate aspects of demographic history and natural selection in this population. The key findings are that Faroese (as expected) have a small population size and are broadly of Northwest European ancestry. Accordingly, selection signatures are largely shared with other Northwest European populations, although the authors identify signals that may be specific to the Faroes. Finally, they identify a few predicted deleterious coding variants that may be enriched in the Faroes.

      Strengths:

      The data are appropriately quality controlled and appear to be high quality. Some aspects of Faroese population history are characterized - in particular, the relatively (compared to other European populations) high proportion of long runs of homozygosity, which may be relevant for disease mapping of recessive variants. The selection analysis is presented reasonably, although as the authors point out, many aspects, for example differences in iHS, can reflect differences in demographic history or population-specific drift and thus can't reliably be interpreted in terms of differences in the strength of selection.

      Weaknesses:

      The main limitations of the paper are as follows:

      (1) The data are not available. I appreciate that (even de-identified) genotype data cannot be shared, however, that does substantially reduce the value of the paper. I appreciate the authors sharing summary statistics for the selection scan.

      (2) The insight into the population history of the Faroes is limited, relative to what is already known (i.e. they were settled around 1200 years ago, by people with a mixture of Scandinavian and British ancestry, have a small effective population size, and any admixture since then comes from substantially similar populations). It's obvious, for example that the Faroese population has a smaller bottleneck than, say, GBR.

      More sophisticated analyses (for example, ARG-based methods, or IBD or rare variant sharing) would be able to reveal more detailed and fine-scale information about the history of the populations that is not already known. PCA, ADMIXTURE and HaploNet analysis are broad summaries, but the interesting questions here would be more specific to the Faroes, for example, What are the proportions of Scandinavian vs Celtic ancestry? What is the date and extent of sex bias (as suggested by the uniparental data) in this admixture? I think that it is a bit of a missed opportunity not to address these questions.

      (3) I don't really understand the rationale for looking at HLA-B allele frequencies. The authors write that "Observational evidence from the FarGen project recruitment data suggest that ankylosing spondylitis (AS) may be at a higher prevalence in the Faroe Islands". But nothing beyond that. So there's no evidence (certainly no published evidence) that AS is more prevalent, and hence nothing to explain with the HLA allele frequencies? This section seems preliminary.

    3. Reviewer #2 (Public review):

      In this paper, Hamid et al present 40 genomes from the Faroe Islands. They use these data (a pilot study for an anticipated larger-scale sequencing effort) to discuss the population genetic diversity and history of the sample, and the Faroes population. I think this is an overall solid paper; it is overall well-polished and well-written. It is somewhat descriptive (as might be expected for an explorative pilot study), but does make good use of the data.

      The data processing and annotation follows a state-of-the-art protocol, and at least I could not find any evidence in the results that would pinpoint towards bioinformatic issues having substantially biased some of the results, and at least preliminary results lead to the identification of some candidate disease alleles, showing that small, isolated cohorts can be an efficient way to find populations with locally common, but globally rare disease alleles.

      I also enjoyed the population structure analysis in the context of ancient samples, which gives some context to the genetic ancestry of Faroese, although it would have been nice if that could have been quantified, and it is unfortunate that the sampling scheme effectively precludes within-Faroes analyses.

      Comments on the revision:

      I appreciate the authors' detailed and thoughtful response to my review. They have addressed all my concerns to my satisfaction and I have no additional comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments and constructive suggestions. We describe how we have addressed each point below and are grateful for the guidance on areas where our work could be clarified or expanded. In particular, we note the following:

      Selection scan summary statistics: In our revised manuscript, we have included summary statistics from the selection scans. We believe this addition will enhance transparency and provide additional context for readers.

      Reporting of outliers: As highlighted by the editor, the reviewers expressed differing views on the most appropriate way to report outliers. To provide a comprehensive and balanced presentation, we now report both the empirical selection statistics and the corresponding converted p-values in either the main text or supplement, and both outputs are also provided in the full summary files. This dual approach will allow readers to fully interpret the results under both perspectives.

      Expanded discussion of admixture timing and population structure: We have carefully considered the reviewers' suggestions to incorporate additional descriptions of population structure or demographic analyses, and have done so in our revisions where possible. These changes strengthen the rigor and clarity of the analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper reports an analysis of whole-genome sequence data from 40 Faroese. The authors investigate aspects of demographic history and natural selection in this population. The key findings are that the Faroese (as expected) have a small population size and are broadly of Northwest European ancestry. Accordingly, selection signatures are largely shared with other Northwest European populations, although the authors identify signals that may be specific to the Faroes. Finally, they identify a few predicted deleterious coding variants that may be enriched in the Faroes.

      Strengths:

      The data are appropriately quality-controlled and appear to be of high quality. Some aspects of the Faroese population history are characterized, in particular the relatively (compared to other European populations) high proportion of long runs of homozygosity, which may be relevant for disease mapping of recessive variants. The selection analysis is presented reasonably, although as the authors point out, many aspects, for example differences in iHS, can reflect differences in demographic history or population-specific drift and thus can't reliably be interpreted in terms of differences in the strength of selection.

      Weaknesses:

      The main limitations of the paper are as follows:

      (1) The data are not available. I appreciate that (even de-identified) genotype data cannot be shared; however, that does substantially reduce the value of the paper. Minimally, I think the authors should share summary statistics for the selection scans, in line with the standard of the field.

      We agree with the reviewer that sharing the selection scan results is important, so we have now made the selection scan summary statistics publicly available, and clearly lay out the guidelines and research questions for which the data can be accessed in our Data Availability statement.

      (2) The insight into the population history of the Faroes is limited, relative to what is already known (i.e., they were settled around 1200 years ago, by people with a mixture of Scandinavian and British ancestry, have a small effective population size, and any admixture since then comes from substantially similar populations). It's obvious, for example, that the Faroese population has a smaller bottleneck than, say, GBR.

      More sophisticated analyses (for example, ARG-based methods, or IBD or rare variant sharing) would be able to reveal more detailed and fine-scale information about the history of the populations that is not already known. PCA, ADMIXTURE, and HaploNet analysis are broad summaries, but the interesting questions here would be more specific to the Faroes, for example, what are the proportions of Scandinavian vs Celtic ancestry? What is the date and extent of sex bias (as suggested by the uniparental data) in this admixture? I think that it is a bit of a missed opportunity not to address these questions.

      We clarify that we did quantify the proportions of various ancestry components as estimated by HaploNet in main text Figure 5 and supplemental figures S6 and S7. To better highlight this result, we now also include the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      We agree that more fine-scale demographic analyses would be informative. We now additionally provide an estimation of the admixture date in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes and discussion using the DATES software which is optimized for ancient genomes.

      We have encountered problems with using different standard date estimation software, including DATES, which give very inconsistent and unstable results. As we note in our text, we suspect this might be due to the strong bottleneck experienced in the history of the Faroe Islands, low LD differentiation between the source populations, or multiple pulses of admixture, which may be breaking one or more of the assumptions of these methods. Assessing the limitations of these methods is beyond the scope of this current manuscript; however, we will continue working on this problem for future studies, possibly using simulations to assess where the problem might be. We recognize that our relatively small sample size places limits on the fine-scale demographic analyses that can be performed. We are addressing this in ongoing work by generating a larger cohort, which we hope will enable more detailed inference in the future.

      (3) I don't really understand the rationale for looking at HLA-B allele frequencies. The authors write that "ankylosing spondylitis (AS) may be at a higher prevalence in the Faroe Islands (unpublished data), however, this has not been confirmed by follow-up epidemiological studies". So there's no evidence (certainly no published evidence) that AS is more prevalent, and hence nothing to explain with the HLA allele frequencies?

      We agree that no published studies have confirmed a higher prevalence of ankylosing spondylitis (AS) in the Faroe Islands. Our recruitment data suggest that AS might be more common than in other European populations, but we understand that this is only based on limited, unpublished observations and what we are hearing from the community. We emphasized in our original manuscript that this is based on observational evidence from the FarGen project. However, as this reviewer pointed out, we can be more clear that this prevalence has not been formally studied.

      In revision, we clarify in the Main Text - Results - HLA-B Allele Frequencies and Discussion that our recruitment data suggest a higher prevalence of AS may be possible, but more formal epidemiological studies are needed to confirm this observation. The reason we study HLA-B allele frequencies is to see if the genetic background of the Faroese population could help explain this possible difference, since HLA-B27 is already known to play a strong role in AS.

      Reviewer #2 (Public review):

      In this paper, Hamid et al present 40 genomes from the Faroe Islands. They use these data (a pilot study for an anticipated larger-scale sequencing effort) to discuss the population genetic diversity and history of the sample, and the Faroes population. I think this is an overall solid paper; it is overall well-polished and well-written. It is somewhat descriptive (as might be expected for an explorative pilot study), but does make good use of the data.

      The data processing and annotation follows a state-of-the-art protocol, and at least I could not find any evidence in the results that would pinpoint towards bioinformatic issues having substantially biased some of the results, and at least preliminary results lead to the identification of some candidate disease alleles, showing that small, isolated cohorts can be an efficient way to find populations with locally common, but globally rare disease alleles.

      I also enjoyed the population structure analysis in the context of ancient samples, which gives some context to the genetic ancestry of Faroese, although it would have been nice if that could have been quantified, and it is unfortunate that the sampling scheme effectively precludes within-Faroes analyses.

      We note that although the ancestry proportions were not originally specified in the main text, we did quantify ancestry proportions in the modern Faroese individuals and other ancient samples, and we visualized these proportions in Figure 5 and Supplementary Figures S6 and S7. As stated in our response to Reviewer #1, in our revisions, we now more clearly state the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      I am unfortunately quite critical of the selection analysis, both on a statistical level and, more importantly, I do not believe it measures what the authors think it does.

      Major comments:

      (1) Admixture timing/genomic scaling/localization:

      As the authors lay out, the Faroes were likely colonized in the last 1,000-1,500 years, i.e., 40-60 generations ago. That means most genomic processes that have happened on the Faroese should have signatures that are on the order of ~1-2cM, whereas more local patterns likely indicate genetic history predating the colonization of the islands. Yet, the paper seems to be oblivious to this (to me) fascinating and somewhat unique premise. Maybe this thought is wrong, but I think the authors miss a chance here to explain why the reader should care beyond the fact that the small populations might have high-frequency risk alleles and the Faroes are intrinsically interesting, but more importantly, it also makes me think it leads to some misinterpretations in the selection analysis.

      See response to point #3

      (2) ROH:

      Would the sampling scheme impact ROH? How would it deal with individuals with known parental coancestry? As an example of what I mean by my previous comment, 1MB is short enough in that I would expect most/many 1MB ROH-tracts to come from pedigree loops predating the colonization of the Faroes. (i.e, I am actually quite surprised that there isn't much more long ROH, which makes me wonder if that would be impacted by the sampling scheme).

      The sampling scheme was designed to choose 40 Faroese individuals that were representative of the different regions and were minimally related. There were no pairs of third-degree relatives or closer (pi-hat > 0.125) in either the Faroese cohort or the reference populations. It is possible that this sampling scheme would reduce the amount of longer ROHs in the population, but we should still be able to see overall patterns of ROH reflective of bottlenecks in the past tens of generations. Additionally, based on this reviewer's earlier comment, 1 Mb ROHs would still be relevant to demographic events in the last 40-60 generations given that on average 1 cM corresponds to 1 Mb in humans, though we recognize that is not an exact conversion.

      That said, the “sum total amount of the genome contained in long ROH” as we described in the manuscript includes all ROHs greater than 1Mb. Although we group all ROHs longer than 1Mb into one category in Main Text Figure 2, we now additionally provide the distribution in ROH lengths across all individuals for each cohort in a new Supplemental Figure S3. As this plot shows, there certainly are ROHs longer than 1Mb in the Faroese cohort, and on average there is a higher proportion of long ROH particularly in the 5-15 Mb range in the Faroese cohort relative to the other cohorts. As the reviewer points out, these longer ROHs are possibly indicative of a more recent or stronger bottleneck in the Faroes relative to the comparison cohorts. We highlight this result in Main Text - Results - Population Structure and Relatedness.
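
      The arithmetic linking settlement time, generations, and expected ROH length in the exchange above can be sketched as follows. This is a back-of-the-envelope illustration, not part of the manuscript's analysis: the 25-year generation time is an assumption, the function names are hypothetical, and the mean segment length of 100 / (2g) cM is the standard approximation for a segment inherited identical-by-descent from a common ancestor g generations ago.

```python
# Rough link between time since settlement and expected ROH length.
# Assumptions (not from the manuscript): 25 years per generation, and a mean
# segment length of 100 / (2 * g) cM for a common ancestor g generations ago.

CM_PER_MB = 1.0  # approximate human genome-wide average; not an exact conversion


def generations_since(years: float, years_per_generation: float = 25.0) -> float:
    """Convert calendar years into generations."""
    return years / years_per_generation


def expected_roh_length_cm(generations: float) -> float:
    """Mean length (cM) of a segment tracing to an ancestor g generations back."""
    return 100.0 / (2.0 * generations)


def cm_to_mb(length_cm: float, cm_per_mb: float = CM_PER_MB) -> float:
    """Convert a genetic length to a physical length under a uniform rate."""
    return length_cm / cm_per_mb


if __name__ == "__main__":
    for years in (1000, 1500):
        g = generations_since(years)
        length_cm = expected_roh_length_cm(g)
        print(f"{years} years: {g:.0f} generations, "
              f"mean segment {length_cm:.2f} cM ({cm_to_mb(length_cm):.2f} Mb)")
```

      Under these assumptions, segments dating to the settlement period have mean lengths on the order of 1 cM, which is why ROH around 1 Mb sit at the boundary between pre- and post-settlement history.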

      (3) Selection scan:

      We are talking about a bottlenecked population that is recently admixed (Faroese), compared to a population (GBR) putatively more closely related to one of its sources. My guess would be that selection in such a scenario would be possibly very hard to detect, and even then, selection signals might not differentiate selection in Faroese vs. GBR, but rather selection/allele frequency differences between different source populations. I think it would be good to spell out why XP-EHH/iHS measures selection at the correct time scale, and how/if these statistics are expected to behave differently in an admixed population.

      The reviewer brings up good points about the utility of classical selection statistics in populations that are admixed or bottlenecked, and whether the timescale at which these statistics detect selection is relevant for understanding the selective history of the Faroese population. We break down these concerns separately.

      (1) Bottlenecks: Recent bottlenecks result in higher LD within a population. However, demographic events such as bottlenecks affect global genomic patterns while positive selection is expected to affect local genomic patterns. For this reason, iHS and XP-EHH statistics are standardized against the genome-wide background, to account for population-specific demographic history.

      (2) Admixture: The term “admixture” has different interpretations depending on the line of inquiry and the populations being studied. Across various time and geographic scales, all human populations are admixed to some degree, as gene flow between groups is a common fixture throughout our history. For example, even the modern British population has “admixed” ancestry from North / West European sources as well, dating to at least as recently as the Medieval & Viking periods (Gretzinger et al. 2022, Leslie et al. 2015), yet we do not commonly consider it an “admixed” population, and we are not typically concerned about applying haplotype-based statistics in this population. This is due to the low divergence between the source populations. In the case of the Faroe Islands, we believe admixture likely occurred on a similar timescale or even earlier, based on the DATES estimates. We see low variance in ancestry proportions estimated by HaploNet, both from the historical Faroese individuals (dated to 260 years BP) and the modern samples. This indicates admixture predating the settlement of the Faroe Islands, where recombination has had time to break up long ancestry tracts and the global ancestry proportions have reached an equilibrium. That is, these ancestry patterns suggest that the modern Faroese are most likely descended from already admixed founders. In the original manuscript, we mentioned this as a likely possibility in the Main Text - Discussion: “This could have occurred either via a mixture of the original “West Europe” ancestry with individuals of predominantly “North Europe” ancestry, or a by replacement with individuals that were already of mixed ancestry at the time of arrival in the islands (the latter are not uncommon in Viking Age mainland Europe).” In our revisions, we further included the DATES estimations of the timing of admixture in the modern and historical Faroese samples, which pre-date the timing of settlement in both cases. We highlight these points in the Discussion. And, as with the case of the British population, the closely-related ancestral sources for the Faroese founders were likely not so diverged as to have differences in allele frequencies and long-range haplotypes that would disrupt signals of selection from iHS or XP-EHH.

      (3) Time scale: It is certainly possible, and in fact likely, that iHS measures selection older than the settlement of the Faroe Islands. In our manuscript, we calculated iHS in both the Faroese and the closely related British cohort, and we highlight in the Main Text that the top signals, with the exception of LCT, are shared between the two cohorts, indicative of selection that began prior to the population split (Discussion and Results - Signals of Positive Selection). iHS is a commonly calculated statistic, and it is often calculated in a single population without comparing to others, so we feel it is important to show our result demonstrating these shared selection signals. In our revisions, we now clarify in the Discussion the limitations and time-scale at which the iHS statistic may detect selection. As for XP-EHH, it is a statistic designed to identify differentiated variants that are fixed or approaching fixation in one population but not others. The time-scale of selection that XP-EHH can detect would therefore be dependent on the populations used for comparison. As XP-EHH has the best power to identify alleles that are fixed or approaching fixation in one population but not others, it is less likely to detect older selection events / incomplete sweeps from the source populations. We highlight this point in the Discussion.
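
      The genome-wide standardization mentioned under point (1) above can be illustrated with a minimal sketch. It assumes the common convention for iHS of z-scoring unstandardized scores within derived-allele-frequency bins, so that each SNP is compared against the genome-wide background at similar frequency. The function name and the number of bins are illustrative choices and do not describe the authors' pipeline.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_frequency_bin(raw_scores, derived_freqs, n_bins=20):
    """Z-score raw per-SNP scores within derived-allele-frequency bins.

    raw_scores and derived_freqs are parallel sequences; frequencies lie in [0, 1].
    SNPs in a bin with zero variance are returned as NaN.
    """
    bins = defaultdict(list)
    for idx, freq in enumerate(derived_freqs):
        # Clamp so that a frequency of exactly 1.0 falls in the last bin.
        bin_id = min(int(freq * n_bins), n_bins - 1)
        bins[bin_id].append(idx)

    standardized = [float("nan")] * len(raw_scores)
    for members in bins.values():
        values = [raw_scores[i] for i in members]
        mu = mean(values)
        sd = pstdev(values)
        if sd == 0:
            continue  # a single-SNP or constant bin cannot be standardized
        for i in members:
            standardized[i] = (raw_scores[i] - mu) / sd
    return standardized
```

      Because the mean and standard deviation come from the same population's genome-wide distribution, a uniform increase in haplotype homozygosity caused by a bottleneck is largely absorbed by the standardization, whereas a local excess stands out.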

      (4) Similarly, for the discussion of LCT, I am not convinced that the haplotypes depicted here are on the right scale to reflect processes happening on the Faroes. Given the admixture/population history, it at the very least should be discussed in the context of whether the 13910 allele frequency on the Faroes is at odds with what would be expected based on the admixture sources.

      We agree that more investigation into the LCT allele frequency in the other ancient samples may provide some insight into the selection history, particularly in light of ancient admixture. Please note, we did look at the allele frequency of the LCT allele rs4988235 and stated in the main text that it was present at high frequencies in the historical (250BP) Faroese samples. The frequency of this allele in the imputed historical Faroese samples is 82% while the allele is present at ~74% frequency in modern samples. We originally did not report the exact percentage in the main text because the sample size of the historical samples (11 individuals) is small and coverage of ancient samples is low, leading to potential errors in imputation.

      However, given the reviewer’s comment, we have now included the frequencies as well as these caveats in the Discussion. We additionally calculated the LCT allele frequency in other ancient samples, and assuming that we had good proxies for the sources at the time of admixture, we calculated the expected allele frequency in the admixed ancestors of the Faroese founders (Discussion), but again note the limitations in using such a calculation in this context.
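
      The expected-frequency calculation referred to above amounts to an ancestry-weighted average of the source allele frequencies. The sketch below shows that calculation only; the source labels, frequencies, and ancestry proportions are placeholder values chosen for illustration and are not the estimates reported in the manuscript.

```python
def expected_admixed_frequency(source_freqs, ancestry_props):
    """Expected allele frequency in an admixed population with no drift or selection.

    source_freqs: mapping of source label -> allele frequency in that source
    ancestry_props: mapping of source label -> ancestry proportion (must sum to 1)
    """
    if set(source_freqs) != set(ancestry_props):
        raise ValueError("source labels must match between the two mappings")
    total = sum(ancestry_props.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(ancestry_props[s] * source_freqs[s] for s in source_freqs)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    freqs = {"source_A": 0.60, "source_B": 0.80}
    props = {"source_A": 0.50, "source_B": 0.50}
    print(round(expected_admixed_frequency(freqs, props), 3))
```

      An observed frequency well above this expectation would be consistent with selection after admixture, although drift in a small founder population and uncertainty in the source proxies can produce similar deviations.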

      (5) I am lacking information to evaluate the procedure for turning the outliers into p-values. Both iHS and XP-EHH are ratio statistics, meaning they might be heavy-tailed if one is not careful, and the central limit theorem may not apply. It would be much easier (and probably sufficient for the points being made here) to reframe this analysis in terms of empirical outliers.

      Given that there are disagreements on the best approach to reporting selection scan results from the reviewers, in our revision, we have additionally supplied both the standardized iHS / XP-EHH values in Supplementary Fig. S10 as well as these values transformed to p-values in Main Text Fig. 3. Additionally, both outputs are provided in the publicly available selection scan results files. We provide the method for obtaining p-values in the subsection “Selection scan” from the Methods section - we used a method developed earlier by Fariello et al.
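
      The two reporting styles discussed here can be contrasted in a short sketch. It does not reproduce the procedure of Fariello et al.; it only compares a rank-based empirical tail fraction, which makes no distributional assumption, with a two-sided p-value that assumes the standardized scores are standard normal. Both function names are hypothetical.

```python
import math


def empirical_tail_fraction(scores, value):
    """Fraction of genome-wide scores at least as extreme as `value` (by absolute value)."""
    threshold = abs(value)
    extreme = sum(1 for s in scores if abs(s) >= threshold)
    return extreme / len(scores)


def two_sided_normal_p(z):
    """Two-sided p-value for z under a standard normal assumption."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Toy standardized scores, for illustration only.
    scores = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 2.5, 4.0]
    for z in (2.5, 4.0):
        print(z, empirical_tail_fraction(scores, z), two_sided_normal_p(z))
```

      If the score distribution is heavy-tailed, the normal-theory p-values will be smaller than the empirical tail fractions at the same threshold, which is the concern raised by the reviewer.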

      (6) Oldest individual predating gene flow: It seems impossible to make any statements based on a single individual. Why is it implausible that this person (or their parents), e.g., moved to the Faroes within their lifetime and died there?

      We agree with the reviewer that this is a plausible explanation, and in our revisions, we have updated the Main Text - Discussion to acknowledge this possibility.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Please note that there was disagreement among the reviewers regarding the reporting of outliers.

      As stated in our response to the public reviews, given the disagreement, we include both the empirical selection statistics as well as the converted p-values in the main text, supplement and selection scan files.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2:

      Define labels / explain why they differ from 1000k populations / make them consistent throughout the manuscript.

      We apologize for the error in labels for Figure 2. These are the same populations used in other figures and analyses. We have fixed this in our revisions so that the labels are consistent with the rest of the manuscript.

      (2) Figure S2 label:

      "The matrix is rescaled after subsetting the individuals, so although the scales are different, the overall structure remains the same." I do not understand this sentence. The samples are different, the scale is different, the apparent pattern is different - what overall structure is supposed to be the same?

      We apologize that the language was not clear in the figure label. The scales between panels A and B are different, because popkin rescales the kinship values after subsetting so that the minimum kinship is zero. This is necessary when subsetting individuals from an already estimated kinship matrix, particularly when subsetting from global populations to a single region. From the popkin documentation: “This rescaling is required when subsetting results in a more recent Most Recent Common Ancestor (MRCA) population compared to the original dataset (for example, if the original data had individuals from across the world but the subset only contains individuals from a single continent)” (https://rdrr.io/cran/popkin/man/rescale_popkin.html).

      We also described this in the Methods - Population Genetics - Kinship and runs of homozygosity section: “When calculating the kinship matrix for the Faroese WGS cohort only, we used the rescale_kinship() function, which will change the most recent common ancestor and give different absolute values, but the overall relationship structure in the subpopulation remains the same.”

      That is, the relative kinship within the Faroese cohort remains consistent, despite the different scale.
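
      The effect of the rescaling can be illustrated with a minimal sketch. It assumes the rescaling takes the linear form (k - k_min) / (1 - k_min), which sets the minimum kinship to zero; the function below is a hypothetical stand-in written for illustration, not the popkin implementation, and the matrix is a toy example.

```python
def rescale_to_zero_minimum(kinship, min_kinship=None):
    """Linearly rescale a kinship matrix so that its minimum value becomes zero.

    kinship: square matrix as a list of lists
    min_kinship: value to treat as the new zero; defaults to the matrix minimum
    """
    if min_kinship is None:
        min_kinship = min(min(row) for row in kinship)
    return [[(k - min_kinship) / (1.0 - min_kinship) for k in row] for row in kinship]


if __name__ == "__main__":
    # Toy kinship matrix for three individuals, for illustration only.
    subset = [
        [0.60, 0.20, 0.30],
        [0.20, 0.55, 0.25],
        [0.30, 0.25, 0.50],
    ]
    for row in rescale_to_zero_minimum(subset):
        print([round(k, 4) for k in row])
```

      Because the transformation is linear and increasing, the absolute values change but the ranking of pairs by relatedness does not, which is the sense in which the overall structure is preserved.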

      It is difficult to see the kinship of Faroese individuals in the larger plot with all cohorts, which is why we subset and visualize the Faroese cohort alone. We have updated the Fig. S2 label language to make this more clear.

      (3) "Iron Age Wet Europe"

      We have corrected this typo to “Iron Age West Europe.”

      I'm confused if the ancient Faroese were part of the imputation panel: Figure 5 legend implies they are, methods imply they are not.

      The ancient samples are not imputed with the modern Faroese and reference samples, but they are the imputed data downloaded from Allentoft et al. and merged with the modern Faroese cohort. We specify that we downloaded imputed ancient samples in both the Methods - Fine-scale structure estimation using ancient genomes and in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes. The description of the imputation panel in the Methods - Bioinformatics - Variant calling and imputation refers only to the modern samples.

      (4) Kinship:

      The kinship of the Faroes is useful (and nice) as a QC analysis showing the genetic data matches the expectations from the pedigree. I don't know what I should learn from the kinship of the 1000kg samples (I'd assume one could learn something about bottleneck strength from this), but it's not developed/discussed.

      The global kinship matrix provides complementary information to PCA and ROH, as another way to quantify and visualize the relationships within and between populations. Additionally, as the reviewer mentioned, bottlenecks increase kinship within populations. Given that popkin estimates kinship measured from a Most Recent Common Ancestor, we can best observe this increase in kinship when comparing to other global populations. We more clearly delineate what can be observed from Fig. S2A versus Fig. S2B in the Results - Population Structure and Relatedness.

      References

      (1) Gretzinger, J. et al. The Anglo-Saxon migration and the formation of the early English gene pool. Nature 610, 112–119 (2022).

      (2) Leslie, S. et al. The fine-scale genetic structure of the British population. Nature 519, 309–314 (2015).

    1. eLife Assessment

      This important work provides a fresh perspective on merozoite surface biology and its implications for vaccine design, challenging the prevailing dogma that MSPs are indispensable invasion engines. The revised manuscript strengthens the compelling evidence that, although MSP2 is dispensable for parasite growth, it acts as an immune modulator of AMA1. While the study is commendable for its use of state-of-the-art technologies and the skillful application of monoclonal antibodies, the inclusion of human monoclonal antibodies and electron microscopy imaging approaches would significantly add to the importance of these observations. Overall, this work will be of considerable interest to investigators studying Plasmodium biology and vaccine development.

    2. Reviewer #2 (Public review):

      The major strengths of the manuscript are in the Plasmodium falciparum genetic and phenotyping approaches. PfMSP2 knockouts are made in two different strains, which is important as it is known that invasion pathways can vary between strains, but is a level of comprehensiveness that is not always delivered in P. falciparum genetic studies. The knockout strains are characterised very thoroughly using multiple different assays and the authors should be commended for publishing a good deal of negative data, where no phenotype was detected. This is not always done but is very helpful for the field and reduces the potential for experimental redundancy, i.e. others repeating work that has already been performed but never published. The quality of the writing, referencing and figures is also generally strong.

      There are certainly some areas of the manuscript that would benefit from deeper exploration, such as electron microscopy/other imaging approaches to explore whether deletion of PfMSP2 has a visible impact on merozoite surface structure, further replicates of the video microscopy assays to see whether trends in the data could reach significance (although these are very time-consuming and technically difficult assays), and follow up of some of the genes where expression is changed by PfMSP2 knockout (as the authors point out, there are no candidates that have a very obvious link to invasion suggesting that they may be compensating for PfMSP2 function, although several are expressed in schizont stages). However, there is already a substantial amount of data in the manuscript, and more detailed follow-up is reasonable to leave to future work. Overall, with the modifications made through the review process, including the addition of new controls for key experiments, the claims and conclusions are justified by the data, and the manuscript generates important new information about a highly studied Plasmodium falciparum merozoite surface protein.

    3. Reviewer #3 (Public review):

      Henshall et al. study invasion of human erythrocytes by Plasmodium falciparum merozoites and report knockout of PfMSP2, a critical merozoite surface protein with unknown function. They describe conservation of MSP2 in P. falciparum and key avian malaria parasites, unabated growth of two knockout lines (∆MSP2) produced in divergent 3D7 and Dd2 strains, no differences in expression of key invasion-associated genes, no effect on invasion kinetics (with or without protease treatment of erythrocytes), nonsignificant effects of knockout on parasite growth inhibition by antibodies directed against key invasion-associated antigens, and do find a significant effect on potentiating AMA1 invasion inhibitory antibodies. The studies are interesting and have potential for directing vaccine design targeting erythrocyte invasion, a critical step in bloodstream expansion of malaria parasites.

      Major points:

      (1) Much of the manuscript describes negative results and this reviewer found it arduous to get through many negative or nonsignificant results before finally getting to the significant effect on AMA1 inhibitory antibodies, not presented until Figure 6! Computational studies in Fig. 1 could be a supplementary figure. Figs. 2 and 3 demonstrate knockout in 3D7 and Dd2, respectively, and could be assembled into a single figure. (Notably Fig. 2A and 3A are almost identical with use of some different primers.) Fig. 2E, 2F, 3D-H, all of Fig. 4, most of Fig. 5 are all negative or insignificant results that could also be moved to supplementary data. As MSP4, MSP5, and SUB1 are presumably included in the whole genome RNA-seq experiments shown in Fig. 4C, it makes sense to remove Fig. 4A data from the paper fully. These consolidating changes would help highlight the key finding of improved binding and block of AMA1's role in invasion.

      (2) The potentiating effects on anti-AMA1 antibodies are shown with rabbit sera and purified antibodies, mouse monoclonal antibodies, and smaller i-bodies inspired by shark antibody-like receptors but not with human monoclonal antibodies (hmAbs). As naturally acquired hmAbs targeting AMA1 have been identified and characterized (PMIDs: 39632799, 40020675), would it not be important to test these antibodies in the ∆MSP2, especially as the authors emphasize the importance of their model in designing better human malaria vaccines?

      (3) Fig. 7 presents quantitative fluorescence microscopy to measure anti-AMA1 binding and support a model where MSP2 serves to sterically hinder antibody access to AMA1 on individual merozoites. I understand that the negative WD33 control is useful to contrast to the positive WD34 antibody (both bind AMA1 but only WD34 exhibits parasite growth inhibitory effects), but it seems that use of smaller i-bodies rather than conventional larger mouse or ideally human monoclonal antibodies may compromise demonstration of steric hindrance by MSP2 because smaller i-bodies may be less hindered.

      (4) Some explanation for why WD33 fails to inhibit growth despite targeting the same antigen as WD34 is needed. Are the epitopes known? Does one bind further from the RON2 binding pocket?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Henshall et al. delete the highly abundant merozoite surface protein PfMSP2 from two Plasmodium falciparum laboratory lines (3D7 and Dd2) using CRISPR-Cas9. Parasites lacking MSP2 replicate and invade red cells normally, opposing the experimental history that suggests MSP2 is essential. Unexpectedly, the knock-outs become more susceptible to several inhibitory antibodies - most strikingly those that target the apical antigen AMA1 - while antibodies to other surface or secreted proteins are largely unaffected. Recombinant MSP2 added in vitro can dampen AMA1-antibody binding, supporting a "conformational masking" model. The reported data suggest that MSP2 helps shield key invasion ligands from host antibodies and may itself be a double-edged vaccine target.

      Reviewer 1 did not have any comments we needed to address.

      Reviewer #2 (Public review):

      (1) The section describing Laverania and avian Plasmodium MSP2 comparison is a lengthy section and could be told much more concisely for clarity in delivering the key message, i.e., that conservation in distantly related Plasmodium species could indicate an important function. The identification of MSP2-like genes in avian Plasmodium species was highlighted previously in the referenced Escalante paper, so it is not entirely novel, although this paper goes into more detailed characterisation of the extent of conservation. Overall, this section takes up much more space in the manuscript than is merited by the novelty and significance of the findings.

      As outlined in point (1) for Reviewer 1 (Recommendations for the authors), we have cut back through this section and focussed on the important comparisons rather than the general observation. We have also moved the elements of Table 1 to Supplementary Figures 2, 3 and 4 to streamline the manuscript. Further description of the changes is available in the Reviewer #1 (Recommendations for the authors).

      (2) Characterisation of the knockout strains is generally thorough, though relatively few interactions were followed by live microscopy (Figures 3E-H). A minimum of 30 merozoites were followed in each assay (although the precise number is not specified in the figure or legend), but there are intriguing trends in the data that could potentially have become significant if n was increased.

      In the Figure 3 Legend we have now indicated the number of merozoite invasions followed as per the following:

      “(E-H) Key parameters of merozoite invasion were measured for both PfDd2 WT (n = 43) and PfDd2 ΔMSP2 (n = 35) parasites that had successfully invaded a RBC using live cell imaging of merozoite invasion.”

      We have also removed the more general description of ‘a minimum of 30 merozoites’ from the same Figure Legend.

      The number of schizont ruptures and subsequent merozoite invasions followed for each experiment is in line with previous studies that have investigated phenotypes with invasion inhibitors and gene knock-outs (e.g. Weiss et al. 2015, PLoS Pathogens). It is important to note that the data refers to merozoites that have completed invasion, and not just the number of merozoites that have been released from a schizont which is typically 2-4 times more than have invaded. This means we are comparing the kinetics of invasion across a relatively large sample size compared to other studies of inhibitory phenotypes. While it is possible that increasing the number of merozoites being filmed might lead to some statistical significance for some of the trends, we note that there is a limited growth phenotype overall in both short and long-term culture and this fits with the limited defect we are seeing. In order to better address this, as outlined in our response to point (7) for Reviewer 2 (Recommendations for the authors), we now discuss the trends seen in the data in additional detail.

      (3) The comparative RNAseq data is interesting, but is not followed up to any significant degree. Multiple transcripts are up-regulated in the absence of PfMSP2, but they are largely dismissed because they are genes of unknown function, not previously linked to invasion, or lack an obvious membrane anchor. Having gone to the lengths of exploring potentially compensatory changes in gene expression, it is disappointing not to validate or explore the hits that result.

      While we understand the reviewer's comment, as outlined in the text, we did not identify any upregulated proteins that looked like strong candidates to compensate for loss of MSP2 to explore in this manuscript. Instead, we chose to further investigate any potential loss of MSP2 phenotype that yielded the observations around improved potency of antibodies targeting some merozoite antigens with loss of MSP2. This will be explored in future studies as we try to understand the role of MSP2 in more detail and the interactions between proteins and antibodies on the merozoite surface.

      (4) Given the abundance of PfMSP2 on the merozoite surface, it would have been interesting to see whether the knockout lines have any noticeable difference in surface composition, as viewed by electron microscopy, although, of course, this experiment relies on access to the appropriate facilities.

      We agree with the reviewer, but this lies outside the scope of this manuscript and optimisation of the imaging platform used to gain biologically useful insights would take a considerable amount of work based on feedback from people working with these techniques.

      (5) One of the key findings is that deletion of PfMSP2 increases inhibition by some antibodies/nanobodies (some anti-CSS2, some anti-AMA1) but not others (anti-EBA/RH, anti-EBA175, anti-Rh5, anti-TRAMP, some anti-CSS2, some anti-AMA1). The data supporting these changes in inhibition are solid, but the selectivity of the effect (only a few antibodies, and generally those targeting later stages in invasion) is not really discussed in any detail. Do the authors have a hypothesis for this selectivity? The authors make attempts to explore the mechanisms for this antibody-masking (Figure 7), but the data is less solid. Surface Plasmon Resonance was non-conclusive, while an ELISA approach co-incubating MSP2 and anti-AMA1 antibodies to wells coated with AMA1 lacks appropriate controls (eg, including other merozoite proteins in similar experiments).

      As outlined in our response to point (7) for Reviewer 2 (Recommendations for the authors), we have repeated the ELISA-based assessment of recombinant MSP2's impact on anti-AMA1 antibody binding. In addition, we have included two comparator control proteins, the intrinsically disordered MSP4 of P. falciparum and the globular domain of the neural cell adhesion molecule (NCAM, CD56, 16 kDa), and found these proteins did not impact binding of anti-AMA1 antibodies. This strengthens the data that links the presence of MSP2 to reduced activity of anti-AMA1 antibodies.

      As covered in our response to point (7) for Reviewer 2 (Recommendations for the authors), we provide additional discussion of this phenotype. We note that the list of inhibitory antibodies tested is not exhaustive, and additional antibodies may be identified where loss of MSP2 could improve potency. So although we see a consistent effect with a relatively small number of antibody targets, this does not rule out additional examples that may act earlier in invasion (for example, we noticed a small, but not statistically significant, trend for mildly inhibitory antibodies targeting MSP1-19 as well). This makes it problematic, at this time, to speculate on why these two initial antibody targets are affected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) If feasible, perform ex vivo assays to demonstrate that the masking effect operates with physiologically relevant antibodies.

      For this manuscript, we focussed on characterising the MSP2 knock-out parasites using the best reagents available. We remain interested in understanding whether these lines can be used to investigate the activity of functional antibodies from malaria exposed human serum and this will be the subject of future studies.

      Reviewer #2 (Recommendations for the authors):

      (1) As noted in the Public Review, the section describing MSP2 orthologues in other Laverania and avian Plasmodium species is overly long and not the most novel section of the manuscript. It could be really radically trimmed back.

      We have taken this suggestion from the reviewer on board and have significantly cut back on our descriptions of the basic similarity properties of the conserved N and C-terminal regions as well as the description of the central variable region. Effectively, we have cut back the number of words through this section from 864 across 3 paragraphs to 478 across 2 paragraphs. While we have chosen to greatly economise our description of the N and C-terminal conserved regions, we have maintained much of the description of the similarities and differences in the central variable region, as we believe the observation that this variant region still maintains repeats across such evolutionary distances, though they differ in size, number and amino acid composition, is of interest.

      Taking the reviewer's comment on board, we have also removed Table 1 from the manuscript (shows amino acid sequence properties of these regions) and instead have inserted the tables relevant for each alignment in Supplementary Figures 2, 3 and 4 as appropriate. This will streamline the main manuscript and better align amino acid property and alignment data in the one Figure. We thank the reviewer for this feedback and believe that this has helped focus the text on the most important observations.

      (2) Figure 2C - As MSP2 has stage-specific expression, it could be informative to incorporate an antibody targeting another gene with a similar stage-specific expression pattern, such as AMA1, into the blot. This would confirm that both protein samples were collected at a similar point during blood stage development.

      We have modified Figure 2C to include both the original comparison using PfAldolase as the loading control and also the merozoite expressed PfGAP45 as a loading/stage specific control as per the Figure.

      (3) Figure 2D - Magenta and red are hard to distinguish in the merge channel. Is it possible to pseudocolour one of these channels a different colour? Also, it would be simpler to keep PfMSP2 a consistent colour in both rows.

      Thank you for this suggestion and we agree that the comparison could be made clearer. For this figure, we have coloured DAPI to label the nuclei (Cyan), and antibodies targeting PfMSP2 (Magenta), PfAMA1 and PfMSP1-19 (Yellow). This is also reflected in the merged image. The Figure legend now reads:

      “(D) Distribution of key merozoite surface proteins in the presence or absence of PfMSP2 was visualised by immunofluorescence. PfMSP2 (magenta), the nucleus stained by DAPI (cyan) and PfAMA1 (yellow, top two rows) or PfMSP1-19 (yellow, bottom two rows), and the coloured merge of the preceding panels. Scale bar = 0.7 µm. Representative images shown from a minimum of 10 schizonts imaged per condition.”

      (4) Figure 2F - Static growth relative to shaking growth is plotted in this panel; perhaps this could be more clearly described in the legend or mentioned in the text that there was not a significant alteration in growth in static or shaking conditions.

      As suggested, we have clarified the result in the Figure legend text as follows:

      “(E-F) Growth of Pf3D7 WT compared to Pf3D7 ΔMSP2 P. falciparum parasites, measured as fold increase in parasitaemia, over one (48 hrs) or two (96 hrs) cycles in either standard (still- (E)) or shaking (F) conditions, with no measurable difference between parasite growth rates seen between standard or shaking conditions.”

      Please also describe the shaking conditions used (i.e., speed, culture size, and vessel) in the methods.

      We have updated the methods to provide information on the growth conditions used in the standard versus shaking growth assays:

      “The initial parasitemia of cultures was determined by flow cytometry and then measured again after the 50 mL cultures in 96 well plates were maintained under standard (still) or shaking (50 rpm) conditions for 48 hrs or 96 hrs of growth.”

      (5) Figure 3G - Annotate legend for strength of deformation to describe what 1,2, or 3 refers to.

      We have added the following to the Figure legend of Figure 3G:

      “Deformation scores are as defined by Weiss et al (Weiss et al., 2015), with 1 = weak deformation of the RBC membrane at the point of contact, 2 = strong deformation leading to the RBC membrane extending up the sides of the merozoite and changes in RBC membrane curvature beyond the point of contact and 3 = extreme deformation indicated by the merozoite being deeply embedded in the RBC membrane and strong deformation of the RBC well beyond the point of contact.”

      There is a small visible shift in the deformation event scores. Is this also not significant? Even if deformation is not significantly longer, could this small effect alter the exposure of epitopes on other proteins for antibody targeting?

      We did test the deformation event scores and the differences were non-significant. We have considered the possibility raised by the reviewer but, in the absence of additional data, we are cautious about over-interpreting whether these trends might contribute to the increased potency of certain antibodies. We note that, although deformation may happen over a slightly longer timescale and show more aggressive deformations with PfMSP2 knock-out, this also seems to translate into a weak trend for faster overall entry for those merozoites that go on to invade. Therefore, although deformation may be longer and stronger, antibodies may have less time to block invasion overall. We are not confident that we can interpret what might be happening at the molecular scale here based on these data and have chosen not to discuss this possibility in the manuscript. However, we have added the following to the results to better explain the phenotype we observed.

      “This analysis showed that, although there was a trend for PfDd2 ΔMSP2 knock-out parasites to have a higher mean time to attach to the RBC, as well as for the length and strength of RBC deformation, these trends did not reach significance. For those merozoites that did invade the RBC, on average it took less time for PfDd2 ΔMSP2 knock-out parasites to invade than PfDd2 WT, but this again did not reach significance (Figure 3 E-H). Together these data show PfMSP2 is not essential for blood-stage replication in vitro in two P. falciparum laboratory isolates from different geographical regions and knock-out of PfMSP2 does not seem to significantly impact parasite growth or merozoite invasion in vitro.”

      (6) Figure 4C - Legend refers to black lines, but on the figure, they are red? Is the horizontal red line in the correct place, or should some of the dots below it be black rather than blue if they fall outside the adjusted p-value significance cut-off? Were 4 schizont harvests performed in total, or 4 for each cell line?

      We thank the reviewer for pointing this out and we have now changed the text to say red lines. We have also provided more information in the Figure legend to more clearly define what data is represented. In short, 4 harvests were performed for each cell line (8 in total across the 2 cell lines) and the data represents the distribution from one of these harvests. The blue shaded genes are those that, on average, across the 4 Pf3D7 WT and Pf3D7 ΔMSP2 paired harvests show up or down-regulated expression. This is why some of the blue shaded genes lie near or below the cut-off values represented by the red line. The Figure legend text has now been modified as follows.

      “(C) Log2(fold change) for differentially expressed genes, including multigene families, between the transcriptome of Pf3D7 WT and Pf3D7 ΔMSP2 schizonts. Plot represents the results for one of four independent schizont RNA harvests for Pf3D7 WT and Pf3D7 ΔMSP2 parasites and red lines differentiate genes with a log2 (fold change) > 0.5 and < -0.5 with adjusted p-value < 0.01. Genes shaded blue represent those genes that were found to have an average log2 (fold change) > 0.5 (dark blue) or < -0.5 (light blue) across the four replicate samples compared. Significance determined as below p< 0.05 after correction for multiple testing.”
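
      As an illustration of the thresholds described in this legend, the following is a minimal sketch of how a gene could be labelled as up- or down-regulated from a log2 fold change and an adjusted p-value. It is not the authors' analysis pipeline: the function name `classify_gene`, the direction of the ratio (knock-out over wild type), the example gene names and all expression values are hypothetical and chosen only for illustration. The cut-offs are left as parameters because the legend quotes both 0.01 and 0.05.

```python
import math

def classify_gene(wt_expr, ko_expr, adj_p, lfc_cutoff=0.5, p_cutoff=0.01):
    """Return (log2 fold change, label) for one gene.

    Assumption: fold change is computed as knock-out / wild type,
    so a positive value means higher expression in the knock-out.
    """
    lfc = math.log2(ko_expr / wt_expr)
    if adj_p < p_cutoff and lfc > lfc_cutoff:
        return lfc, "up"
    if adj_p < p_cutoff and lfc < -lfc_cutoff:
        return lfc, "down"
    return lfc, "unchanged"

# Hypothetical (wild type, knock-out, adjusted p-value) triples.
genes = {
    "gene_A": (100.0, 200.0, 0.001),  # doubled, significant
    "gene_B": (100.0, 50.0, 0.001),   # halved, significant
    "gene_C": (100.0, 200.0, 0.2),    # doubled, not significant
    "gene_D": (100.0, 110.0, 0.001),  # significant but below the fold-change cut-off
}

calls = {name: classify_gene(*values)[1] for name, values in genes.items()}
print(calls)
```

      A gene is labelled only when it passes both the fold-change and the adjusted p-value cut-off, which is why a gene whose average across replicates passes can still sit near or below the lines drawn for a single replicate.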

      (7) Figure 7D - ELISA results don't show a convincing concentration-dependent inhibition, and repeating with another recombinant protein is essential before inferring that the effect is specific to PfMSP2

      We have repeated the ELISA experiment using recombinant PfMSP2 to reduce variability across the assay and again found a dose dependent reduction of anti-PfAMA1 binding with increasing concentrations of recombinant PfMSP2. It should be noted that this is a completely new set of experiments that recapitulate the original findings. See updated Figure 7D.

      We agree with the reviewer that the experiment and interpretation of the data would be strengthened by comparing any potential inhibitory impact on anti-PfAMA1 binding to a different recombinant protein. Therefore, we have completed identical experiments using the similarly intrinsically disordered PfMSP4 recombinant protein (40 kDa) and the highly structured 16 kDa immunoglobulin domain of human neural cell adhesion molecule (NCAM). We find that there is no dose-dependent loss of anti-PfAMA1 binding to recombinant PfAMA1 with addition of PfMSP4 or NCAM immunoglobulin domain recombinant protein. These controls are contained in Supplementary Figure 6; the relevant text is provided below.

      ‘In contrast, increasing concentrations of the intrinsically disordered MSP4 from P. falciparum 3D7 (40 kDa) and the highly structured immunoglobulin domain of neural cell adhesion molecule (NCAM, CD56, 16 kDa) recombinant proteins did not impact on binding of anti-PfAMA1 antibodies to recombinant AMA1 (Supplementary Figure 6).’

      (8) Again, as noted in the public review, the target-specificity of the inhibition-masking effect is perhaps the most surprising aspect of the data - this could do with much more thorough discussion. Why only these proteins, both of which function late in invasion?

      Overall, we tested several growth inhibitory and non-inhibitory antibodies shown to bind specifically to individual or some combination of nine P. falciparum merozoite surface and secreted proteins. However, we do not consider this to be an exhaustive list of potentially invasion inhibitory antibodies by any means. We mostly did not observe any non-inhibitory antibodies becoming significantly more growth inhibitory to PfMSP2 KO lines, indicating that these antibodies were not impacted by loss of PfMSP2 or had no functional inhibitory effect in these assays.

      What we do demonstrate here is that we see a consistent impact with different rabbit, mouse monoclonal and i-body growth inhibitory antibodies targeting PfAMA1, indicating that it is not a spurious result from a single antibody or antibody type. We also find a second example, with nanobodies targeting the PfPCRCR complex protein PfCSS potentiated with loss of PfMSP2. This opens up the possibility that other growth inhibitory antibodies to the antigens tested here, or growth inhibitory antibodies targeting other antigens involved in merozoite invasion, may also become more potent with MSP2 KO. Although both PfAMA1 and PfCSS function late in invasion, it is too early to say whether this is a functional trend or an observation that is related to the panel of antibodies tested. Therefore, further testing using lines developed in this study could yield additional examples of antibodies that become more inhibitory with MSP2 KO and provide additional information on the potential impact that MSP2 may have on their vaccine potential. In order to address this, we have added the following text to the discussion:

      “Here we show consistent potency improvement with PfMSP2 knock-out for growth inhibitory rabbit, mouse monoclonal and i-body antibodies targeting PfAMA1, as well as demonstrate improved activity for an Fc-tagged nanobody targeting PfCSS, indicating that these are not outlier results from a single antibody or antibody type. However, increased antibody potency was not shared across all antibodies tested, possibly because the specific function or localisation of a target protein, the region that an antibody binds to or the functional activity (or lack thereof) of an antibody may all play a role in determining whether loss of PfMSP2 can potentiate growth inhibitory activity. Further investigation using the parasite lines developed in this study and a wider panel of antibodies that target different stages of the merozoite invasion process could shed more light on this potentially novel mechanism of vaccine derived antibody efficacy.”

      (9) Typos/minor editorial points:

      L111 – conserved

      This text has been modified.

      L235-237 - check the wording in this sentence for clarity

      This text has been modified.

      Figure 3E - 'attachment' on axis

      This Figure has been modified.

      L350 - mentions eight 'proteins' having expression increase, instead 'transcripts' should be referred to when describing RNAseq data, as transcript levels may not correspond directly with protein levels. Also, be careful when referring to transcript or protein throughout this paragraph.

      This text has been modified.

      Figure 4A - instead of 'transcription during schizonts', better to say 'schizont transcript abundance'

      This text has been modified.

      L514 - 'detectable binding to PfAMA1'

      This text has been modified.

      L589 - Is it a mouse Fc region or a human Fc region that is added? The human Fc region is mentioned in the results.

      In the growth inhibition assays, the anti-AMA1 WD34 i-body with a human Fc region was used. In the ELISA assays, the anti-AMA1 WD34 i-body with a mouse Fc region was used, so that AMA1 binding could be detected with the same secondary antibody for both the WD34 i-body and the 4G2 mouse monoclonal antibody. The text has been checked and modified to state this clearly.

      Supplementary figure 3 - 'repeats'

      This text has been modified.

    1. eLife Assessment

      This manuscript describing the phenotypes associated with loss and gain of RVCL-S documents important findings that have practical implications. Although the data and methods are solid and support many claims, there remain some concerns about mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe the generation of a Drosophila model of RVCL-S by disrupting the fly TREX1 ortholog cg3165 and by expressing human TREX1 transgenes (WT and the RVCL-S-associated V235Gfs variant). They evaluate organismal phenotypes using OCT-based cardiac imaging, climbing assays, and lifespan analysis. The authors show that loss of cg3165 compromises heart performance and locomotion, and that expression of human TREX1 partially rescues these phenotypes. They further report modest differences between WT and mutant hTREX1 under overexpression conditions. The study aims to establish Drosophila as an in vivo model for RVCL-S biology and future therapeutic testing.

      Strengths:

      (1) The manuscript addresses an understudied monogenic vascular disease where animal models are scarce.

      (2) The use of OCT imaging to quantify fly cardiac performance is technically strong and may be useful for broader applications.

      (3) The authors generated both cg3165 null mutants and humanized transgenes at a defined genomic landing site.

      (4) The study provided initial in vivo evidence that human TREX1 truncation variants can induce functional impairments in flies.

      Weaknesses:

      (1) Limited mechanistic insight.

      RVCL-S pathogenesis is strongly linked to mislocalization of truncated TREX1, DNA damage accumulation, and endothelial/podocyte cellular senescence. The current manuscript does not examine any cellular, molecular, or mechanistic readouts - e.g. DNA damage markers, TREX1 subcellular localization in fly tissues, oxidative stress, apoptosis, or senescence-related pathways. As a result, the model remains largely phenotypic and descriptive.

      To strengthen the impact, the authors should provide at least one mechanistic assay demonstrating that the humanized TREX1 variants induce expected molecular consequences in vivo.

      (2) The distinction between WT and RVCL-S TREX1 variants is modest.

      In the cg3165 rescue experiments, the authors do not observe differences between hTREX1 and the V235Gfs variant (e.g., Figure 3A-B). Phenotypic differences only emerge under ubiquitous overexpression, raising two issues:

      (i) It is unclear whether these differences reflect disease-relevant biology or artifacts of strong Act5C-driven expression.

      (ii) The authors conclude that the model captures RVCL-S pathogenicity, yet the data do not robustly separate WT from mutant TREX1 under physiological expression levels.

      The authors should clarify these limitations and consider additional data or explanations to support the claim that the model distinguishes WT vs RVCL-S variants.

      (3) Heart phenotypes are presented as vascular defects without sufficient justification.

      RVCL-S is a small-vessel vasculopathy, but the Drosophila heart is a contractile tube without an endothelial lining. The authors refer to "vascular integrity restoration," but the Drosophila heart lacks vasculature.

      The manuscript would benefit from careful wording and from a discussion of how the fly heart phenotypes relate to RVCL-S microvascular pathology.

      (4) General absence of tissue-level or cellular imaging.

      No images of fly hearts, brains, eyes, or other tissues are shown. TREX1 nuclear mislocalization is a hallmark of RVCL-S, yet no localization studies are included in this manuscript.

      Adding one or two imaging experiments demonstrating TREX1 localization or tissue pathology would greatly enhance confidence in the model.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used the Drosophila heart tube to model Retinal vasculopathy with the goal of building a model that could be used to identify druggable targets and for testing chemical compounds that might target the disease. They generated flies expressing human TREX1 as well as a line expressing the V235G mutation that causes a C-terminal truncation that has been linked to the disease. In humans, this mutation is dominant. Heart tube function was monitored using OCM; the most robust change upon overexpression of wild-type or mutant TREX1 was heart tube restriction, and this effect was similar for both forms of TREX1. Lifespan and climbing assays did show differential effects between wt and mutant forms when they were strongly and ubiquitously expressed by an actin-Gal4 driver. Unfortunately, these types of assays are less useful as drug screening tools. Their conclusion that the primary effect of TREX is on neuronal function is inferential and not directly supported by the data.

      Strengths:

      The authors do not show that CG3165 is normally expressed in the heart. Further, fly heart tube function was similarly restricted in response to expression of either wild-type or mutant TREX1. The fact that expression of any form of human TREX1 had deleterious effects on heart function suggests that TREX1 serves different roles in flies compared to humans. Thus, in the case of this gene, the fly may not be a useful model for identifying targets or for use as a drug screening tool.

      The significant effects on lifespan and climbing that did show differential effects required ubiquitous overexpression using an actin-gal4 driver that does not allow the identification of tissue-specific effects. Thus, their assertion that the results suggested "a strong positive correlation between Drosophila neuromotor regulation and transgenic hTREX1 presence and a negative impact from hTREX1 V235G" is not supported by these data. Also worrisome was the inability to identify the mutant TREX1 protein by Western blot despite the enhanced expression levels suggested by qPCR analysis. Mutant TREX1 cannot exert a dominant effect on cell function if it isn't present.

      There are also some technical problems. The lifespan assays lack important controls, and the climbing assays do not appear to have been performed correctly. It is unclear what the WT genetic background is in Figures 1-3, so it is unclear if the appropriate controls have been used. Finally, the lack of information on the specific statistical analyses used for each graph makes it difficult to judge the significance of the data. Overall, the current findings establish the Retinal vasculopathy disease model platform, but with only incremental new data and without any mechanistic insights.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe the generation of a Drosophila model of RVCL-S by disrupting the fly TREX1 ortholog cg3165 and by expressing human TREX1 transgenes (WT and the RVCL-S-associated V235Gfs variant). They evaluate organismal phenotypes using OCT-based cardiac imaging, climbing assays, and lifespan analysis. The authors show that loss of cg3165 compromises heart performance and locomotion, and that expression of human TREX1 partially rescues these phenotypes. They further report modest differences between WT and mutant hTREX1 under overexpression conditions. The study aims to establish Drosophila as an in vivo model for RVCL-S biology and future therapeutic testing.

      Strengths:

      (1) The manuscript addresses an understudied monogenic vascular disease where animal models are scarce.

      (2) The use of OCT imaging to quantify fly cardiac performance is technically strong and may be useful for broader applications.

      (3) The authors generated both cg3165 null mutants and humanized transgenes at a defined genomic landing site.

      (4) The study provided initial in vivo evidence that human TREX1 truncation variants can induce functional impairments in flies.

      Weaknesses:

      (1) Limited mechanistic insight.

      RVCL-S pathogenesis is strongly linked to mislocalization of truncated TREX1, DNA damage accumulation, and endothelial/podocyte cellular senescence. The current manuscript does not examine any cellular, molecular, or mechanistic readouts - e.g. DNA damage markers, TREX1 subcellular localization in fly tissues, oxidative stress, apoptosis, or senescence-related pathways. As a result, the model remains largely phenotypic and descriptive.

      We thank the reviewers for these suggestions. We are planning to perform experiments addressing the RVCL-S-linked cellular deviations. We will examine DNA damage markers at the cellular level and perform TUNEL tissue staining to visualize apoptosis, etc.

      To strengthen the impact, the authors should provide at least one mechanistic assay demonstrating that the humanized TREX1 variants induce expected molecular consequences in vivo.

      Yes, we are planning to demonstrate the distinct effects of TREX1 and TREX1 V235G expression at the molecular level.

      (2) The distinction between WT and RVCL-S TREX1 variants is modest.

      In the cg3165 rescue experiments, the authors do not observe differences between hTREX1 and the V235Gfs variant (e.g., Figure 3A-B). Phenotypic differences only emerge under ubiquitous overexpression, raising two issues:

      i) It is unclear whether these differences reflect disease-relevant biology or artifacts of strong Act5C-driven expression.

      Thanks for pointing out this issue. We will discuss the differences between the two expression models in the revised manuscript.

      ii) The authors conclude that the model captures RVCL-S pathogenicity, yet the data do not robustly separate WT from mutant TREX1 under physiological expression levels.

      We will provide more details related to RVCL-S disease development and age-related manifestations.

      The authors should clarify these limitations and consider additional data or explanations to support the claim that the model distinguishes WT vs RVCL-S variants.

      We will address the reviewer's concerns and re-write the related manuscript sections to provide more clarity.

      (3) Heart phenotypes are presented as vascular defects without sufficient justification.

      RVCL-S is a small-vessel vasculopathy, but the Drosophila heart is a contractile tube without an endothelial lining. The authors refer to "vascular integrity restoration," but the Drosophila heart lacks vasculature.

      We will expand the model justification section and will be more careful with our statements to avoid misunderstanding of the experimental conclusions.

      The manuscript would benefit from careful wording and from a discussion of how the fly heart phenotypes relate to RVCL-S microvascular pathology.

      We thank the reviewer for pointing to this issue. Justifying Drosophila usage for human disease modelling is always challenging. We will re-write the corresponding parts of the manuscript.

      (4) General absence of tissue-level or cellular imaging.

      No images of fly hearts, brains, eyes, or other tissues are shown. TREX1 nuclear mislocalization is a hallmark of RVCL-S, yet no localization studies are included in this manuscript. Adding one or two imaging experiments demonstrating TREX1 localization or tissue pathology would greatly enhance confidence in the model.

      As suggested by the reviewers, we will add tissue imaging experiments to illustrate the pathological effects of RVCL-linked TREX1 expression. We are also planning to utilize CRIMIC line CR70804 to visualize fly TREX1 tissue distribution.

      Reviewer #2 (Public review):

      Summary:

      The authors used the Drosophila heart tube to model Retinal vasculopathy with the goal of building a model that could be used to identify druggable targets and for testing chemical compounds that might target the disease. They generated flies expressing human TREX1 as well as a line expressing the V235G mutation that causes a C-terminal truncation that has been linked to the disease. In humans, this mutation is dominant. Heart tube function was monitored using OCM; the most robust change upon overexpression of wild-type or mutant TREX1 was heart tube restriction, and this effect was similar for both forms of TREX1.

      Our results are consistent with the nature of the human disease: RVCL-S carriers and non-carriers are both healthy and asymptomatic at a young age; however, the accumulation of physiological stress becomes obvious in midlife, leading to premature death in the 40s and 50s. We will expand the discussion section focusing on RVCL-S manifestations in aged animals.

      Lifespan and climbing assays did show differential effects between wt and mutant forms when they were strongly and ubiquitously expressed by an actin-Gal4 driver. Unfortunately, these types of assays are less useful as drug screening tools. Their conclusion that the primary effect of TREX is on neuronal function is inferential and not directly supported by the data.

      We will revise this experiment discussion and plan to include additional experiments to strengthen the conclusions.

      The authors do not show that CG3165 is normally expressed in the heart. Further fly heart tube function was similarly restricted in response to expression of either wild-type or mutant TREX1. The fact that expression of any form of human TREX1 had deleterious effects on heart function suggests that TREX1 serves different roles in flies compared to humans. Thus, in the case of this gene, it may not be a useful model to use to identify targets or use it as a drug screening tool.

      We will examine the expression of cg3165 and the human TREX1 transgenes in the whole organism to demonstrate tissue expression profiles, as noted above. We will also expand the relevant manuscript sections to address the systemic manifestations of RVCL.

      The significant effects on lifespan and climbing that did show differential effects required ubiquitous overexpression using an actin-gal4 driver that does not allow the identification of tissue-specific effects.

      We plan to carry out additional experiments to determine the tissue expression profiles of cg3165 and human TREX1.

      Thus, their assertion that "the results suggested a strong positive correlation between Drosophila neuromotor regulation and transgenic hTREX1 presence and a negative impact from hTREX1 V235G" is not supported by these data.

      Thanks for pointing this out. We will revise our conclusions appropriately after we include the results from additional new experiments.

      Also worrisome was the inability to identify the mutant TREX1 protein by Western blot despite the enhanced expression levels suggested by qPCR analysis. Mutant TREX1 cannot exert a dominant effect on cell function if it isn't present.

      We will try to resolve this issue by technical means.

      There are also some technical problems. The lifespan assays lack important controls, and the climbing assays do not appear to have been performed correctly.

      We would disagree with this statement. We will re-write the method description for better clarity.

      It is unclear what the WT genetic background is in Figure 1-3, so it is unclear if the appropriate controls have been used. Finally, the lack of information on the specific statistical analyses used for each graph makes it difficult to judge the significance of the data.

      We will provide clearer descriptions of our controls and procedures.

      Overall, the current findings establish the Retinal vasculopathy disease model platform, but with only incremental new data and without any mechanistic insights.

      We will include additional experiments addressing the mechanism (see previous responses above).

      Reviewing Editor Comments:

      I (Hugo Bellen) also read your paper and noted that you do not document the expression pattern in the nervous system and other tissues, such as the heart. The stock https://flypush.research.bcm.edu/pscreen/crimic/info.php?CRname=CR70804 may help you do this and should allow you to compare the GAL4 induced expression of the stock you created and this stock. If compatible, you should consider reporting expression patterns.

      Thank you for the suggestion. We will obtain the line and will use it for expression visualization.

    1. eLife Assessment

      This study reports important findings regarding the role of the NF-kB signaling pathway in the development and long-term survival of gamma delta T cells. The authors report disparate roles of IKK-dependent NF-kB activation in the development and long-term survival of gamma delta T cell subsets. The approach and methodology employed is convincing. This work will be of great interest to immunologists interested in innate-like T cell biology and in T cell development.

    2. Reviewer #1 (Public review):

      Summary:

      The NF-kB signaling pathway plays a critical role in the development and survival of conventional alpha beta T cells. Gamma delta T cells are evolutionarily conserved T cells that occupy a unique niche in the host immune system and that develop and function in a manner distinct from conventional alpha beta T cells. Specifically, unlike the case for conventional alpha beta T cells, a large portion of gamma delta T cells acquire functionality during thymic development, after which they emigrate from the thymus and populate a variety of mucosal tissues. Exactly how gamma delta T cells are functionally programmed remains unclear. In this manuscript, Islam et al., use a wide variety of mouse genetic models to examine the influence of the NF-kB signaling pathway on gamma delta T cell development and survival. They find that the inhibitor of kappa B kinase complex (IKK) is critical to the development of gamma delta T1 subsets, but not adaptive/naïve gamma delta T cells. In contrast, IKK-dependent NF-kB activation is required for their long-term survival. They find that caspase 8-deficiency renders gamma delta T cells sensitive to RIPK1-mediated necroptosis and they conclude that IKK repression of RIPK1 is required for the long-term survival of gamma delta T1 and adaptive/naïve gamma delta T cells subsets. These data will be invaluable in comparing and contrasting the signaling pathways critical for the development/survival of both alpha beta and gamma delta T cells.

      Comments on revisions:

      The word adaptive is misspelt throughout most figures.

    3. Reviewer #2 (Public review):

      This study presents a comprehensive genetic dissection of the role of IKK signaling in the development and maintenance of lymphoid gd T cells. By employing a variety of conditional and mutant mouse models, the authors demonstrate that IKK-dependent NF-κB activation is essential for the generation of type 1 gd T cells, while adaptive gd T cells require this pathway primarily for long-term survival. The use of multiple complementary genetic strategies, including IKK deletion and modulation of RIPK1 and CASPASE8 activity, provides robust mechanistic insight into subset-specific regulation of gd T cell homeostasis. Overall, the study provides mechanistic insight for IKK-dependent regulation of gd T cell development and peripheral maintenance.

      Comments on revisions:

      Thank you for your comments and clarifications.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors appear to be excluding a significant fraction of the TCRlow gamma delta T cells from their analysis in Figure 1A. Since this population is generally enriched in CD25+ gamma delta T cells, this gating strategy could significantly impact their analysis due to the exclusion of progenitor gamma delta T cell populations.

      We were cautious in our gating strategy since the TCR𝛿+ CD3e+ subset is rather small, and so a low signal/background noise ratio can be an issue if the gates used are too broad/generous. There is some inevitable low-level background staining with the TCR𝛿 that sits just above the bulk of the negative population and is CD3ε -ve. Although this background represents a tiny fraction of total cells, we were wary of gate contamination into our TCR𝛿+ CD3e<sup>+</sup> subset and we wanted a gating strategy that could be applied across other organs too. We do not, however, believe this conservative strategy is impacting measurements of progenitor numbers across strains or our conclusions, since the size of this progenitor population in the various IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains was never impacted by the mutations. But to reassure the reviewer, we show our conservative gate as compared with a very broad TCR𝛿 gate and see that we are not missing a substantial population of CD25+ cells just below our gate. This also helps illustrate how close the background from the CD27<sup>int</sup> expressing αβ thymocytes (right column) comes to the TCR𝛿+ CD3+ gate and the importance of tight lineage gating.

      Author response image 1.

      (2) The overall phenotype of the IKKDeltaTCd2 mice is not described in any great detail. For example, it is not clear if these mice possess altered thymocyte or peripheral T cell populations beyond that of gamma delta T cells.

      Given that gamma delta T cell development has been demonstrated to be influenced by gamma delta T cells (i.e, trans-conditioning), this information could have aided in the interpretation of the data.

      Apologies for not being clearer on this point. We have studied conventional αβ T cell development in these strains in considerable detail, and these studies are published and discussed in some detail in paragraph 3 of the introduction (pages 3-4) and in the cited references Schmidt-Supprian et al 2004, Silva et al 2014, Xing et al 2016, Webb et al 2019, Carty et al 2023. These detail how IKK expression is critical for thymic development of αβ T cells and their peripheral survival, and dissect the role of NF-κB activation and cell death regulation by IKK. However, we now add new discussion (page 11-12) that considers the potential impact of altered αβ T cell development in the strains used for this study.

      We agree that trans-conditioning is also an important consideration, since CD4 TH17 T cells can enhance type 17 𝛾𝛿 T cell development (10.1038/icb.2011.50). This is of relevance to the limited conclusions we draw concerning type 17 𝛾𝛿 T cells. The REL and IKK deficient strains do lack effector populations, including type 17 αβ T cells, so it is possible that the absence of type 17 αβ T cells in these strains does contribute to the modest impact of IKK deletion in the type 17 𝛾𝛿 subset. We now highlight this information and discuss it in the manuscript (page 11-12).

      Related to this, it would have been helpful if the authors provided a comparison of the frequencies of each of the relevant subsets, in addition to the numbers.

      We now provide both the absolute frequencies of different 𝛾𝛿 subsets and their relative frequencies to one another, as supplementary figure 2. We still believe assessing absolute numbers is the gold standard, since the differential impact of gene deletions on the αβ T cell compartments in different strains will affect whether or not αβ T cells are present, and therefore overall representation of 𝛾𝛿 T cells can vary considerably between strains. Hence, absolute numbers are a more reliable measure of cell abundance.
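
      The distinction drawn here between absolute numbers and frequencies can be illustrated with a short sketch. All cell counts below are hypothetical and chosen only for illustration; they are not taken from the manuscript. The example assumes a mutant strain in which the 𝛾𝛿 T cell count is unchanged while total organ cellularity drops ten-fold because αβ T cells are lost.

```python
def frequency_percent(subset_count, total_count):
    """Return the subset as a percentage of all cells in the organ."""
    return 100.0 * subset_count / total_count

# Hypothetical counts (illustration only, not data from the study).
gd_cells = 50_000               # gamma delta T cells, identical in both strains
wt_total_cells = 10_000_000     # control organ with a normal alpha beta compartment
mutant_total_cells = 1_000_000  # mutant organ lacking most alpha beta T cells

wt_freq = frequency_percent(gd_cells, wt_total_cells)
mutant_freq = frequency_percent(gd_cells, mutant_total_cells)

# The frequency rises ten-fold although the absolute number is unchanged.
print(f"control: {wt_freq:.1f}%  mutant: {mutant_freq:.1f}%")
```

      Under these assumed numbers the apparent enrichment is produced entirely by the change in the denominator, which is the reason given above for preferring absolute counts when strains differ in αβ T cell content.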

      (3) The manner in which the peripheral gamma delta T cell compartment was analyzed is somewhat unclear. The authors appear to have assessed both spleen and lymph node separately. The authors show representative data from only one of these organs (usually the lymph node) and show one analysis of peripheral gamma delta T cell numbers, where they appear to have summed up the individual spleen and lymph node gamma delta T cell counts. Since gamma deltaT17 and gamma deltaT1 are distributed somewhat differently in these compartments (lymph node is enriched in gamma deltaT17, while spleen is enriched in gamma deltaT1), combining these data does not seem warranted. The authors should have provided representative plots for both organs and calculated and analyzed the gamma delta T cell numbers for both organs separately in each of these analyses.

      We did of course process and calculate numbers of different subsets in both lymph nodes and spleen. Where we saw loss of peripheral 𝛾𝛿 subsets, or rescue, this was reflected in separate analysis of both organs and we did not see any organ-specific effects in the mouse strains analysed. We therefore took the initial view that presenting aggregate data was the most efficient and least repetitive representation of the data. However, we very much recognise the reviewer's concern, and interest to see these data, so have now included representative plots across both organs for figure 1D, and show cell numbers of lymph nodes and spleen separately, as well as together, for figures 1, 2, 4 and 7, and these plots reflect the differences observed when we combined data. We did not break down the data for all figures (e.g. figures 3 and 5) as it was more cumbersome for more complex multi-strain comparisons and so attempt to balance clarity and transparency against unnecessary repetitive data presentation.

      (4) The authors make extensive use of surrogate markers in their analysis. While the markers that they choose are widely used, there is a possibility that the expression of some of these markers may be altered in some of their genetic mutants. This could skew their analysis and conclusions. A better approach would have been to employ either nuclear stains (Tbx21, RORgammaT) or intracellular cytokine staining to definitively identify functional gamma deltaT1 or gamma deltaT17 subsets.

      We did share a similar concern, but think this is not an issue where subsets disappear and are almost completely absent, such as in IKK1/2 KO and Casp8 KO settings. Where we saw rescue with RIPK1<sup>D138N</sup> in Casp8ΔT<sup>CD2</sup> strains, we were keen to demonstrate that the populations we saw restored did exhibit their expected function, and so confirmed this in figure 5C by intracellular cytokine staining after a short 4h restimulation in vitro. This also served to validate our gating strategy, since what we designated as Type 1 cells (CD27+CD122+CD44<sup>int</sup> cells) were the only source of IFN-gamma, while CD27–CD44<sup>hi</sup> CD122<sup>lo</sup> cells were the only source of IL-17. Adaptive/naive cells made neither cytokine. So while we did not include nuclear stains, we were satisfied that the cytokine assays validated the gating strategy.

      (5) The analysis and conclusion of the data in Figure 3A is not convincing. Because the data are graphed on log scale, the magnitude of the rescue by kinase dead RIPK1 appears somewhat overstated. A rough calculation suggests that in type 1 gamma delta T cells, there is ~ 99% decrease in gamma delta T cells in the Cre+WT strain and a ~90% decrease in the Cre+KD+ strain. Similarly, it looks as if the numbers for adaptive gamma delta T cells are a 95% decrease and an 85% decrease, respectively. Comparing these data to the data in Figure 5, which clearly show that kinase dead RIPK1 can completely rescue the Caspase 8 phenotype, the conclusion that gamma delta T cells require IKK activity to repress RIPK1-dependent pathways does not appear to be well-supported. In fact, the data seem more in line with a conclusion that IKK has a significant impact on gamma delta T cell survival in the periphery that cannot be fully explained by invoking Caspase8-dependent apoptosis or necroptosis. Indeed, while the authors seem to ultimately come to this latter conclusion in the Discussion, they clearly state in the Abstract that "IKK repression of RIPK1 is required for survival of peripheral but not thymic gamma delta T cells." Clarification of these conclusions and seeming inconsistencies would greatly strengthen the manuscript. With respect to the actual analysis in Figure 3A, it appears that the authors used a succession of non-parametric t-tests here without any correction. It may be helpful to determine if another analysis, such as ANOVA, may be more appropriate.

      Yes, we completely agree with this assessment and conclusion. While kinase dead RIPK1 does provide some rescue, this appears relatively modest, and instead supports the view, validated in figure 7, that maybe the dominant function of IKK in 𝛾𝛿 T cells is to activate NF-κB dependent survival signals. Nevertheless, RIPK1<sup>D138N</sup> does provide some significant rescue, which allows some peripheral cells to repopulate and demonstrates that IKK is repressing RIPK1 mediated cell death. It is actually not trivial to assess the relative importance of IKK-RIPK1 and IKK-NF-κB functions. In the IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice, we prevent RIPK1 induced death, but still lack the NF-κB-dependent survival signal. Consistent with this, the ~1 log reduction in 𝛾𝛿 numbers between WT and IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice is actually similar to what we observe in the absence of REL subunits (Fig. 7) which is a smaller reduction than we observe in IKKΔT<sup>CD2</sup> mice. What would have been ideal is to have a scenario where IKK regulation of RIPK1 was defective but NF-κB survival signalling was intact. This would reveal the full impact of losing IKK dependent regulation of RIPK1 alone, which we suspect would result in substantial cell death that could not be blocked by NF-κB. Unfortunately, we do not have or know of suitable mouse mutants to test this. This is quite a nuanced discussion and we now clarify the scope and extent of conclusions we can draw (p. 7, 11).
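
      The relationship between the log-scale reductions and the percentage decreases quoted in this exchange can be checked with a minimal sketch. The percentages used are the reviewer's rough estimates, not measured values, and the function names are introduced here purely for illustration.

```python
import math

def percent_decrease_from_log10(log10_reduction):
    """Convert a reduction of n log10 units into a percentage decrease."""
    return 100.0 * (1.0 - 10.0 ** (-log10_reduction))

def log10_reduction_from_percent(percent_decrease):
    """Convert a percentage decrease into the equivalent log10 reduction."""
    return -math.log10(1.0 - percent_decrease / 100.0)

# A 1 log reduction is a 90% decrease; a 2 log reduction is a 99% decrease.
one_log = percent_decrease_from_log10(1.0)
two_log = percent_decrease_from_log10(2.0)

# The reviewer's rough estimates for adaptive cells (95% and 85% decreases)
# expressed as log10 reductions.
adaptive_no_rescue = log10_reduction_from_percent(95.0)
adaptive_with_rescue = log10_reduction_from_percent(85.0)

print(f"{one_log:.0f}% {two_log:.0f}%")
print(f"{adaptive_no_rescue:.2f} log vs {adaptive_with_rescue:.2f} log")
```

      On a log axis the shift from a 99% to a 90% decrease spans a full decade, while in linear terms the rescued population is still only about a tenth of the control; this is the point about visual overstatement raised in comment (5).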

      (6) The conclusion that the alternative pathway is redundant for the development and persistence of the major gamma delta T cell subsets is at odds with a previous report demonstrating that Relb is required for gamma delta T17 development (Powolny-Budnicka, I., et al., Immunity 34: 364-374, 2011). This paper also reported the involvement of RelA in gamma delta T17 development. The present manuscript would be greatly improved by the inclusion of a discussion of these results.

      Thank you - we include a discussion of these papers now (p12).

      (7) The data in Figures 1C and 3A are somewhat confusing in that while both are from the lymph nodes of IKKdeltaTCD2 mice, the data appear to be quite different (In Figure 3A, the frequency of gamma delta T cells increases and there is a near complete loss of the CD27+ subset. In Figure 1A, the frequency of gamma delta T cells is drastically decreased, and there is only a slight loss of the CD27+ subset.)

      Yes, we agree these do look quite different and could be confusing. The lymph nodes from IKKΔT<sup>CD2</sup> lack αβ T cells and B cells, and so the cellularity is much lower than normal. Consequently, the percentage representation of remaining cells can be more noisy, while total cellularity calculations are more consistent. This is not an issue in the other strains that all have more cells in lymph nodes. We now show plots from spleen of the same mice which appear better aligned with additional splenic data shown in Figure 1.

      Reviewer #2 (Public review):

      (1) All approaches used confer changes to the entire T cell compartment. Therefore, the authors are unable to resolve whether the observations are mediated by direct and/or indirect effects (e.g., disorganized lymphoid architecture impacting maintenance/survival/homing).

      We address this important point in the discussion (p11-12). The impacts of gene deletions upon αβ and 𝛾𝛿 T cells operate independently of one another (as also discussed in response to reviewer 1). For instance, the phenotype of αβ T cells is identical in IKKΔT<sup>CD2</sup> and IKKΔT<sup>CD4</sup> mice - 𝛾𝛿 T cells are only targeted in IKKΔT<sup>CD2</sup> mice. Similarly, the phenotype of 𝛾𝛿 T cells is similar in IKKΔT<sup>CD2</sup> vs Casp8.IKKΔT<sup>CD2</sup> strains. αβ T cells are absent from IKKΔT<sup>CD2</sup> but present in near normal numbers in Casp8.IKKΔT<sup>CD2</sup> mice. Others have also noted that 𝛾𝛿 T cell development is normal in Rag deficient mice (10.1126/science.1604321). In any case, an absence of αβ T cells is expected to promote 𝛾𝛿 T cell survival in the absence of competition for commonly utilised cytokines such as IL-7 and IL-15, though we do not see much evidence for this in mice with and without αβ T cells such as IKKΔT<sup>CD2</sup> vs Casp8.IKKΔT<sup>CD2</sup> strains. We do now discuss the potential contribution of trans-conditioning for type 17 𝛾𝛿 T cell development (p12).

      (2) Assessment of factors that impact T cell numbers in the periphery is necessary. Are there observable changes to the proliferation, survival, and migration of gd T cell subsets?

      In IKKΔT<sup>CD2</sup> and Casp8.IKKΔT<sup>CD2</sup> deficient strains, we infer a defect in survival, since they lack peripheral 𝛾𝛿 T cells, despite normal thymic development. Their absence made it hard to assess proliferation and migration, though 𝛾𝛿 T cells were absent from all lymphoid organs. The conclusion that defective survival is responsible for the absence of 𝛾𝛿 T cells in the different strains is also supported by the rescue of IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains by kinase dead RIPK1D138N. Furthermore, the presence of small numbers of residual populations in lymph nodes and spleen of IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains demonstrates that migration patterns were normal. Were cells unable to recirculate, they might be expected to fail to leave the thymus, or to accumulate in the spleen. We see no evidence of either of these scenarios.

      (3) TCRd chain usage, especially among type 3 gd T cells, should be assessed.

      Unfortunately, we did not assess chain usage, choosing rather to rely on the phenotypic identity of specific subsets, which, as we show in figure 5C, was extremely robust. IL-17 was only secreted by CD27– CD44<sup>hi</sup> 𝛾𝛿 T cells, while IFN-gamma was only secreted by CD27+ CD44<sup>hi</sup> 𝛾𝛿 T cells. We argue that the production of these key effector cytokines is the most direct test of a subset's functional identity and the phenotypic designation is robust.

      (4) The functional consequences of IKK signaling on gd T cells were largely unaddressed. Cytokine analyses were performed only in the RIPK1D138N Casp8∆TCD2 model, leaving open the question of how canonical NF-κB-dependent signaling impacts the long-term functionality of gd T cells.

      Yes, we agree this remains an open question around the transcriptional mechanisms by which NF-κB signalling promotes cell survival, and one best addressed in future studies. We did not perform cytokine staining more widely, because the cytokine assay relies on short term re-stimulation of T cells with PMA and ionomycin. PMA activates PKC which in turn activates NF-κB signalling to elicit the cytokine response measured in this assay. As such, the results of such assays would be hard to interpret. We agree it would be interesting to investigate the functional consequences of REL deficiency in future studies, although this may need a more nuanced setting where 𝛾𝛿 T cells are not lost as a result of their defective survival.

      (5) The authors suggest that Caspase 8 is required for the development and maintenance of type 3 gd T cells. While the authors discussed the limitations of assessing adult mice in interpreting the data, it seems like a relatively straightforward experiment to perform.

      We did attempt these experiments with collaborators by analysing type 17 𝛾𝛿 T cell development in fetal thymic organ culture (FTOC). However, the GM mice are not so easy to breed and generating the large numbers of embryos required to set up the FTOCs proved too challenging and we were unable to generate these data.

      (6) While analyses of Casp8∆TCD2 RIPK1D138N mice suggest that loss of adaptive and type 1 gamma delta T cells in Casp8∆TCD2 animals is due to necroptosis, the contribution of RIPK3 kinase activity remains unexamined. RIPK3 activity determines whether cells die via necroptosis or apoptosis in RIPK1/Caspase8-dependent signaling, and inclusion of this analysis would strengthen mechanistic insights.

      Given time and resources, it would have been ideal to confirm necroptotic cell death by alternative knockouts, such as RIPK3 or MLKL. However, formation of the necrosome is dependent on kinase active RIPK1, since autophosphorylation of RIPK1 changes its conformation to allow recruitment of RIPK3 and MLKL and formation of the necrosome. Therefore, the rescue of CASPASE8 deficient T cells from cell death by kinase dead RIPK1 is very solid genetic evidence of necroptosis.

      (7) Canonical NF-κB signaling through cRel alone was not evaluated, leaving a gap in the understanding of transcriptional pathways required for gd T cell subsets.

      This was assessed in the p105/RelA knockout strain, which only expresses cREL. What we lacked was an assessment of what RelA/p50 dimers can support in the absence of cREL. We do, however, show the impact of RelA single deficiency, and RelA/p50 deficiency.

      In truth, we had many REL deficient strains and it was challenging to make all the combinations we wanted. However, we try to compensate for this by discussing what cREL:cREL dimers and cREL:P50 dimers are capable of doing by analysing 𝛾𝛿 T cell development in p105/RELA DKO and RELA KO mice - these do show that cREL:P50 can compensate in the absence of RELA, but cREL:cREL cannot.

      Reviewer #3 (Public review):

      Weaknesses:

      The paper would benefit greatly from a graphical abstract that could summarize the key findings, making the key findings accessible to the general immunology or biochemistry reader. Ideally, this graphic would distinguish the requirements for NF-κB signals sustaining thymic γδ T cell differentiation from peripheral maintenance, taking into account the various subsets and signaling pathways required. In addition, the authors should consider adding further literature comparing the requirements for NF-κB /necroptosis pathways in regulating other non-conventional T cell populations, such as iNKT, MAIT, or FOXP3+ Treg cells. These data might help position the requirements described here for γδ T cells compared to other subsets, with respect to homeostatic cues and transcriptional states.

      Thank you - we have added such discussions. We are happy to add a graphical abstract if journal constraints permit this.

      Last and least, there are multiple grammatical errors throughout the manuscript, and it would benefit from further editing. Likewise, there are some minor errors in figures (e.g., Figure 3A, add percentage for plot from IKKDT.RIPK1D138N mouse; Figure 7, “Adative").

      Thank you!

    1. eLife Assessment

      This study provides valuable insights into the protein composition of the C2a projection in mouse motile cilia, building upon prior work in Chlamydomonas. The evidence supporting the claims of the authors is solid. The work will be of interest to biologists and clinicians studying cilia and ciliopathies.

    2. Reviewer #1 (Public review):

      The central pair apparatus of motile cilia consists of two singlet microtubules, termed C1 and C2, each of which is associated with a set of projections, referred to as the C1 and C2 projections. Each projection comprises multiple distinct structural domains, designated a, b, c, and so on. Biochemical studies combined with genetic analyses in Chlamydomonas identified three proteins as the major components of the C2a projection, and subsequent cryo-EM studies confirmed these findings.

      In this paper, the authors aim to study the homologues of these three proteins (CCDC108/CFAP65, CFAP70, and MYCBPAP/CFAP147) using knockout mouse models. Biochemical and cell biological analyses demonstrate that, as in Chlamydomonas, these proteins are components of the C2 projection and form a complex that depends on the presence of each other. In addition, the authors use affinity purification to identify two previously uncharacterized proteins and show that they are central pair apparatus proteins that associate with the aforementioned complex. Knockout mice lacking any of the three core proteins exhibit phenotypes consistent with primary ciliary dyskinesia (PCD).

      Overall, the manuscript is clearly written, and the data are convincing and support the authors' conclusions. However, given the previous findings in Chlamydomonas, this work provides limited conceptual advances to the field. Nonetheless, it represents a useful and well-documented resource for understanding the conserved organization of the central pair apparatus in motile cilia. It will be of interest to cell and developmental biologists, biochemists, and clinicians studying and treating human ciliopathies.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the protein composition and functional role of the C2a projection of the central apparatus (CA) in vertebrate motile cilia. Using three knockout mouse models (Ccdc108, Mycbpap, and Cfap70), the authors demonstrate that these genes - homologs of Chlamydomonas FAP65, FAP147, and FAP70 - are required for normal motile cilia function in ependymal and tracheal multiciliated cells. Specifically, the authors show that:

      (1) Knockout mice for each gene exhibit primary ciliary dyskinesia phenotypes (hydrocephalus and sinusitis), accompanied by abnormal ciliary motion and reduced ciliary beat frequency.

      (2) CCDC108, MYCBPAP, and CFAP70 physically interact and localize to the axonemal central lumen, consistent with the C2a projection.

      (3) Loss of any one of these proteins destabilizes the others and disrupts CA integrity in a tissue-specific manner.

      (4) ARMC3 and MYCBP are C2a-associated proteins.

      Strengths:

      (1) Clarity: the results are presented in a coherent sequence that facilitates understanding of both the rationale and conclusions.

      (2) Genetic rigor: three independent knockout mouse lines that exhibit consistent motile cilia phenotypes provide in vivo support for the proposed role of these proteins.

      (3) Integration of structural and functional analyses: combination of ultrastructural (TEM) and immunofluorescence data with CBF measurements provides convincing correlation between structural defects and impaired ciliary function.

      (4) Mutual dependency model: reciprocal destabilization of CCDC108, MYCBPAP, and CFAP70 supports their interdependence in the C2a assembly.

      (5) Expansion of the vertebrate C2a proteome: the identification of ARMC3 and MYCBP as C2a-associated proteins provides a foundation for future mechanistic studies.

      Weaknesses:

      (1) Mechanistic depth: the data show a convincing correlation between C2a and ciliary function, but the cell type-specificity of CCDC108, MYCBPAP, and CFAP70 knockout effects is underdeveloped. This is an interesting observation that raises mechanistic/structural questions not addressed in the study, such as what is the role of C2a in CP nucleation, maintenance, or mechanical stabilization? Is C2a composition different in different cell types?

      (2) Cell model choice: co-immunoprecipitation was performed using mouse testis lysates. While this is a reasonable source of CA proteins from flagellated cells, the functional analyses in this study focus on ependymal and tracheal multiciliated cells. It would therefore be helpful for the authors to clarify the extent to which these interactions are expected to be conserved across ciliated cell types, and to discuss potential tissue-specific differences in CA assembly.

      (3) Statistical analysis: the manuscript states "Statistical significance was defined as P < 0.5", which is likely a typo and should read P < 0.05. In general, the statistical methods require more clarification. In several figures (e.g., 2B, 2D, 5J, 5K), multiple knockout genotypes are compared with WT, yet unpaired t-tests are reported. When more than two groups are analyzed, multiple pairwise t-tests inflate Type I error unless appropriately corrected; a one-way ANOVA with post hoc comparisons (e.g., Dunnett's test for WT-referenced comparisons) would be more appropriate. Furthermore, the analysis of ciliary movement modes (Figure 2D) involves categorical data, for which a t-test is not statistically appropriate. These comparisons could instead be evaluated using chi-square or Fisher's exact tests. Addressing these issues is important to ensure accurate statistical inference.

      (4) Methods section: does not sufficiently describe how image-based quantifications were performed. For example, the criteria used to define cilia number, basal body number, and rotational beating are not specified, nor is how CBF measurements were analyzed. The authors should also provide details regarding analysis software and imaging parameters used (and whether they were kept constant across genotypes).
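
      As an illustration of the Type I error inflation raised in point (3), the following minimal sketch computes the family-wise error rate for several uncorrected comparisons. It assumes the comparisons are independent, which is only an approximation when every knockout line is compared against the same WT group (one reason Dunnett's test is suggested). The function names are hypothetical and are not taken from the manuscript.

```python
def familywise_error_rate(alpha: float, n_comparisons: int) -> float:
    """Chance of at least one false positive across independent tests.

    Assumes each test is run at the same per-test threshold `alpha`
    and that the tests are independent (an approximation here).
    """
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below `alpha`."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Three knockout genotypes, each compared with WT, gives three comparisons.
    for k in (1, 3, 6):
        fwer = familywise_error_rate(0.05, k)
        print(f"{k} comparisons: family-wise error rate = {fwer:.4f}, "
              f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, k):.4f}")
```

      With three knockout lines each compared with WT at a per-test threshold of 0.05, the uncorrected family-wise rate is roughly 14% under the independence assumption.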

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The central pair apparatus of motile cilia consists of two singlet microtubules, termed C1 and C2, each of which is associated with a set of projections, referred to as the C1 and C2 projections. Each projection comprises multiple distinct structural domains, designated a, b, c, and so on. Biochemical studies combined with genetic analyses in Chlamydomonas identified three proteins as the major components of the C2a projection, and subsequent cryo-EM studies confirmed these findings.

      In this paper, the authors aim to study the homologues of these three proteins (CCDC108/CFAP65, CFAP70, and MYCBPAP/CFAP147) using knockout mouse models. Biochemical and cell biological analyses demonstrate that, as in Chlamydomonas, these proteins are components of the C2 projection and form a complex that depends on the presence of each other. In addition, the authors use affinity purification to identify two previously uncharacterized proteins and show that they are central pair apparatus proteins that associate with the aforementioned complex. Knockout mice lacking any of the three core proteins exhibit phenotypes consistent with primary ciliary dyskinesia (PCD).

      Overall, the manuscript is clearly written, and the data are convincing and support the authors' conclusions. However, given the previous findings in Chlamydomonas, this work provides limited conceptual advances to the field. Nonetheless, it represents a useful and well-documented resource for understanding the conserved organization of the central pair apparatus in motile cilia. It will be of interest to cell and developmental biologists, biochemists, and clinicians studying and treating human ciliopathies.

      We thank the reviewer for their positive comments on our work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the protein composition and functional role of the C2a projection of the central apparatus (CA) in vertebrate motile cilia. Using three knockout mouse models (Ccdc108, Mycbpap, and Cfap70), the authors demonstrate that these genes - homologs of Chlamydomonas FAP65, FAP147, and FAP70 - are required for normal motile cilia function in ependymal and tracheal multiciliated cells. Specifically, the authors show that:

      (1) Knockout mice for each gene exhibit primary ciliary dyskinesia phenotypes (hydrocephalus and sinusitis), accompanied by abnormal ciliary motion and reduced ciliary beat frequency. 

      (2) CCDC108, MYCBPAP, and CFAP70 physically interact and localize to the axonemal central lumen, consistent with the C2a projection. 

      (3) Loss of any one of these proteins destabilizes the others and disrupts CA integrity in a tissue-specific manner. 

      (4) ARMC3 and MYCBP are C2a-associated proteins. 

      Strengths:

      (1) Clarity: the results are presented in a coherent sequence that facilitates understanding of both the rationale and conclusions. 

      (2) Genetic rigor: three independent knockout mouse lines that exhibit consistent motile cilia phenotypes provide in vivo support for the proposed role of these proteins. 

      (3) Integration of structural and functional analyses: combination of ultrastructural (TEM) and immunofluorescence data with CBF measurements provides convincing correlation between structural defects and impaired ciliary function. 

      (4) Mutual dependency model: reciprocal destabilization of CCDC108, MYCBPAP, and CFAP70 supports their interdependence in the C2a assembly. 

      (5) Expansion of the vertebrate C2a proteome: the identification of ARMC3 and MYCBP as C2a-associated proteins provides a foundation for future mechanistic studies. 

      We appreciate our reviewer's positive comments.

      Weaknesses:

      (1) Mechanistic depth: the data show a convincing correlation between C2a and ciliary function, but the cell type-specificity of CCDC108, MYCBPAP, and CFAP70 knockout effects is underdeveloped. This is an interesting observation that raises mechanistic/structural questions not addressed in the study, such as what is the role of C2a in CP nucleation, maintenance, or mechanical stabilization? Is C2a composition different in different cell types? 

      We agree with our reviewer and value their insightful comments. Indeed, CP-MT defects, including the loss of one or both CP-MTs, were only observed in a subset of mouse ependymal cells (mEPCs) at day 10 post-serum starvation, and were rare in tracheal multiciliated cells, although the C2a projections were severely damaged in these tracheal cells. Based on these observations, we hypothesize that the loss of CP-MTs is probably a secondary effect caused by mechanical stress during ciliary movement. To investigate the role of C2a in CP-MT nucleation, maintenance, or mechanical stabilization, we plan to examine the axoneme structures of mEPCs at day 5 post-serum starvation using TEM. By comparing axoneme defects in these cells at days 5 and 10, we hope to gain insights into this question. Based on our findings and previous findings in Chlamydomonas, we speculate that the core components (CCDC108/FAP65, MYCBPAP/FAP147, and CFAP70/FAP70) of the C2a projection are highly conserved across species, but the peripherally associated C2a proteins may vary among different cell types. Therefore, we will perform co-immunoprecipitation using mEPCs and mouse tracheal epithelial cells to investigate potential cell-type-specific differences and expand the related discussion.

      (2) Cell model choice: co-immunoprecipitation was performed using mouse testis lysates. While this is a reasonable source of CA proteins from flagellated cells, the functional analyses in this study focus on ependymal and tracheal multiciliated cells. It would therefore be helpful for the authors to clarify the extent to which these interactions are expected to be conserved across ciliated cell types, and to discuss potential tissue-specific differences in CA assembly.

      We appreciate our reviewer's insightful comments. We will follow their suggestion and perform co-immunoprecipitation using mEPCs and mouse tracheal epithelial cells to investigate potential cell-type-specific differences and expand the related discussion.

      (3) Statistical analysis: the manuscript states "Statistical significance was defined as P < 0.5", which is likely a typo and should read P < 0.05. In general, the statistical methods require more clarification. In several figures (e.g., 2B, 2D, 5J, 5K), multiple knockout genotypes are compared with WT, yet unpaired t-tests are reported. When more than two groups are analyzed, multiple pairwise t-tests inflate Type I error unless appropriately corrected; a one-way ANOVA with post hoc comparisons (e.g., Dunnett's test for WT-referenced comparisons) would be more appropriate. Furthermore, the analysis of ciliary movement modes (Figure 2D) involves categorical data, for which a t-test is not statistically appropriate. These comparisons could instead be evaluated using chi-square or Fisher's exact tests. Addressing these issues is important to ensure accurate statistical inference.

      We thank our reviewer for pointing out these errors. We will double-check our statistical results and perform new analyses following their suggestion.

      (4) Methods section: does not sufficiently describe how image-based quantifications were performed. For example, the criteria used to define cilia number, basal body number, and rotational beating are not specified, nor is how CBF measurements were analyzed. The authors should also provide details regarding analysis software and imaging parameters used (and whether they were kept constant across genotypes). 

      We apologize for overlooking these method details. We will expand the relevant Methods section to include this information.
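
      To make the categorical-data suggestion in point (3) concrete, here is a minimal sketch of a two-sided Fisher's exact test for a 2x2 table, written with the Python standard library only. The function name, the category labels other than rotational beating, and all counts in the usage example are hypothetical and are not data from the manuscript. The sketch also assumes each counted unit is an independent observation, which the authors would need to justify for their own sampling design.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. WT and one knockout line); columns are two
    mutually exclusive categories of ciliary movement.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts: WT has 18 non-rotational and 2 rotational cilia,
    # a knockout line has 5 non-rotational and 15 rotational cilia.
    p_value = fisher_exact_two_sided(18, 2, 5, 15)
    print(f"two-sided Fisher's exact p = {p_value:.3g}")
```

      For tables with more than two movement categories or more than two genotypes, a chi-square test of independence on the full contingency table is the usual alternative, as the reviewer notes.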

    1. eLife Assessment

      This important work identifies phlda2 as a specific marker for primordial cardiomyocytes in the adult zebrafish heart and demonstrates their essential role in myocardial morphogenesis and coronary vascularization, but not in heart regeneration. The conclusions are well supported by single-cell transcriptomics, new genetic tools, and cell-specific ablation experiments. Overall, the evidence is solid and provides insight into the difference between developmental and regenerative cardiac programs. This work will be of interest for those studying cardiac development and regeneration.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question in cardiac biology: whether distinct cardiomyocyte (CM) subpopulations play specialized roles during heart development and regeneration. Using single-cell RNA sequencing and newly generated genetic tools, the authors identify phlda2 as a specific marker of primordial cardiomyocytes in the adult zebrafish heart. They further show that these primordial CMs are essential for myocardial morphogenesis and coronary vascularization but are dispensable for myocardial regeneration or revascularization after injury. These findings indicate that heart regeneration doesn't simply recapitulate developmental processes.

      Strengths:

      A major strength of the study is the generation of a phlda2 BAC reporter, which provides a specific and reliable marker for primordial cardiomyocytes. The lack of genetic tools has previously limited functional analysis of this CM population. By using phlda2 regulatory elements to generate reporter and NTR-based ablation lines, the authors can visualize and selectively manipulate primordial CMs in vivo. This enables a direct functional interrogation rather than relying on lineage tracing or correlative evidence. Through genetic ablation, the authors convincingly demonstrate that primordial CMs are essential for myocardial morphogenesis and coronary vascular organization during development but are not necessary for heart regeneration.

      Weaknesses:

      (1) The manuscript would benefit from clarifying whether primordial cardiomyocyte ablation affects epicardial cell behaviors during heart development. Given the well-established role of the epicardium in supporting coronary vessel growth, it is possible that the vascular phenotypes observed after primordial CM ablation are caused, at least in part, by altered epicardial cells.

      (2) Because primordial cardiomyocytes form a dense, single-cell-thick layer covering the ventricular surface, it would be informative to determine whether their loss alters the spatial distribution or inward migration of coronary endothelial cells or epicardial cells.

      (3) The manuscript carefully examines the relationship between primordial CMs and gata4⁺ cardiomyocytes during regeneration. However, their relationship during heart development should be more fully addressed.

      (4) As loss of cardiomyocytes is known to induce gata4:GFP activation during regeneration, it would be important to determine whether ablation of primordial cardiomyocytes alone triggers gata4:GFP expression in neighboring cardiomyocytes. This analysis would further support the conclusion that primordial cardiomyocytes are not required for regenerative responses.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript "Primordial Cardiomyocytes orchestrate myocardial morphogenesis and vascularization but are dispensable for regeneration", Sun et al. identify a novel marker of primordial cardiomyocytes and use it to visualize and ablate the population during development and regeneration. The role of the primordial layer has not been investigated because the tools to manipulate this population have not existed. The manuscript is straightforward, easy to understand, and addresses an important question that has not been explored.

      While the manuscript provides important insights into the role of primordial CMs, backed by a convincing methodology, the authors should clarify their requirements for heart development and maturation. Specifically, is the primordial layer required for the fish to survive? Do primordial CMs regenerate when ablated during development, and do the defects observed (in trabecular and compact CMs and coronary vessels) resolve after 10 days post-treatment when they were detected?

      Strengths:

      The major strengths are the identification of a marker that enables manipulation of primordial cardiomyocytes and the tools generated by the team.

      Weaknesses:

      The major weakness is not considering the longer-term consequences of primordial layer ablation during development, as it is unclear whether the animals succumb to the acute cardiac defects observed or fully recover.

    4. Reviewer #3 (Public review):

      Summary:

      The authors performed single-cell RNA sequencing of adult zebrafish hearts and identified markers for distinct cardiomyocyte subpopulations. One marker, phlda2, marks primordial cardiomyocytes. They generated transgenic reporter lines to characterize phlda2 expression patterns and a phlda2-NTR ablation line to determine the functional requirement of primordial cardiomyocytes during heart regeneration. They found that phlda2+ primordial cardiomyocytes are essential for myocardial morphogenesis and coronary vessel development. Interestingly, when phlda2+ primordial cardiomyocytes are ablated during heart regeneration, gata4+ cortical cardiomyocytes, coronary vessel revascularization, and scar tissue formation are not affected.

      Strengths:

      The authors identified a new primordial cardiomyocyte marker, phlda2. They further demonstrated that primordial cardiomyocytes are important for heart morphogenesis but dispensable for heart regeneration. Their findings reveal a potential difference between heart development and regeneration programs.

      Weakness:

      Despite the interesting findings, the authors did not provide supplemental data for their scRNAseq to demonstrate the data quality and support their conclusions, and some results are not well described.

    5. Author response:

      We thank the reviewers for their thoughtful and constructive evaluation of our work and for recognizing its potential interest to researchers working on cardiac development and regeneration. We plan to address the specific concerns raised by the reviewers as follows:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question in cardiac biology: whether distinct cardiomyocyte (CM) subpopulations play specialized roles during heart development and regeneration. Using single-cell RNA sequencing and newly generated genetic tools, the authors identify phlda2 as a specific marker of primordial cardiomyocytes in the adult zebrafish heart. They further show that these primordial CMs are essential for myocardial morphogenesis and coronary vascularization but are dispensable for myocardial regeneration or revascularization after injury. These findings indicate that heart regeneration doesn't simply recapitulate developmental processes.

      Strengths:

      A major strength of the study is the generation of a phlda2 BAC reporter, which provides a specific and reliable marker for primordial cardiomyocytes. The lack of genetic tools has previously limited functional analysis of this CM population. By using phlda2 regulatory elements to generate reporter and NTR-based ablation lines, the authors can visualize and selectively manipulate primordial CMs in vivo. This enables a direct functional interrogation rather than relying on lineage tracing or correlative evidence. Through genetic ablation, the authors convincingly demonstrate that primordial CMs are essential for myocardial morphogenesis and coronary vascular organization during development but are not necessary for heart regeneration.

      Weaknesses:

      (1) The manuscript would benefit from clarifying whether primordial cardiomyocyte ablation affects epicardial cell behaviors during heart development. Given the well-established role of the epicardium in supporting coronary vessel growth, it is possible that the vascular phenotypes observed after primordial CM ablation are caused, at least in part, by altered epicardial cells.

      We thank the reviewer for this thoughtful comment and agree that primordial cardiomyocyte ablation may indirectly affect coronary vessel growth through changes in epicardial cell behavior. Therefore, we will perform additional analyses to examine epicardial cell behaviors, including epicardial coverage and migration following primordial cardiomyocyte ablation using the established epicardial reporter line tcf21:nucEGFP during heart development.

      (2) Because primordial cardiomyocytes form a dense, single-cell-thick layer covering the ventricular surface, it would be informative to determine whether their loss alters the spatial distribution or inward migration of coronary endothelial cells or epicardial cells.

      We thank the reviewer for this important comment. We will analyze the spatial distribution and inward migration of coronary endothelial and epicardial cells after primordial cardiomyocyte ablation using high-resolution imaging and quantitative analysis.

      (3) The manuscript carefully examines the relationship between primordial CMs and gata4⁺ cardiomyocytes during regeneration. However, their relationship during heart development should be more fully addressed.

      We appreciate the suggestion and will carefully investigate the relationship between primordial cardiomyocytes and gata4⁺ cardiomyocytes during heart development.

      (4) As loss of cardiomyocytes is known to induce gata4:GFP activation during regeneration, it would be important to determine whether ablation of primordial cardiomyocytes alone triggers gata4:GFP expression in neighboring cardiomyocytes. This analysis would further support the conclusion that primordial cardiomyocytes are not required for regenerative responses.

      We acknowledge the reviewer’s comments and will test whether primordial cardiomyocyte ablation induces gata4:GFP activation in neighboring cardiomyocytes in the adult heart.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript "Primordial Cardiomyocytes orchestrate myocardial morphogenesis and vascularization but are dispensable for regeneration", Sun et al. identify a novel marker of primordial cardiomyocytes and use it to visualize and ablate the population during development and regeneration. The role of the primordial layer has not been investigated because the tools to manipulate this population have not existed. The manuscript is straightforward, easy to understand, and addresses an important question that has not been explored.

      While the manuscript provides important insights into the role of primordial CMs, backed by a convincing methodology, the authors should clarify their requirements for heart development and maturation. Specifically, is the primordial layer required for the fish to survive?

      We thank the reviewer for this important question. We will examine the survival of fish following primordial cardiomyocyte ablation during development.

      Do primordial CMs regenerate when ablated during development, and do the defects observed (in trabecular and compact CMs and coronary vessels) resolve after 10 days post-treatment when they were detected?

      We thank the reviewer for this valuable comment. We will perform additional analyses to determine whether primordial cardiomyocytes regenerate after ablation during development and to assess the extent and dynamics of their recovery. We will also evaluate whether the defects in trabecular and compact myocardium and coronary vasculature persist or resolve in adult hearts following primordial cardiomyocyte ablation during development.

      Reviewer #3 (Public review):

      Summary:

      The authors performed single-cell RNA sequencing of adult zebrafish hearts and identified markers for distinct cardiomyocyte subpopulations. One marker, phlda2, marks primordial cardiomyocytes. They generated transgenic reporter lines to characterize phlda2 expression patterns and a phlda2-NTR ablation line to determine the functional requirement of primordial cardiomyocytes during heart regeneration. They found that phlda2+ primordial cardiomyocytes are essential for myocardial morphogenesis and coronary vessel development. Interestingly, when phlda2+ primordial cardiomyocytes are ablated during heart regeneration, gata4+ cortical cardiomyocytes, coronary vessel revascularization, and scar tissue formation are not affected.

      Strengths:

      The authors identified a new primordial cardiomyocyte marker, phlda2. They further demonstrated that primordial cardiomyocytes are important for heart morphogenesis but dispensable for heart regeneration. Their findings reveal a potential difference between heart development and regeneration programs.

      Weakness:

      Despite the interesting findings, the authors did not provide supplemental data for their scRNAseq to demonstrate the data quality and support their conclusions, and some results are not well described.

      We appreciate the reviewer’s comment. We will include supplemental data to demonstrate the quality of our single-cell RNA sequencing. Additionally, we will provide more detailed descriptions of the key results in the main text and figure legends to clearly support our conclusions regarding primordial cardiomyocytes and their roles in heart morphogenesis and regeneration.

    1. eLife Assessment

      This study presents a useful methodological advance that better enables the simultaneous measurement of gene expression and chromatin accessibility in individual cells. The evidence supporting the improved detection of gene expression is solid, though the reduced performance in detecting chromatin accessibility represents a limitation. This method will be of interest to those studying transcription and gene regulation.

    2. Reviewer #1 (Public review):

      In the manuscript entitled "Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells," Soltys and colleagues present easySHARE-seq, a method described as an improvement upon SHARE-seq for the simultaneous measurement of RNA transcripts and chromatin accessibility.

      The authors demonstrate the utility of easySHARE-seq by profiling approximately 20,000 nuclei from the murine liver, successfully annotating cell types and linking cis-regulatory elements to target genes. The authors claim that easySHARE-seq supports longer read lengths potentially enabling better variant discovery or allele-specific signal assessment, though they do not provide direct evidence to support these specific claims.

      A key strength of the protocol is enhanced sequencing efficiency, achieved by shortening the Index 1 read from 99 to 17 nucleotides. This reduction does not come at a significant cost to barcode diversity, retaining approximately 3.5 million combinations. Additionally, the approach allows for the sequencing of a sub-library to assess quality prior to final barcoding and sequencing, which seems quite clever.

      While the increase in RNA transcript recovery is substantial, it appears to come at a cost: there is a notable decrease in ATAC fragments per cell compared to the original SHARE-seq (and other platforms). Likely as a result, the dimensionality reduction (UMAP) shows good resolution for RNA profiles but relatively poor resolution for accessibility profiles. Furthermore, the presented data suggests potential ambient RNA contamination; specifically, the detection of Albumin in HSCs and B cells is likely an artifact of the protocol rather than a biological signal.

      Overall, the study is well-presented and represents a promising advance. However, there are significant shortcomings that should be addressed, particularly regarding "leaky" transcript recovery and reduced ATAC performance.

      Recommendations:

      (1) To provide a comprehensive view of the current field, the authors should include Scale Biosciences (Scale Bio) in their discussion of available commercial platforms.

      (2) A head-to-head comparison with the 10x Genomics Multiome platform would be of significant interest to the single-cell genomics community and would better contextualize the performance of easySHARE-seq.

      (3) Optimizing ATAC Performance: I strongly suggest exploring methods to improve ATAC sensitivity. As the authors note, the improvement in RNA recovery may result from fewer processing steps and stronger fixation. It would be valuable to test if decreasing fixation back to 2% (as in the original SHARE-seq) recovers ATAC data quality, and to determine if the fixation level or the number of steps is the key variable in preserving transcripts.

      (4) The authors allude to the possibility of scaling this assay using a barcoded poly(T). Explicit inclusion or demonstration of this capability would dramatically increase interest in this protocol. Perhaps ATAC could be scaled using a barcoded Tn5?

      (5) The number of HSCs and B cells expressing Albumin is problematic and suggests significant ambient RNA issues that need to be addressed or computationally corrected.

    3. Reviewer #2 (Public review):

      Aims:

      The authors sought to optimize SHARE-seq, a multimodal single-cell method, to improve the simultaneous profiling of gene expression and chromatin accessibility. Their goal was to enhance barcode design for better sequencing efficiency and cost savings, while improving overall data quality. They then applied their optimized method, easySHARE-seq, to study liver sinusoidal endothelial cells (LSECs) to demonstrate its utility in examining gene regulation and spatial zonation.

      Strengths:

      The improved barcode design is an advance, increasing the proportion of sequencing reads dedicated to biological information rather than barcode identification. This modification offers practical benefits in terms of sequencing costs and read length, potentially reducing alignment errors. The method also demonstrates improved RNA detection compared to the original SHARE-seq protocol. The biological applications showcase how simultaneous measurement of both modalities enables analyses that would be practically impossible with single-modality approaches, particularly in examining how chromatin states change along developmental or spatial trajectories.

      Weaknesses:

      There is a notable reduction in chromatin accessibility detection compared to the original SHARE-seq method, likely limiting the broad use of the method. While the authors are transparent about this tradeoff, additional discussion would be helpful regarding how this affects data interpretation. Comparisons showing consistency between easySHARE-seq and SHARE-seq chromatin accessibility patterns at the single-cell level would strengthen confidence in the method.

      Overall:

      The authors achieve their aim of creating an optimized protocol with improved barcode design and enhanced RNA detection. The method represents a useful advance for specific experimental contexts where the tradeoffs are appropriate.

    4. Author response:

      Public reviews:

      Reviewer #1 (Public review):

      In the manuscript entitled "Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells," Soltys and colleagues present easySHARE-seq, a method described as an improvement upon SHARE-seq for the simultaneous measurement of RNA transcripts and chromatin accessibility.

      The authors demonstrate the utility of easySHARE-seq by profiling approximately 20,000 nuclei from the murine liver, successfully annotating cell types and linking cis-regulatory elements to target genes. The authors claim that easySHARE-seq supports longer read lengths, potentially enabling better variant discovery or allele-specific signal assessment, though they do not provide direct evidence to support these specific claims.

      A key strength of the protocol is enhanced sequencing efficiency, achieved by shortening the Index 1 read from 99 to 17 nucleotides. This reduction does not come at a significant cost to barcode diversity, retaining approximately 3.5 million combinations. Additionally, the approach allows for the sequencing of a sub-library to assess quality prior to final barcoding and sequencing, which seems quite clever.

      While the increase in RNA transcript recovery is substantial, it appears to come at a cost: there is a notable decrease in ATAC fragments per cell compared to the original SHARE-seq (and other platforms). Likely as a result, the dimensionality reduction (UMAP) shows good resolution for RNA profiles but relatively poor resolution for accessibility profiles. Furthermore, the presented data suggest potential ambient RNA contamination; specifically, the detection of Albumin in HSCs and B cells is likely an artifact of the protocol rather than a biological signal.

      Overall, the study is well-presented and represents a promising advance. However, there are significant shortcomings that should be addressed, particularly regarding "leaky" transcript recovery and reduced ATAC performance.

      Recommendations:

      (1) To provide a comprehensive view of the current field, the authors should include Scale Biosciences (Scale Bio) in their discussion of available commercial platforms.

      (2) A head-to-head comparison with the 10x Genomics Multiome platform would be of significant interest to the single-cell genomics community and would better contextualize the performance of easySHARE-seq.

      (3) Optimizing ATAC Performance: I strongly suggest exploring methods to improve ATAC sensitivity. As the authors note, the improvement in RNA recovery may result from fewer processing steps and stronger fixation. It would be valuable to test if decreasing fixation back to 2% (as in the original SHARE-seq) recovers ATAC data quality, and to determine if the fixation level or the number of steps is the key variable in preserving transcripts.

      (4) The authors allude to the possibility of scaling this assay using a barcoded poly(T). Explicit inclusion or demonstration of this capability would dramatically increase interest in this protocol. Perhaps ATAC could be scaled using a barcoded Tn5?

      (5) The number of HSCs and B cells expressing Albumin is problematic and suggests significant ambient RNA issues that need to be addressed or computationally corrected.

      We thank reviewer #1 for their comments and critique. We will add a direct comparison of easySHARE-seq with the 10x Multiome platform to Fig. 1E and F, and point more directly to Table 1 as a comparison of overall assay capabilities. We will also describe more explicitly the options for, and limitations of, scaling up this assay. We also thank the reviewer for raising the possible issue of ambient RNA contamination. We aim to quantify ambient RNA contamination, explore its impact and, if needed, evaluate ways to correct for it. Unfortunately, external circumstances make it difficult to perform further wet-lab experiments to optimize ATAC-seq performance. We will therefore update our discussion to include possible ways of improving ATAC-seq data quality.
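
      As an illustration of what quantifying ambient RNA contamination could look like, the sketch below estimates a contamination fraction from a marker gene in cells that should not express it (for example, Albumin in B cells). It follows the general idea behind marker-gene approaches such as SoupX and is not the authors' analysis. The function name, its inputs and all numbers are hypothetical, and it assumes an ambient expression profile is available, for instance from very low-count barcodes.

```python
def estimate_contamination(marker_counts, total_counts, ambient_marker_fraction):
    """Estimate the ambient RNA fraction (rho) from a non-expressed marker gene.

    Assumption: in the selected cells, every marker-gene count comes from
    ambient RNA, so observed marker counts ~= rho * total counts * ambient
    fraction of that marker.

    marker_counts: marker-gene counts per cell (hypothetical input)
    total_counts: total counts per cell, same order as marker_counts
    ambient_marker_fraction: share of the ambient profile made up by the marker
    """
    if len(marker_counts) != len(total_counts):
        raise ValueError("marker_counts and total_counts must have equal length")
    if not 0 < ambient_marker_fraction <= 1:
        raise ValueError("ambient_marker_fraction must be in (0, 1]")
    total = sum(total_counts)
    if total <= 0:
        raise ValueError("total_counts must sum to a positive number")

    observed = sum(marker_counts)
    # Marker counts we would see if the cells contained only ambient RNA.
    expected_if_all_ambient = ambient_marker_fraction * total
    # A fraction cannot exceed 1, so cap the estimate.
    return min(1.0, observed / expected_if_all_ambient)


# Made-up example: three B cells with a few Albumin counts each.
rho = estimate_contamination(
    marker_counts=[2, 1, 3],
    total_counts=[1000, 500, 1500],
    ambient_marker_fraction=0.02,
)
print(f"Estimated ambient fraction: {rho:.2f}")
```

      An estimate of this kind only gives a global contamination level; correcting individual count matrices would require a dedicated tool and a validated ambient profile.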

      Reviewer #2 (Public review):

      Aims:

      The authors sought to optimize SHARE-seq, a multimodal single-cell method, to improve the simultaneous profiling of gene expression and chromatin accessibility. Their goal was to enhance barcode design for better sequencing efficiency and cost savings, while improving overall data quality. They then applied their optimized method, easySHARE-seq, to study liver sinusoidal endothelial cells (LSECs) to demonstrate its utility in examining gene regulation and spatial zonation.

      Strengths:

      The improved barcode design is an advance, increasing the proportion of sequencing reads dedicated to biological information rather than barcode identification. This modification offers practical benefits in terms of sequencing costs and read length, potentially reducing alignment errors. The method also demonstrates improved RNA detection compared to the original SHARE-seq protocol. The biological applications showcase how simultaneous measurement of both modalities enables analyses that would be practically impossible with single-modality approaches, particularly in examining how chromatin states change along developmental or spatial trajectories.

      Weaknesses:

      There is a notable reduction in chromatin accessibility detection compared to the original SHARE-seq method, likely limiting the broad use of the method. While the authors are transparent about this tradeoff, additional discussion would be helpful regarding how this affects data interpretation. Comparisons showing consistency between easySHARE-seq and SHARE-seq chromatin accessibility patterns at the single-cell level would strengthen confidence in the method.

      We thank reviewer #2 for their comments and helpful suggestions for further analyses. We will further emphasize the ATAC-seq data quality issues in our discussion and discuss the resulting implications and shortcomings more explicitly. We agree with reviewer #2 that this dataset allows exploration of enhancer logic. We aim to incorporate the suggested analyses of RNA-ATAC correlations, expand our exploration of enhancer biology and include these results in our revision. We will also improve the clarity of our zonation analysis procedure.

      Overall:

      The authors achieve their aim of creating an optimized protocol with improved barcode design and enhanced RNA detection. The method represents a useful advance for specific experimental contexts where the tradeoffs are appropriate.

    1. eLife Assessment

      In this valuable study, Robben et al. describe a 3D beta-cell spheroid platform, a tool allowing high-throughput monitoring of cytoplasmic calcium concentrations and insulin secretion, with calcium signals comparable to those recorded in primary pancreatic islets. The authors demonstrate the method by culturing MIN6 cells in a 3D culture system, and show solid evidence of its utility by recording calcium signals in a high-throughput format and characterizing these calcium signals using pharmacological tools. This highlights the potential utility of the 3D beta-cell spheroids for screening new pharmacological modulators of pancreatic beta-cell function.