1. Mar 2026
    1. Reviewer #1 (Public review):

      Summary:

      Matsen et al. describe an approach for training an antibody language model that explicitly tries to remove effects of "neutral mutation" from the language model training task, e.g. learning the codon table, which they claim results in biased functional predictions. They do so by modeling empirical sequence-derived likelihoods through a combination of a "mutation" model and a "selection" model; the mutation model is a non-neural Thrifty model previously developed by the authors, and the selection model is a small Transformer that is trained via gradient descent. The sequence likelihoods themselves are obtained from analyzing parent-child relationships in natural SHM datasets. The authors validate their method on several standard benchmark datasets and demonstrate its favorable computational cost. They discuss how deep learning models explicitly designed to capture selection and not mutation, trained on parent-child pairs, could potentially apply to other domains such as viral evolution or protein evolution at large.

      Overall, we think the idea behind this manuscript is really clever and shows promising empirical results. Two aspects of the study are conceptually interesting: the first is factorizing the training likelihood objective to learn properties that are not explained by simple neutral mutation rules, and the second is training not on self-supervised sequence statistics but on the differences between sequences along an antibody evolutionary trajectory. If this approach generalizes to other domains of life, it could offer a new paradigm for training sequence-to-fitness models that is less biased by phylogeny or other aspects of the underlying mutation process.

      Future versions of the work can consider extending the ideas to additional datasets, species, definitions of fitness, or even different proteins entirely.

      Comments on revisions:

      We thank the authors for addressing our points and have no remaining questions.

    2. Reviewer #2 (Public review):

      Summary:

      Endowing protein language models with an ability to predict the function of antibodies would open a world of translational possibilities. However, antibody language models have yet to achieve the breakthrough success that large language models have achieved in the understanding and generation of natural language. This paper elegantly demonstrates how training objectives imported from natural language applications lead antibody language models astray on function prediction tasks. Training models to predict masked amino acids teaches models to exploit biases of nucleotide-level mutational processes, rather than protein biophysics. Taking the underlying biology of antibody diversification and selection seriously allows disentangling these processes, through what the authors call deep amino acid selection models. These models extend previous work by the authors (Matsen MBE 2025) by providing predictions not only for the selection strength at individual sites, but also for individual amino acid substitutions. This represents a practically important advance.

      Strengths:

      The paper is based on a deep conceptual insight, the existence of a multitude of biological processes that affect antibody maturation trajectories. The figures and writing are very clear, which should help make the broader field aware of this important but sometimes overlooked insight. The paper adds to a growing literature proposing biology-informed tweaks for training protein language models, and should thus be of interest to a wide readership interested in the application of machine learning to protein sequence understanding and design.

      Weaknesses:

      Proponents of the state-of-the-art protein language models might counter the claims of the paper by appealing to the ability of fine-tuning to deconvolve selection and mutation-related signatures in their high-dimensional representation spaces. Leaving the exercise of assessing this claim entirely to future work somewhat diminishes the heft of the (otherwise good!) argument. In the context of predicting antibody binding affinity, the modeling strategy only allows prediction of mutations that improve affinity on average but not those which improve binding to specific epitopes.

      Comments on revisions:

      We thank the authors for clarifying the description of the methods and for adding additional discussion of important directions for future work.

    3. Reviewer #3 (Public review):

      Summary:

      This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).

      Strengths:

      The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling and the advantage, in terms of mutational effects prediction as well as computational efficiency, is clearly demonstrated via comparisons to state-of-the-art models.

      Weaknesses:

      While all the main points are well addressed and supported, it could have been interesting to strengthen the claim of gain in interpretability by investigating it explicitly in relation to the functional effects studied in this paper.

      Comments on revisions:

      I thank the authors for clarifying a few points I had flagged up and I appreciate much better that the content of the companion paper was precisely covering model selection and structural interpretability.

      Regarding my first point (references for language models for antibodies), I feel that the parenthetical citation format shouldn't be a problem (but the editors might advise here). Antiberta2 is this paper: https://www.biorxiv.org/content/10.1101/2023.12.12.569610v1.full.pdf (yet, I understand if the authors want to focus on models purely sequence-based). A couple of additional references could be: https://academic.oup.com/bioinformatics/article/40/11/btae659/7888884; https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012646; https://www.pnas.org/doi/10.1073/pnas.2418918121; https://arxiv.org/abs/2506.13006.

      A very minor comment: could one add some p-value (it could be a supplementary table) for the Pearson correlation coefficients? The comparison between methods is rather clear, but for some correlations it's a bit unclear whether they should be considered significant. It would be important to understand the extent to which in different datasets one might expect functional prediction power based on an evolutionary objective function alone.
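
      One way such p-values could be obtained is a permutation test on the Pearson correlation. The following is a minimal sketch under stated assumptions, not a procedure taken from the manuscript: the function names `pearson_r` and `permutation_p_value`, the number of permutations, and the random seed are all illustrative choices.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any association between x and y
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

      With this estimator the smallest reportable p-value is 1 / (n_permutations + 1), so the number of permutations bounds the resolution of the test.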

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Matsen et al. describe an approach for training an antibody language model that explicitly tries to remove effects of "neutral mutation" from the language model training task, e.g. learning the codon table, which they claim results in biased functional predictions. They do so by modeling empirical sequence-derived likelihoods through a combination of a "mutation" model and a "selection" model; the mutation model is a non-neural Thrifty model previously developed by the authors, and the selection model is a small Transformer that is trained via gradient descent. The sequence likelihoods themselves are obtained from analyzing parent-child relationships in natural SHM datasets. The authors validate their method on several standard benchmark datasets and demonstrate its favorable computational cost.

      They discuss how deep learning models explicitly designed to capture selection and not mutation, trained on parent-child pairs, could potentially apply to other domains such as viral evolution or protein evolution at large.

      Strengths:

      Overall, we think the idea behind this manuscript is really clever and shows promising empirical results. Two aspects of the study are conceptually interesting: the first is factorizing the training likelihood objective to learn properties that are not explained by simple neutral mutation rules, and the second is training not on self-supervised sequence statistics but on the differences between sequences along an antibody evolutionary trajectory. If this approach generalizes to other domains of life, it could offer a new paradigm for training sequence-to-fitness models that is less biased by phylogeny or other aspects of the underlying mutation process.

      Thank you for your kind words.

      Weaknesses:

      Some claims made in the paper are weakly or indirectly supported by the data. In particular, the claim that learning the codon table contributes to biased functional effect predictions may be true, but requires more justification.

      Thank you for this comment, which made us realize that we had not adequately explained the key insight of Figure S3. We have expanded the caption of Figure S3 to clarify:

      “DASM selection factors match the pattern seen in experimental measurements, while masked language models show artifacts from the codon table.

      The experimental data (left two panels) show a slight decrease in median scores for amino acids requiring multiple nucleotide mutations (“multiple”) versus single mutations (“single”).

      DASM captures this pattern, showing similar distributions for both categories.

      In contrast, AbLang and ESM assign radically lower scores to multinucleotide amino acid substitutions, consistent with the masked language modeling objective learning codon-level mutation probabilities as described in the main text (Figure 1a).”

      This figure directly supports our claim: the experimental fitness data show similar distributions for single-mutation vs multiple-mutation amino acids, yet AbLang2 and ESM assign dramatically different scores to these groups, while DASM does not.
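
      The "single" versus "multiple" distinction in the caption can be made concrete with the standard genetic code. The following is a minimal Python sketch, not the authors' code: the function names `min_nucleotide_changes` and `classify` are hypothetical, and it assumes the category is determined by the minimum number of nucleotide differences between the parent codon and any codon encoding the target amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest). "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_nucleotide_changes(parent_codon, target_aa):
    """Fewest nucleotide differences from parent_codon to any codon for target_aa."""
    return min(
        sum(a != b for a, b in zip(parent_codon, codon))
        for codon, aa in CODON_TABLE.items()
        if aa == target_aa
    )


def classify(parent_codon, target_aa):
    """Label a substitution as 'synonymous', 'single', or 'multiple'."""
    n = min_nucleotide_changes(parent_codon, target_aa)
    if n == 0:
        return "synonymous"
    return "single" if n == 1 else "multiple"
```

      For example, from the tryptophan codon TGG, arginine is reachable with one nucleotide change (CGG or AGG), whereas phenylalanine (TTT or TTC) requires two.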

      Additionally, the paper could benefit from additional benchmarking and comparison to enhanced versions of existing methods, such as AbLang plus a multi-hit correction.

      It's an interesting idea to consider enhancing existing models. However, this approach faces some challenges. Most fundamentally, it is difficult to recast AbLang and other such models in an evolutionary framework: the masked language objective is simply not an evolutionary one. We have written a whole paper working to do this (https://doi.org/10.1371/journal.pcbi.1013758) and the results were middling despite our best efforts. Specifically regarding multihit, the effects of multihit are minor compared to the codon table effects, and those require the structure of a codon-based evolutionary model.

      Further descriptions of model components and validation metrics could help make the manuscript more readable.

      We have clarified several aspects of the model in the revision: we now describe the Thrifty neutral model in the introduction, clarify the transformer architecture and wiggle activation function in the Methods, and explain the joint branch-length optimization procedure.

      In the introduction we now describe Thrifty:

      “This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F 5-mer model.”

      In the Methods we clarify the architecture:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component to this architecture is a custom “wiggle” activation function to the output layer that prevents extreme selection factors as previously described.

      This function asymptotes to zero for highly deleterious mutations and grows sub-linearly for beneficial ones.”

      And the joint optimization:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”

      Reviewer #2 (Public review):

      Summary:

      Endowing protein language models with the ability to predict the function of antibodies would open a world of translational possibilities. However, antibody language models have yet to achieve the breakthrough success that large language models have achieved in the understanding and generation of natural language. This paper elegantly demonstrates how training objectives imported from natural language applications lead antibody language models astray on function prediction tasks. Training models to predict masked amino acids teaches models to exploit biases of nucleotide-level mutational processes, rather than protein biophysics. Taking the underlying biology of antibody diversification and selection seriously allows for disentangling these processes through what the authors call deep amino acid selection models. These models extend previous work by the authors (Matsen MBE 2025) by providing predictions not only for the selection strength at individual sites, but also for individual amino acid substitutions. This represents a practically important advance.

      Strengths:

      The paper is based on a deep conceptual insight, the existence of a multitude of biological processes that affect antibody maturation trajectories. The figures and writing are very clear, which should help make the broader field aware of this important but sometimes overlooked insight. The paper adds to a growing literature proposing biology-informed tweaks for training protein language models, and should thus be of interest to a wide readership interested in the application of machine learning to protein sequence understanding and design.

      Thank you for your kind words.

      Weaknesses:

      Proponents of the state-of-the-art protein language models might counter the claims of the paper by appealing to the ability of fine-tuning to deconvolve selection and mutation-related signatures in their high-dimensional representation spaces. Leaving the exercise of assessing this claim entirely to future work somewhat diminishes the heft of the (otherwise good!) argument.

      This is an interesting idea! However, it seems to us that this approach has some fundamental limitations. Existing models operate on amino acid sequences with no nucleotide representation, so while they can be implicitly biased by the codon table, they have no signal to separate selection from effects related to the codon table and SHM rates.

      We interpret this comment as proposing that we could use fine-tuning on functional data to pull out the selection components (that would only affect the functional data) versus the mutation component. That sounds like an interesting research project. We would be concerned that there are correlations between mutability and selective effects (e.g., CDRs are both more mutable and under different selection), creating identifiability problems unless separate data sources are used as we do here.

      Additionally, the fine-tuning approaches we are aware of are task-specific: they require labeled data from a specific assay (binding to antigen X, expression in system Y) that may or may not relate to the general evolutionary selection signal. Also, such approaches are limited to the specific data used and may not do a good job of guiding the model to a signal that is not present in the training data.

      By structuring the model as we do, we obtain the evolutionary interpretation directly from phylogenetic signal without requiring task-specific supervision.

      In the context of predicting antibody binding affinity, the modeling strategy only allows prediction of mutations that improve affinity on average, but not those which improve binding to specific epitopes.

      We agree, and this is fundamental to any general-purpose model. Predictions of binding patterns for a specific target require information about that target to be specified in the training data. We look forward to developing such task-specific models in the future.

      We have added a paragraph to the Discussion clarifying this limitation:

      “The current generation of DASM model does not use any antigen-labeled training data.

      The signal that it leverages to infer some limited ability to predict binding comes from natural affinity maturation.

      This affinity maturation comes through natural repertoires and so represents a mix of all of the antigens to which the sampled individuals have been exposed.”

      Reviewer #3 (Public review):

      Summary:

      This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).

      Strengths:

      The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling, and the advantage, in terms of mutational effects prediction, is clearly demonstrated via comparisons to state-of-the-art models.

      Thank you.

      Weaknesses:

      The gain in interpretability is only mentioned but not really elaborated upon or leveraged for gaining insight.

      We are also excited about the ability of these models to provide interpretable predictions. We have dedicated an entire paper to this direction: “A Sitewise Model of Natural Selection on Individual Antibodies via a Transformer-Encoder” in MBE (https://doi.org/10.1093/molbev/msaf186). The interpretations offered by that paper overturn some of the oversimplified dogma about how natural selection works in antibodies (purifying in FWK and diversifying in CDR), giving a more nuanced sitewise perspective. The paper also highlights the importance of specific structural features of the antibodies.

      This eLife paper, on the other hand, is focused on comparison to antibody language models and benchmarking zero-shot prediction on functional tasks.

      We have better highlighted this new paper in our revision with:

      “We have dedicated a companion paper to leveraging this interpretability to provide new perspectives on the operating rules of affinity maturation (Matsen et al., MBE 2025): that work provides a nuanced sitewise perspective on natural selection in antibodies that challenges classical oversimplified views of selection patterns.”

      The following aspects could have been better documented: the hyperparametric search to establish the optimal model; the predictive performance of baseline approaches, to fully showcase the gain yielded by DASM.

      We appreciate the concern and the desire to reveal all the factors that lead to a strong performance result. For this particular paper, we feel that this is less of a concern because we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. We now describe how, other than model size, the hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      Regarding baseline approaches, our previous paper includes comparisons to simpler models for the evolutionary objective. Here we focus on comparison to antibody language models for functional prediction. Comparing between state-of-the-art models is the standard practice for papers in this field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      We recommend modest amounts of revision, discussed below:

      Major comments:

      (1) In the first section of the results, there is extensive discussion on shortcomings of existing antibody language models like AbLang2 that seems to associate all of the performance gap with the inability to separate non-synonymous mutations separated by 1 or 2+ substitutions.

      In reality, some of the lower likelihoods in the 2+ substitution case could actually reflect real fitness deficits (while others could indeed be rarer occurrences in the training data). The authors should either moderate these claims or do an analysis that leverages antibody deep mutational scanning data to show that, conditioned on the fitness of the antibody (probably expression) being the same (either all high or all low), AbLang2 still artefactually considers rarer-training/less-codon-accessible variants to be less fit.

      As described above, we believe that this is addressed by Figure S3, but if not please correct us.

      (2) Some in the machine learning for antibody community might view the set of benchmarked datasets to be incomplete and somewhat arbitrarily selected, though we do think this is a good start, and the results are promising. A dataset commonly used in this field that is missing from this paper is from Shehata et al. (https://pubmed.ncbi.nlm.nih.gov/31553901/). A binding affinity experiment that is also commonly used in the field is from Phillips et al. (https://elifesciences.org/articles/71393) - this dataset measures combinatorial changes of framework regions on binding, which may be especially relevant here.

      We're glad to have the opportunity to clarify this, thanks.

      We based our evaluations on the April 2024 version of the FLAb benchmarking project (https://doi.org/10.1101/2024.01.13.575504) which preceded our work and thus was not subject to selection bias by us. We took the largest data sets in that repository. After this we became aware of the rich data sets offered by the Whitehead lab that provided binding measurements for many variants for a number of antigens, and added that to the evaluation set.

      We have clarified this in the manuscript:

      “We based our evaluations on the April 2024 version of the FLAb benchmarking project, which preceded our work and thus was not subject to selection bias by us.

      We also benchmarked high-throughput binding data (more recent than FLAb) from the Whitehead lab that provided affinity measurements across many variants and antigens.”

      The Shehata dataset is interesting but doesn't fit so much in the DASM mold: it is a survey of biophysical properties across many independent antibodies rather than a deep investigation of point mutants of a smaller collection of focal antibodies.

      FLAb has grown to include the Phillips dataset. We are working full-tilt on the next version of DASM and will be including many other datasets in our paper on DASM2. Thanks for the tip!

      (3) Similar to the above comment, we were also extremely curious as to why the authors did not test data from DeWitt et al. (https://pubmed.ncbi.nlm.nih.gov/40661619/). Instead, the authors only make a cryptic reference to this study on lines 201-6, but we could not even find a figure describing the results discussed on these lines. It would be great to actually include this data.

      We agree; however, our model is for human rather than mouse antibodies. We would like to train a mouse model in the future but have not yet lined up the appropriate data.

      (4) The authors should comment on potential data leakage if the SHM trajectories used in training have a similar sequence or antigen similarity to the benchmark expression/binding datasets.

      This is a good question that we should clarify. Our model is trained only on evolutionary trajectories and not functional data. Evaluation is then done on functional data without fine-tuning. These evaluation data are categorically different from the training data, and thus data leakage is not a problem. Recall that our model is zero-shot: it only considers evolutionary trajectories and not functional data as such. In a similar way, other self-supervised models such as MLMs do not exclude seeing an antibody in the training data when they are doing functional prediction.

      We have clarified this in the manuscript with:

      “Because the DASM is trained exclusively on evolutionary trajectories rather than functional measurements, evaluation on expression and binding benchmarks is strictly zero-shot with no risk of data leakage.”

      Relatedly, what happens if this approach is applied to completely de novo antibodies?

      We direct this reviewer to the Shanehsazzadeh dataset that involves antibodies that were suggested by an AI algorithm rather than observed in nature.

      If the reviewer is referring to completely synthetic antibody molecules, such as those generated by inverse folding, we have not attempted this.

      (5) It makes sense that you included the multihit correction as a response to your earlier instantiation (without this correction) underestimating the probabilities of multiple mutations in a codon associated with a single amino acid substitution (lines 476-477).

      However, this could potentially make for a somewhat unfair comparison to existing methods: if, say, we took AbLang (or another comparator) and also applied a multi-hit correction (even in some naive way at inference time), how would that compare to DASM? If this comparison favors DASM, it would show that models need more than just such a correction on top of existing methods to do good sequence scoring--which would only amplify the impact of the results.

      Thank you for this suggestion. We believe that we have addressed it in the response to the public reviews, but please let us know if not.

      Minor comments:

      (1) It would be worth explicitly defining/summarizing the mutation model used in the study, e.g. giving an overview of Thrifty in the introduction or where it first appears.

      Thanks, we have done this:

      “Our approach separates mutation and selection processes by encoding functional effects in a Deep Amino acid Selection Model (DASM) while explicitly modeling mutation using a separate fixed model trained on neutrally evolving data.

      This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F (Yaari et al., 2013) 5-mer model.”

      (2) Paragraph starting on line 58: it sounds like you're suggesting that masked deep learning models will learn certain features of genomes in a certain order. We suggest that you weaken the language, giving examples of various things the model could learn, not implying that such models will necessarily learn the most useful features after the less useful ones.

      We have fixed this by removing the "First... Second... Third... Finally" ordering:

      “It could memorize the germline genes and learn about the probabilities of V(D)J recombination.

      It could learn the codon table, as according to this table some amino-acid mutations are much more likely than others. It could learn rates of somatic hypermutation...

      It could also learn about the impact of amino acid mutations on antibody function through natural selection in the course of affinity maturation, which is the desired signal.

      However, this desired signal is confounded by the preceding factors.”

      (3) Line 72: You make a strong claim that existing models conflate mutation and selection without knowing for sure that they didn't successfully learn these components separately (it seems this would require a lot of mechanistic interpretability). The language could be softened here.

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (4) Line 79: Say a bit more about the separate fixed mutation model here. Why shouldn't we worry about this choice (especially the word "fixed") biasing your results? Does the empirical performance of your method suggest this doesn't really matter?

      We have added to the description of the fixed mutation model, as described above.

      As described in the public response, training SHM models on out-of-frame sequences is an established methodology for characterizing mutation in the absence of selection. In principle one could jointly train a model of SHM and selection, but one could have identifiability problems as there is a correlation between more mutable sites (e.g. in the CDRs) and those under relaxed selection. Using out-of-frame sequences gives a clean and independent description of the SHM process.

      (5) Line 81: on what benchmarks does it outperform? State briefly.

      Great suggestion. Done:

      “The DASM, trained on substantially less data, outperforms AbLang2 and general protein language models including ESM2 and ProGen2-small. This outperformance holds on the largest benchmark datasets of the FLAb collection and on recent high-throughput binding assays.”

      (6) Paragraph starting on line 90: The topic sentence reads a bit vague to us. Do you mean that you want to learn the extent to which models are regurgitating nucleotide similarity of AAs in determining the scores associated with AAs at masked sites?

      Thank you. We have updated this to:

      "We first sought to understand the extent to which processes such as neutral mutation rate and the codon table influence antibody language model prediction at masked sites."

      (7) Paragraph starting on line 108: feels speculative and maybe better for the discussion...

      We appreciate this comment, but we have decided to keep the content where it is. Although this would make sense as a Discussion item we feel like it fits well here right next to the evidence, and the structure of our Discussion doesn't really have a place for it.

      (8) Paragraph starting on line 116: don't say "sequences from [12]" or "method of [15]." Explain what these are before giving the citation.

      Whoops! Thanks. We have fixed these.

      (9) Line 134: Consider giving a brief definition of perplexity?

      Thanks. We added our favorite definition:

      “Perplexity (as defined in the Methods) is the standard way of evaluating the plausibility of a sequence according to a model: it is the across-site geometric mean of the inverse probability of the observed amino acid.”
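
      As an illustration of this definition, the following minimal Python sketch computes perplexity from the per-site probabilities that a model assigns to the observed amino acids. The function name `perplexity` and the input format (a list of probabilities, one per site) are assumptions made for illustration and are not taken from the manuscript's code.

```python
import math


def perplexity(observed_probs):
    """Geometric mean, across sites, of 1 / p(observed amino acid).

    observed_probs: one probability per site, each in (0, 1].
    """
    if not observed_probs:
        raise ValueError("need at least one site")
    # Work in log space: the geometric mean of 1/p is exp(-mean(log p)).
    mean_log_prob = sum(math.log(p) for p in observed_probs) / len(observed_probs)
    return math.exp(-mean_log_prob)
```

      Under this definition, a model that assigns probability 1 to every observed amino acid has perplexity 1, and a model that assigns probability 1/20 at every site has perplexity 20.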

      (10) Line 154: A citation here could be useful to support the claim that these models are learning phylogeny.

      We have replaced with the more clearly established "codon table":

      “We implemented a model to learn amino-acid preferences of antibodies without being influenced by germline genes, the codon table, or SHM biases.”

      (11) Lines 161-162: Given that phylogenetic inference methods can be tough to scale, we're curious how you managed to get 2 million PCPs from the data? Did you construct a bunch of different phylogenies (in parallel)?

      Indeed! We now clarify in the methods section that these trees were run in parallel across clonal families:

      “As in our previous work, tree inference and ancestral sequence reconstruction were performed per clonal family with the K80 substitution model...

      Because these clonal families are independent these phylogenetic inferences were run in parallel.”

      (12) Line 173-174: Can you say more about the joint optimization of the branch lengths? Are you conditioning on a phylogenetic tree topology only, and leaving the branch lengths unknown? Do you account for the fact that these branch lengths in the same phylogenetic tree aren't independent?

      Thanks for pointing out the need to clarify these points. We have done so in the methods section and provided a pointer to the methods section in the main text.

      In the main text we now say:

      “We trained DASMs of several sizes (~1M, ~4M, ~7M) using joint optimization of branch length t and parameters of the DASM (see Methods for details).”

      And in the Methods:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”
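The cyclic structure described in this quote (update the shared parameters, then update each branch length independently) can be sketched on a toy problem. The sketch below is an assumption-laden illustration only: it replaces the DASM likelihood with a squared-error objective, uses closed-form updates where the authors use gradient descent on a neural network, and all names (`update_shared`, `update_branch`, `fit`) are hypothetical.

```python
def loss(y, theta, t):
    """Toy objective: sum over pairs k and sites i of (y[k][i] - t[k] * theta[i])^2."""
    return sum((y[k][i] - t[k] * theta[i]) ** 2
               for k in range(len(t)) for i in range(len(theta)))


def update_shared(y, t):
    """Stand-in for the network optimization step: refit the shared
    parameters with all branch lengths held fixed."""
    denom = sum(tk * tk for tk in t)
    return [sum(y[k][i] * t[k] for k in range(len(t))) / denom
            for i in range(len(y[0]))]


def update_branch(y_k, theta):
    """Refit one branch length with the shared parameters held fixed.
    Depends only on that pair's data, so pairs could be processed in parallel."""
    denom = sum(th * th for th in theta)
    return max(0.0, sum(a * b for a, b in zip(y_k, theta)) / denom)


def fit(y, cycles=5):
    """Run several complete cycles and record the loss after each one."""
    theta = [1.0] * len(y[0])
    t = [1.0] * len(y)
    history = [loss(y, theta, t)]
    for _ in range(cycles):
        theta = update_shared(y, t)                    # shared-parameter step
        t = [update_branch(y_k, theta) for y_k in y]   # independent per-pair steps
        history.append(loss(y, theta, t))
    return theta, t, history
```

Because each step exactly minimizes the toy objective over its own block of variables, the recorded loss cannot increase from one cycle to the next. That guarantee belongs to this toy and is not a claim about the authors' training procedure.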

      (13) Line 358: Yes, in a trivial sense, separating mutation and selection means that we know exactly how each of those two components has been learned. We would be curious if you could say anything about mechanistic interpretability within the deep learning selection model. If not, could this be a future research direction?

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (14) Lines 384-386--indeed. Do you have any proposals for how a phylogeny could be constructed at this scale?

      As above this is not one big phylogeny but many, which invites parallelization.

      Reviewer #2 (Recommendations for the authors):

      (1) I agree that a full study of fine-tuning strategies for all possible alternative models is beyond the scope of the paper. However, a little bit of fine-tuning would go a long way to demonstrate how easy (or hard) it is to extract the relevant signal from a general protein language model embedding.

      As described in our response to the public reviews, we appreciate this point but have decided to focus on the core novelty of the paper and leave fine-tuning experiments to future work.

      (2) The authors might want to add some discussion about what signals their models capture with regard to binding affinity (averages), and how this limitation might be addressed in future work.

      As described in our response to the public reviews, we have added a paragraph to the Discussion clarifying this limitation.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I think more references have to be provided re: Antibody "foundation" language models, e.g. adding AntiBERTy and the two versions of AntiBERTa.

      We have added citations to those two models, although we weren't sure what the second version of AntiBERTa was. There are very many antibody language models. If we could use number ranges we would cite a dozen or more, but I hesitate to add many of them in the eLife format, which has parenthetical citations. If there are others that you consider essential don't hesitate to suggest them.

      (2) A key point of the approach is the disentanglement of “mutation” and “selection”, as mentioned in the introduction. However, the explanation of what the authors mean by mutation and selection comes only later. I would anticipate it in the introduction for clarity.

      This is a great point. The revised intro has this in the second sentence:

      “Natural antibodies are generated through V(D)J recombination, and refined by somatic hypermutation and affinity-based selection in germinal centers.”

      and the "While the masked..." paragraph now more clearly calls out selection.

      (3) Line 133: expression of what? Could the authors also explain mechanistically why expression should be impacted by a mutation? In what conditions do these data sample expression?

      We have clarified that it is expression in a phage display library:

      “To do so, we used the largest dataset of the FLAb collection of benchmarks, which measures the effect of single mutations on expression in a phage display library.”

      (4) Line 142: Clarify that 0.49 and 0.3 are correlation coefficients. Also, what type of correlation coefficient is this?

      Thanks for the catch! They are Pearson correlations as we now describe.

      (5) Line 173: The hyperparametric search should have been more documented (with a description of how it was carried out and plots).

      As described in our response to the public reviews, we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. Other than model size, hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      (6) Line 358: The authors say that 'DASMs provide direct interpretability'. However, this is not really inspected. A valuable addition would be to show how such interpretability is made possible, how it can recapitulate existing biological knowledge or provide hints for antibody engineering.

      As described above, this is addressed in detail in our previous paper.

      (7) Line 398: 'Inferred insertions or deletions were reversed, so that all sequences align to the naive sequence without gaps.' Could the authors comment on whether this is a limitation of the approach, why it wasn't dealt with and whether it could be the direction of future work?

      Funny you should mention this! We have been planning out such an extension in detail recently. We have added a sentence in the discussion:

      “We also have plans to extend the DASM framework to estimate the effect of natural selection on insertion and deletion events.”

      (8) Line 430-431: Could the authors clarify 'shared' over what? Also, I believe these two lines really describe the DASM architecture. This should be spelt out more clearly and tied to the description provided in lines 173-175. A diagram of the architecture would be a valuable addition to provide a full picture of the model (this could be added to the general diagram of the modelling approach of Figure S8).

      We have clarified in the text that this is indeed a description of the DASM architecture -- thanks for the catch:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component to this architecture is a custom “wiggle” activation function to the output layer that prevents extreme selection factors as previously described.”

      The architecture is very “stock” - just the default torch TransformerEncoder, so I don't think that it merits a diagram. We have expanded our discussion of the simple architecture in the revision. This sits in contrast to the setup for the loss function, which is quite custom and is the subject of Figure 2 and Figure S8.

      (9) Another general remark is that, to fully showcase the predictive advantage offered by DASM with all the modelling choices entailed, one could show the performance of simpler models, like the mutation model alone (with no selection factors), or models where selection factors are just learnt independently for each site, or are learnt with a simple linear layer instead of a transformer (these are just ideas of some simpler approach that can set baselines over which DASM improvement can be shown).

      This is a great suggestion. The primary focus of this paper is in comparing to alternate antibody language models in terms of functional prediction.

      These simpler models could be used for comparing the evolutionary objective, which we did in our previous paper (https://doi.org/10.1093/molbev/msaf186). We note that a sitewise model with fixed sites cannot really be appropriately formulated due to sequences being of different lengths.

      Additional changes

      In addition to the reviewer-requested changes, we added a comparison of ESM2 model sizes (650M vs 3B parameters) on the Koenig benchmark. We found that scaling ESM2 from 650M to 3B parameters did not improve performance. Indeed, the larger model showed slightly degraded correlations, particularly for light chain predictions. This is consistent with recent observations that medium-sized protein language models can outperform larger ones on transfer learning tasks (Vieira et al., Sci. Rep. 2025). We added Table S2 documenting these results and cite this finding in the main text to justify our use of the 650M model throughout the analyses. After doing this, we realized for the Shanehsazzadeh evaluation we had accidentally used ESM2-3B instead of ESM2-650M. The corrected ESM2-650M values are slightly lower (0.191 and 0.308 for sequence lengths 119 and 120, respectively, compared to the previous values of 0.248 and 0.337). This correction does not affect our conclusions, as DASM substantially outperforms ESM2 on this benchmark before and after the change.

      We also realized in the course of revision that we had been scoring AbLang2 using the masked-marginals pseudo-perplexity approach for the single-mutant Koenig dataset (Figure 1c), rather than the standard per-sequence pseudo-perplexity used elsewhere in the paper. For masked-marginals, probabilities are computed using only wild-type context, whereas standard pseudo-perplexity uses each variant's own context.

      The masked-marginals approach has a simple interpretation: for single-mutation variants, it is a linear transformation of the log ratio of the variant amino acid probability to the wild-type amino acid probability, both evaluated under wild-type context. This log-odds ratio directly measures how much the model prefers the mutation over the original residue.

      We found that masked-marginals performed better for AbLang2 on this dataset, so we continued using it for Figure 1c. However, for the benchmarking table (Table 1), we switched to per-sequence pseudo-perplexity as for the other comparisons in the paper, following the standard benchmarking protocol defined in FLAb (Chungyoun et al., 2024). We document both approaches in the Methods section:

      “An alternative “masked-marginals” approach scores variants using only wild-type context.

      For a wild-type sequence w, masked-marginals computes $p(a \mid w_{\setminus i})$ for all amino acids a at each position i once, then uses these wild-type-derived probabilities to compute pseudo-perplexity for any variant x...

      For a single-mutation variant x that differs from wild-type w only at position j, all terms except position j cancel when comparing to wild-type, giving $\log \mathrm{PPL}(x) - \log \mathrm{PPL}(w) = -\frac{1}{n}\left[\log p(x_j \mid w_{\setminus j}) - \log p(w_j \mid w_{\setminus j})\right]$. Thus, the log-probability difference between variant and wild-type amino acids equals, up to an additive constant that depends only on the wild-type sequence, negative n times the log pseudo-perplexity of the variant.

      For Figure 1c on the single-mutant Koenig dataset, we found that this approach gave a higher correlation for AbLang2 and so used it in that figure.

      For benchmarking comparisons (Table 1), we followed standard practice and used per-sequence pseudo-perplexity.”
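The relationship between the masked-marginals score and the log-odds ratio can be checked numerically with a small Python sketch. Everything here is an illustrative assumption: the per-site probability table stands in for a language model's masked predictions under wild-type context, and the two-residue "sequence" is a toy, not data from the paper.

```python
import math


def pseudo_perplexity(seq, site_probs):
    """Pseudo-perplexity of seq, scoring every site with probabilities
    computed once under wild-type context (masked-marginals style)."""
    n = len(seq)
    total = sum(math.log(site_probs[i][aa]) for i, aa in enumerate(seq))
    return math.exp(-total / n)


def log_odds(wild_type, pos, variant_aa, site_probs):
    """Log ratio of variant to wild-type amino acid probability at one site."""
    probs = site_probs[pos]
    return math.log(probs[variant_aa]) - math.log(probs[wild_type[pos]])


# Hypothetical wild-type-context probabilities for a length-2 toy sequence.
wild_type = "AC"
site_probs = [{"A": 0.7, "G": 0.3}, {"C": 0.6, "D": 0.4}]
variant = "GC"  # single mutation A -> G at position 0

n = len(wild_type)
lhs = log_odds(wild_type, 0, "G", site_probs)
rhs = -n * (math.log(pseudo_perplexity(variant, site_probs))
            - math.log(pseudo_perplexity(wild_type, site_probs)))
```

In this toy, `lhs` and `rhs` agree up to floating-point error, which is the linear relationship described in the response: the unchanged site contributes the same term to both pseudo-perplexities and cancels.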

    1. /hyperpost/🌐/🧊/0/

      <<< /hyperpost.peergos.me/🧊/

      The above link is to a page hosted via.hypothes.is so that pasted in data;image.s are visible

      this is possible because hyperpost.peergos.me is available via the hypothes.is proxy

      This one is fortuitous and helped me to be able to create documents using the Peergos Custom App CK editor and my own derivatives

      eventually the goal is to distribute applications via IPF/NS not being locked in using Peergos or Cryptpad or any reliance on hosted services

      Origo Folder - hyperpost web directory mirroring 1 on IPFS

    1. eLife Assessment

      This valuable study identifies a novel regulator of stress-induced gene quiescence in C. elegans: the multi-Zinc-finger protein ZNF-236. The work provides evidence for an active mechanism that maintains the repressed state of inducible genes under basal conditions in the absence of stress. The claims for discovery made in the title and abstract are supported by solid experimental data. However, a deeper investigation into the mechanisms of ZNF-236 action could substantially enhance the manuscript's impact and value.

    2. Reviewer #1 (Public review):

      Summary:

      The paper by ILBAY et al describes a screen in C. elegans for loss-of-function of factors that are presumed to constitutively downregulate heat shock or stress genes regulated by HSF-1. The hypothesis posits an active mechanism of downregulation of these genes under non-stressed conditions. The screen robustly identified ZNF-236, a multi zinc finger containing protein, whose loss upregulates heat-shock and stress-induced prion-like protein genes, but which does not appear to act in cis at the relevant promoters. The authors speculate that ZNF-236 acts indirectly on chromatin or chromatin domains to repress hs genes under non-stressed conditions.

      Strengths:

      The screen is clever, well-controlled and quite straightforward. I am convinced that ZNF-236 has something to do with keeping heat shock and other stress transcripts low. The mapping of potential binding sites of ZNF-236 is negative, despite the development of a new method to monitor binding sites. I am not sure whether this assay has a detection/sensitivity threshold limit, as it is not widely used. Up to this point, the data are solid, and the logic is easy to follow.

      Weaknesses:

      While the primary observations are well-documented, the mode of action of ZNF-236 is inadequately explored. Multi Zn finger proteins often bind RNA (TFIIIA is a classic example), and the following paper addresses multivalent functions of Zn finger proteins in RNA stability and processing: Mol Cell 2024 Oct 3;84(19):3826-3842.e8. doi: 10.1016/j.molcel.2024.08.010. I see no evidence that would point to a role for ZNF-236 in nuclear organization, yet this is the authors' favorite hypothesis. In my opinion, this proposed mechanism is poorly justified, and certainly should not be posited without first testing whether ZNF-236 acts post-transcriptionally, directly down-regulating the relevant mRNAs in some way. It could regulate RNA stability, splicing, export or translation of the relevant RNAs rather than their transcription rates. This can be tested by monitoring whether ZNF-236 alters run-on transcription rates or not. If nascent RNA synthesis rates are not altered, but rather co- and/or post-transcriptional events, and if ZNF-236 is shown to bind RNA (which is likely), the paper could still postulate that the protein plays a role in downregulating stress and heat shock proteins. However, they could rule out that it acts on the promoter by altering RNA Pol II engagement. Another option that should be tested is that ZNF-236 acts by nucleating an H3K9me domain that might shift the affected genes to the nuclear envelope, sequestering them in a zone of low-level transcription. That is also easily tested by tracking the position of an affected gene in the presence and absence of ZNF-236. This latter mechanism is also right in line with known modes of action for Zn finger proteins (in mammals, acting through KAP1 and SETDB1). A role for nucleating H3K9me could be easily tested in worms by screening MET-2 or SET-25 knockouts for heat shock or stress mRNA levels. These data sets are already published.

      Without testing these two obvious pathways of action (through RNA or through H3K9me deposition), this paper is too preliminary.

      Appraisal:

      The authors achieved their initial aim with the screen, and the paper is of interest to the field. However, they do not adequately address the likely modes of action. Indeed, I think their results fail to support the conclusion or speculation that ZNF-236 acts on long-range chromatin organization. No solid evidence is presented to support this claim.

      Impact:

      If the paper were to address and/or rule out likely modes of action, the paper would be of major value to the field of heat shock and stress mRNA control.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports the identification of ZNF-236 as a key regulator that maintains quiescence of heat shock inducible genes in C. elegans. Using a forward genetic screen for constitutive activation of an endogenous hsp-16.41 reporter, the authors show that loss of znf-236 leads to widespread, HSF-1-dependent expression of inducible heat shock proteins (iHSPs) and a subset of prion-like stress-responsive genes, in the absence of proteotoxic stress. Transcriptomic analysis reveals that znf-236 mutants partially overlap with the canonical heat shock response, selectively activating highly inducible iHSPs rather than the full HSR program. iHSP transgenes integrated throughout the genome generally become de-repressed in znf-236 mutants, whereas the same constructs on extrachromosomal arrays or inserted into the rDNA locus are insensitive to znf-236 loss. Using a newly developed method, Transcription Factor Deaminase Sequencing (TFD-seq), the authors show that ZNF-236 binds sparsely across the genome and does not associate with iHSP promoters, supporting an indirect mode of regulation. Physiologically, znf-236 mutants exhibit increased thermotolerance and maintain iHSP expression during aging.

      Strengths:

      This is a carefully executed and internally consistent study that identifies a new regulator of stress-induced gene quiescence in C. elegans. The genetics are clean and the phenotypes are robust.

      Weaknesses:

      The manuscript is largely descriptive. It would be substantially strengthened by deeper mechanistic insight into what ZNF-236 does beyond being required for default silencing.

    4. Reviewer #3 (Public review):

      Summary:

      The researchers performed a genetic screen to identify a protein, ZNF-236, which belongs to the zinc finger family, and is required for repression of heat shock inducible genes. The researchers applied a new method to map the binding sites of ZNF-236, and based on the data, suggested that the protein does not repress genes by directly binding to their regulatory regions targeted by HSF1. Insertion of a reporter in multiple genomic regions indicates that repression is not needed in repetitive genomic contexts. Together, this work identifies ZNF-236, a protein that is important to repress heat-shock-responsive genes in the absence of heat shock.

      Strengths:

      A hit from a productive genetic screen was validated, and followed up by a series of well-designed experiments to characterize how the repression occurs. The evidence that the identified protein is required for the repression of heat shock response genes is strong.

      Weaknesses:

      The researchers propose and discuss one model of repression based on protein binding data, which depends on a new technique and data that are not fully characterized.

      Major Comments:

      (1) The phrase "results from a shift in genome organization" in the abstract lacks strong evidence. This interpretation heavily relies on the protein binding technique, using ELT-2 as a positive and an imperfect negative control. If we assume that the binding is a red herring, the interpretation would require some other indirect regulation mechanism. Is it possible that ZNF-236 binds to the RNA of a protein that is required to limit HSF-1 and potentially other transcription factors' activation function? In the extrachromosomal array/rDNA context, perhaps other repressive mechanisms are redundant, and thus active repression by ZNF-236 is not required. This possibility is mentioned in one sentence in the discussion, but most of the other interpretations rely on the ZNF-236 binding data to be correct. Given that there is other evidence for a transcriptional role for ZNF-236, and no negative control (e.g. deletion of the zinc fingers, or a control akin to those done for ChIP-seq (like a null mutant or knockdown)), a stronger foundation is needed for the presented model for genome organization.

      (2) Continuing along the same line, the study assumes that ZNF-236 function is transcriptional. Is it possible to tag a protein and look at localization? If it is in the nucleus, it could be additional evidence that this is true.

      (3) I suggest that the authors analyze the genomic data further. A MEME analysis for ZNF-236 can be done to test if the motif occurrences are enriched at the binding sites. Binding site locations in the genome with respect to genes (exon, intron, promoter, enhancer?) can be analyzed and compared to existing data, such as ATAC-seq. The authors also propose that this protein could be similar to CTCF. There are numerous high-quality and high-resolution Hi-C data in C. elegans larvae, and so the authors can readily compare their binding peak locations to the insulation scores to test their hypothesis.

      (4) The researchers suggest that ZNF-236 is important for some genomic context. Based on the transcriptomic data, can they find a clue for what that context may be? Are the ZNF-236 repressed genes enriched for not expressed genes in regions surrounded by highly expressed genes?

    5. Author response:

      Updated Response, March 3, 2026

      In the midst of considering the thoughtful and insightful reviews of our manuscript and updating our work accordingly, we wanted to provide an interim update.

      In the reviews of our paper, each of the reviewers brought up questions about the specificity and sensitivity of a new "TFD-Seq" assay for protein-DNA specificity in vivo that we had developed for this work and applied here for the first time with a complex eukaryote (Figure 4). While we remain strong proponents of developing in vivo assays for protein-DNA interaction, we took to heart the concerns that the reviewers had expressed. We have therefore, in the past few weeks, done a rather "deep dive" into both the technical aspects of the TFD-Seq data and the conceptual and statistical aspects of how TFD mutation data can be interpreted. From this analysis, we find ourselves in agreement with the concerns. In particular, our "deep dive" has suggested that conclusions from TFD data (particularly negative conclusions on the presence of binding sites) will require a better understanding of signal and noise in the kind of assay used in Figure 4.

      As the work is currently in the submitted/preprint stage, we look forward to spending some time working (as appropriate) on both improvements to current protocols and alternative experiments to support the novel assay. An updated preprint, which (for now) conveys the body of work and conclusions (which are not substantially altered) while avoiding the complexities of the TFD-seq assay, is available at bioRxiv, and we look forward to sending a version of record over the next few months, once we have had a chance to provide robust tests for the macromolecular targets/interactors of the ZNF-236 factor that was identified in this study.

      We again thank the reviewers (peer review is indeed really a good thing) and look forward to updating everyone soon.

      Updated bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2025.10.22.683740v3

      Original Response, January 5, 2026

      We thank the reviewers for their insights and suggestions. We appreciate that the reviewers were engaged by both the observations and their interpretation, and consider their interest in further analysis and clarified discussion to be the best possible compliment to this work.

      As noted by the reviewers, the working hypothesis of a nuclear organization role for ZNF-236 is just one model. Clarifying this model and potential alternatives will certainly add to the manuscript and this will be a key part of the revision. Beyond this, several suggested analyses should explore extant models, while providing context for considering alternatives. We look forward to carrying out such analyses as feasible and will report them in the revised manuscript.

    1. chocolate milk on tap
      1. Cookies and cream ice cream
      2. Cadbury's Marble chocolate bar
      3. Strawberry milkshake
      4. Fanmade Animals Of Farthing Wood characters
      5. Juicy gossip at the hairdresser's
      6. My Little Pony having autistic ponies
      7. Remembering the first season of Scream Street
      8. WatchMojo's 10 Childhood Shows That Feel Like A Fever Dream Video
      9. WatchMojo's 10 Annoying Kids' Shows Video
      10. Doing a scrapbook
    1. Prince Escalus. Come, Montague; for thou art early up, To see thy son and heir more early down.

       Montague. Alas, my liege, my wife is dead to-night; Grief of my son's exile hath stopp'd her breath: What further woe conspires against mine age?

       Prince Escalus. Look, and thou shalt see.

       Montague. O thou untaught! what manners is in this? To press before thy father to a grave?

       Prince Escalus. Seal up the mouth of outrage for a while, Till we can clear these ambiguities, And know their spring, their head, their true descent; And then will I be general of your woes, And lead you even to death: meantime forbear, And let mischance be slave to patience. Bring forth the parties of suspicion.

       Friar Laurence. I am the greatest, able to do least, Yet most suspected, as the time and place Doth make against me of this direful murder; And here I stand, both to impeach and purge Myself condemned and myself excused.

       Prince Escalus. Then say at once what thou dost know in this.

       Friar Laurence. I will be brief, for my short date of breath Is not so long as is a tedious tale. Romeo, there dead, was husband to that Juliet; And she, there dead, that Romeo's faithful wife: I married them; and their stol'n marriage-day Was Tybalt's dooms-day, whose untimely death Banish'd the new-made bridegroom from the city, For whom, and not for Tybalt, Juliet pined. You, to remove that siege of grief from her, Betroth'd and would have married her perforce To County Paris: then comes she to me, And, with wild looks, bid me devise some mean To rid her from this second marriage, Or in my cell there would she kill herself. Then gave I her, so tutor'd by my art, A sleeping potion; which so took effect As I intended, for it wrought on her The form of death: meantime I writ to Romeo, That he should hither come as this dire night, To help to take her from her borrow'd grave, Being the time the potion's force should cease. But he which bore my letter, Friar John, Was stay'd by accident, and yesternight Return'd my letter back. Then all alone At the prefixed hour of her waking, Came I to take her from her kindred's vault; Meaning to keep her closely at my cell, Till I conveniently could send to Romeo: But when I came, some minute ere the time Of her awaking, here untimely lay The noble Paris and true Romeo dead. She wakes; and I entreated her come forth, And bear this work of heaven with patience: But then a noise did scare me from the tomb; And she, too desperate, would not go with me, But, as it seems, did violence on herself. All this I know; and to the marriage Her nurse is privy: and, if aught in this Miscarried by my fault, let my old life Be sacrificed, some hour before his time, Unto the rigour of severest law.

       Prince Escalus. We still have known thee for a holy man. Where's Romeo's man? what can he say in this?

       Balthasar. I brought my master news of Juliet's death; And then in post he came from Mantua To this same place, to this same monument. This letter he early bid me give his father, And threatened me with death, going in the vault, I departed not and left him there.

       Prince Escalus. Give me the letter; I will look on it. Where is the county's page, that raised the watch? Sirrah, what made your master in this place?

       Page. He came with flowers to strew his lady's grave; And bid me stand aloof, and so I did: Anon comes one with light to ope the tomb; And by and by my master drew on him; And then I ran away to call the watch.

       Prince Escalus. This letter doth make good the friar's words, Their course of love, the tidings of her death: And here he writes that he did buy a poison Of a poor 'pothecary, and therewithal Came to this vault to die, and lie with Juliet. Where be these enemies? Capulet! Montague! See, what a scourge is laid upon your hate, That heaven finds means to kill your joys with love. And I for winking at your discords too Have lost a brace of kinsmen: all are punish'd.

       Capulet. O brother Montague, give me thy hand: This is my daughter's jointure, for no more Can I demand.

       Montague. But I can give thee more: For I will raise her statue in pure gold; That while Verona by that name is known, There shall no figure at such rate be set As that of true and faithful Juliet.

       Capulet. As rich shall Romeo's by his lady's lie; Poor sacrifices of our enmity!

       Prince Escalus. A glooming peace this morning with it brings; The sun, for sorrow, will not show his head: Go hence, to have more talk of these sad things; Some shall be pardon'd, and some punished: For never was a story of more woe Than this of Juliet and her Romeo. [Exeunt]

      The Prince enters and asks what is going on. Friar Laurence explains the entire plan, saying that Romeo and Juliet were secretly married. Hearing this, Montague and Capulet decide to build statues honoring them and set their beef aside.

    1. which is the type channel.

      This could depend on the setting. Maybe for some settings the evidence mapping is very easy but is not a simple mapping from types, so make this a less strong statement.

    1. eLife Assessment

      This important work by Qin et al. delineates layered neuropeptidergic mechanisms that regulate sugar intake in a hunger state-dependent manner. Using a combination of genetic, physiological, and behavioral experiments, the authors convincingly show that Hugin- and Allatostatin A-releasing neurons are selectively active in sated flies and suppress sugar feeding by reducing the sensitivity of Gr5a-expressing gustatory neurons. They further demonstrate that Neuromedin U neurons share key physiological properties with fly Hugin neurons, highlighting conserved peptide functions across animal phyla.

    2. Reviewer #1 (Public review):

      In this revised manuscript, Qin and colleagues aim to delineate a neural mechanism that is engaged specifically in sated flies to suppress the intake of sugar solution (the "brake" mechanism for sugar consumption). They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release the neuropeptide Hugin (an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation depends on the cell-autonomous function of Hugin-releasing neurons that sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of the Hugin receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses the sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of the sugar response in Gr5a-expressing neurons reduces the fly's motivation for sugar intake. They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity in neurons downstream of NMU-expressing neurons in the rNST. These data suggest that the function of the Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      The shift of the narrative, which focuses specifically on the hugin-AstA axis as the "brake" on the satiety signal and feeding behavior, clarified the central message of the presented work. The authors have provided multiple lines of compelling evidence generated through rigorous experiments. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers.

      While I deeply appreciate the authors' efforts to substantially restructure the manuscript, I have a few suggestions for further improvements. First, there remains room for discussion as to whether the "brake" function of the hugin-AstA axis is truly satiety state-dependent. The fact that neural activation (Fig. Supp. 8), peptide injection (Fig. 3A, 4A), receptor knockdown (Fig. 3C,G, 4E), and receptor mutants (Fig. Supp. 10, 12) all robustly modulate PER irrespective of the feeding status suggests that the hugin-AstA axis influences feeding behaviors both in sated and hungry flies. Additionally, their new data (Fig. Supp. 13B, C) now show that synaptic transmission from hugin-releasing neurons is necessary for completely suppressing feeding even in sated flies. If the hugin-AstA axis were engaged specifically in the sated (high-glucose) state, disruption of this neuromodulatory system would be expected to have relatively little effect in starved flies (in which the "brake" is already disengaged).

      In this context, it is intriguing that the knockdown of PK2-R2 hugin receptor modestly but consistently decreases proboscis extension reflex specifically in starved flies (Fig. 3D, H). The manuscript does not discuss this interesting phenotype at all. Given the heterogeneity of hugin-releasing neurons (Fig. Supp. 7), there remains a possibility that a subset of hugin-releasing neurons and/or downstream neurons can provide a complementary (or even opposing) effect on the feeding behavior.

      Given these intriguing yet unresolved issues, it is important to acknowledge that whether this system is "selectively engaged in fed states to dampen sweet sensation (in Discussion)" requires further functional investigations. Consistent effects of manipulation of the hugin-AstA system across multiple experimental approaches underscore the importance of this molecular circuitry axis for controlling feeding behaviors. Moderation of conclusions to accommodate alternative interpretations of the data will be beneficial for the field to determine the precise mechanism that controls feeding behaviors in future studies.

    3. Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations, and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest in both fed and starved flies, suggesting that glucose intake through Glut1 may only be part of the mechanism. Additionally, many of the manipulations testing the "brake" circuitry throughout the study show similar effects in both fed and starved flies. This suggests that the focus of the discussion and Supplemental Figure 16 on a satiety-specific "brake" mechanism may not be fully supported by the data.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Qin and colleagues aim to delineate a neural mechanism by which the internal satiety levels modulate the intake of sugar solution. They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release a neuropeptide Hugin (which is an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation does not require synaptic inputs, suggesting that Hugin-releasing neurons sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of Hugin's receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through AstA-R1 receptor. Suppression of sugar response in Gr5a-expressing neurons reduces the fly's sugar intake motivation (measured by proboscis extension reflex). They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose, independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity in the downstream of NMU-expressing neurons in rNST. These data suggest that the function of Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      Generally, their central conclusions are well-supported by multiple independent approaches. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers. It is easier said than done: the rigor of this study, which effectively combined pharmacological and genetic approaches to provide multiple lines of behavioral and physiological evidence, deserves recognition and praise.

      A perceived weakness is that the behavioral effects of the manipulations of the Hugin and AstA systems are modest compared to a dramatic shift of sugar solution-induced PER (the behavioral proxy of sugar sensitivity) induced by hunger, as presented in Figure 1B and E. It is true that the mutation of tyrosine hydroxylase (TH), which synthesizes dopamine, does not completely abolish the hunger-induced PER change, but the remaining effect is small. Moreover, the behavioral effect of the silencing of the Hugin/AstA system (Figure Supplement 13B, C) is difficult to interpret, leaving a possibility that this system may not be necessary for shifting PER in starved flies. These observations suggest that the Hugin-AstA system accounts for only a minor part of the behavioral adaptation induced by the decreased sugar levels. Their aim to "dissect out a complete neural pathway that directly senses internal energy state and modulates food-related behavioral output in the fly brain" is likely only partially achieved. While this outcome is not a shortcoming of the study per se, the depth of discussion on the mechanism of interactions between the Hugin/AstA system and the other previously characterized molecular circuit mechanisms mediating hunger-induced behavioral modulation is insufficient for readers to appreciate the novelty of this study and future challenges in the field.

      We thank the reviewer for the thoughtful comment. We agree that the behavioral effects of manipulating the Hugin–AstA system alone were considerably weaker than the pronounced PER shifts induced by starvation. We have revised our Discussion to address this point by positioning our findings within the broader context of energy regulation.

      More specifically, we discuss that feeding behavior is controlled by two distinct, yet synergistic, types of mechanisms:

      (1) Hunger-driven 'accelerators': as the reviewer notes, pathways involving dopamine and NPF are powerful drivers of sweet sensitivity. These systems are strongly activated by hunger to promote food-seeking and consumption.

      (2) Satiety-driven 'brakes': our study identifies the counterpart to the systems above, namely a satiety-driven 'brake'. The Hugin–AstA pathway acts as a direct sensor of high internal energy (glucose), which is specifically engaged during satiety to actively suppress sweet sensation and prevent overconsumption.

      This framework explains the seeming discrepancy in effect size. The dramatic PER shift seen upon starvation is a combined result of engaging the 'accelerators' (hunger pathways like TH/NPF) while simultaneously releasing the 'brake' (our Hugin–AstA pathway being inactive).

      Our manipulations, which specifically target only the 'brake' system, are therefore expected to have a more modest effect than this combined physiological state. Thus, rather than being a "minor part," the Hugin–AstA pathway is a mechanistically defined, satiety-specific circuit that is essential for the precise "braking" required for energy homeostasis. We will update our Discussion to emphasize how these 'accelerator' and 'brake' circuits must work in concert to ensure precise energy regulation.

      In this context, authors are encouraged to confront a limitation of the study due to the lack of subtype-level circuit characterization, despite their intriguing finding that only a subtype of Hugin- and AstA-releasing neurons are responsive to the elevated level of bath-applied glucose.

      We thank the reviewer for highlighting the critical issue of subtype-level specialization within the Hugin and AstA populations.

      We fully agree that the Hugin system is known for its functional heterogeneity (pleiotropy), with different Hugin neuron subclusters implicated in regulating a variety of behaviors, including feeding, aversion, and locomotion (e.g., Anna N King, Curr Biol, 2017, Andreas PLoS Biol, Sebastian et al., 2016, Nat Comm). Our finding that only a specific subcluster of Hugin neurons is responsive to glucose elevation provides a crucial first step in functionally dissecting this complexity.

      We have added a dedicated paragraph to the Discussion to elaborate on this functional partitioning. We propose that this subtype-level specialization allows the Hugin system to precisely link specific physiological states (like high circulating glucose) to appropriate behavioral outputs (like the suppression of sweet taste), demonstrating an elegant solution to coordinating multiple survival behaviors. Future work using high-resolution tools such as split-GAL4 and single-cell sequencing will be invaluable in fully mapping the specific functional roles corresponding to each Hugin and AstA subcluster.

      Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest, and does not show a clear difference between fed and starved flies as might be expected if this mechanism acts as a sensor of internal energy state. This could suggest that glucose intake through Glut1 may only be part of the mechanism.

      We thank the reviewer for this insightful comment and agree that the modest behavioral effect of Glut1 knockdown is a critical finding that warrants further clarification. This observation strongly supports the idea that internal energy state is monitored by a sophisticated and robust network, not a single, fragile component. We believe the effect size is modest for two main reasons, which we have addressed in the revised Discussion.

      Firstly, the effect size is likely attenuated by technical and molecular redundancy. Specifically, the RNAi-mediated knockdown of Glut1 may be incomplete, leaving residual transporter function. Furthermore, Glut1 is likely only one part of the Hugin neuron's intrinsic sensing mechanism; other components, such as alternative glucose transporters or downstream K<sub>ATP</sub> channel signaling, may provide molecular redundancy, meaning that the full energy-sensing function is not easily abolished by a single manipulation.

      Secondly, and more importantly, the final feeding decision is an integrated output of competing circuits. While hunger-sensing pathways like the dopamine and NPF circuits act as powerful "accelerators" to drive sweet consumption, the Hugin–AstA pathway serves as a satiety-specific "brake." The modest effect of partially inhibiting just one component of this 'brake' system is the hallmark of a precisely regulated, multi-layered homeostatic system. We have clarified in the Discussion that the Hugin pathway represents one essential inhibitory circuit within this cooperative network that works together with the hunger-promoting systems to ensure precise control over energy intake.

      Reviewer #3 (Public review):

      Summary:

      This study identifies a novel energy-sensing circuit in Drosophila and mice that directly regulates sweet taste perception. In flies, hugin+ neurons function as a glucose sensor, activated through Glut1 transport and ATP-sensitive potassium channels. Once activated, hugin neurons release hugin peptide, which stimulates downstream Allatostatin A (AstA)+ neurons via PK2-R1 receptors. AstA+ neurons then inhibit sweet-sensing Gr5a+ gustatory neurons through AstA peptide and its receptor AstA-R1, reducing sweet sensitivity after feeding. Disrupting this pathway enhances sweet taste and increases food intake, while activating the pathway suppresses feeding.

      The mammalian homolog of neuromedin U (NMU) was shown to play an analogous role in mice. NMU knockout mice displayed heightened sweet preference, while NMU administration suppressed it. In addition, VMH NMU+ neurons directly sense glucose and project to rNST Calb2+ neurons, dampening sweet taste responses. The authors suggested a conserved hugin/NMU-AstA pathway that couples energy state to taste perception.

      Strengths:

      Interesting findings that extend from insects to mammals. Very comprehensive.

      Weaknesses:

      Coupling energy status to taste sensitivity is not a new story. Many pathways appear to be involved, and therefore, it raises a question as to how this hugin-AstA pathway is unique.

      The reviewer is correct that several energy-sensing pathways are known. However, we now clarify that these previously established mechanisms, such as the dopaminergic and NPF pathways, primarily function as hunger-driven "accelerators." They are activated by low-energy states to promote sweet sensitivity and drive consumption.

      The crucial, missing piece of the puzzle—which our study provides—is the satiety-specific "brake" mechanism. We identify the Hugin–AstA circuit as one of the “brakes”: a dedicated, central sensor that responds directly to high circulating glucose (satiety) to suppress sweet sensation and prevent overconsumption.

      Thus, our work is unique because it defines the essential counterpart to the hunger pathways. In the revised Discussion, we have explained how these 'accelerator' (hunger) and 'brake' (satiety) systems work in concert to allow for the precise, bidirectional regulation of energy intake. Furthermore, by demonstrating that this Hugin/NMU 'brake' circuit is evolutionarily conserved in mice, our findings reveal a fundamental energy-sensing strategy and suggest that this pathway could represent a promising new therapeutic target for managing conditions of excessive food intake.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Considering the comments from all three reviewers, new experiments are not necessary, but the authors are welcome to provide new pieces of evidence that would strengthen their conclusions. To assist the authors with their revisions, the comments have been categorized from the highest to lowest priority based on the concerns raised by reviewers 1, 2, and 3.

      High priority:

      (1) Acknowledgement of partial phenotypes by the genetic manipulations, especially relative to other neuromodulators that are involved in the adjustment of sugar sensitivity after starvation (1, 2).

      Please see our responses to the Public Review 1 for details.

      (2) Detailed discussion on the novelty of the present work, also in light of previous studies both in flies and mammals (known Drosophila modulators, as well as NMU-rNST circuit on sugar sensation) (1, 2, 3).

      Please see our responses to the Public Review 3 for details.

      (3) Medium priority:

      • Discussions on the subtype-specific function of hugin neurons (1).

      Please see our responses to the Public Review 1 for details.

      • Discussions on the pleiotropic effect of changes in the level of circulating sugar (including release of other sugar types) (2, 3).

      We agree that circulating sugars represent a complex, systemic signal with broad, pleiotropic effects, and we have expanded our Discussion to address this.

      We now discuss the functional distinction between key hemolymph sugars, such as trehalose (the main circulating sugar, critical for stress/flight) and glucose (the primary, rapidly mobilized energy currency). While various sugars collectively influence metabolic status, our study’s unique focus is on the direct neural link between internal energy and sweet taste modulation. We clarify that our work precisely identifies glucose as the direct, key ligand for the Hugin satiety circuit, thus providing a concrete, mechanistically defined link from systemic energy complexity to the specific regulation of sweet sensation.

      • Illustration or clear explanations of sugar application methods in mouse experiments (ex. Figure 5F vs Figure 5M), as well as discussion on the concentration of sugar solutions used (3).

      We have added the relevant details in the figure legends and explain the rationale for using this concentration of sugar in the results.

      • Less saturated image for Figure 5K (3).

      We have adjusted Figure 5K to reduce image saturation for clarity.

      • Discussions on the modest effect of NMU on rNST neurons (Figure 5M) (3).

      In the revised results, we have discussed that the modest suppression of rNST activity likely reflects partial peptide diffusion and the heterogeneous composition of sweet-responsive rNST neurons.

      (4) Low priority:

      • Systematic quantification of multiple types of sugars after starvation (3).

      We agree that circulating sugars represent a complex metabolic milieu, and a fully systematic biochemical quantification of individual hemolymph sugars after starvation would be informative. While such analyses are beyond the scope of the present study, we have addressed this point at the functional level by systematically pre-feeding flies with different types of dietary sugars prior to PER assays.

      We find that multiple sugars are capable of suppressing PER, indicating that satiety-related behavioral inhibition is not unique to a single carbohydrate source. Notably, sucrose produces the strongest suppression, consistent with its rapid metabolic conversion and effectiveness in elevating internal glucose levels. These results support the notion that diverse dietary sugars converge on a common satiety-signaling mechanism, while our mechanistic analyses specifically identify glucose as the key ligand engaging the Hugin satiety circuit.

      We now clarify this distinction in the revised Discussion.

      • Testing Gr64f neurons or mutants (3).

      Our results indicate that energy sensing in the CNS suppresses sweet-sensing neuron activity (e.g., via hyperpolarization) rather than directly blocking sugar binding to receptors. Thus, sweet perception—not sugar detection—is inhibited. As evidence, in Figure supplementary 4 we measured the PER to fructose and trehalose. Although Gr5a and Gr64a differ in their sensitivity to these sugars, the CNS energy state consistently suppresses sweet perception for both. As Reviewer 3 noted, Gr5a and Gr64f are co-expressed in sweet neurons; while they respond to different sugars, their labeling of the neurons is largely equivalent.

      • Testing sugar preference (glucose vs. other sugars) (3)

      Since our primary goal was to identify a direct satiety-sensing and sensory-modulating circuit—the "brake" mechanism—PER served as the most suitable and mechanistically specific readout. While manipulation of the Hugin–AstA circuit influences internal state, and therefore likely alters long-term sugar preference, investigating the integration of this pathway with reward and post-ingestive signaling is a critical question that lies beyond the scope of the current study.

      • Cell type-specific knockout of NMU (3).

      Achieving a cell type-specific knockout of NMU using the Cre approach is not feasible in the short term. While previous studies have reported the role of NMU in the VMH region in regulating feeding, our contribution lies in revealing how these neurons sense energy. We also show that these neurons project to the vicinity of Calb2 neurons and that the neuropeptide can suppress Calb2 neuronal activity. This essentially demonstrates that the hugin–Gr5a pathway in Drosophila is conserved in mice. We believe that a detailed dissection of the precise circuitry in mice is more appropriate to address in a subsequent study.

      • Explanation of NMU detection in Figure 5K (3): this is GFP expressed by the Cre-dependent virus.

      We have revised the Figure 5K legend to clarify that NMU<sup>+</sup> neurons are labeled by GFP expression from a Cre-dependent AAV2/1-DIO-GFP, which undergoes anterograde trans-synaptic transfer. We further explain that GFP expression in rNST neurons requires local AAV-Cre injection, enabling identification of postsynaptic Calb2<sup>+</sup> target neurons.

      • Neuronal manipulation of NMU neurons by optogenetics or DREADD.

      Please see our responses to the question “Cell type-specific knockout of NMU.”

      Reviewer #1 (Recommendations for the authors):

      A major concern about the study is that the effect of genetic manipulations of the Hugin/AstA system appears to account for only a small part of the dramatic shift of PER probability toward smaller concentrations of sucrose solutions among starved flies. In Figure 1B and E, PER probability is significantly higher among starved flies in response to 10-200 mM sucrose solutions than among fed flies. Compared to this, RNAi knockdown of glucose transporter in hugin neurons (Figure 2C), PK2-R1 pan-neuronally (Figure 3C) or in AstA-releasing neurons (Figure 3G), AstA-R1 in Gr5a neurons (Figure 4E), and systemic mutation of PK-R2 (Figure Supplement 10) and AstA-R1 (Figure Supplement 12) all produce relatively minor behavioral changes. Consistent with previous work, the mutation of TH causes a robust decrease of PER across the entire range of sucrose concentrations tested (Figure Supplement 1).

      These discrepancies can be caused by many technical limitations that cannot be readily addressed. For instance, the large effect of TH can be confounded by the pleiotropic behavioral effect of the lack of dopamine. RNAi can suffer from incomplete elimination of targeted genes. However, the relatively small behavioral effect size of these manipulations cannot be entirely ignored in light of previous publications, which point to the importance of other neuromodulators such as dopamine, serotonin, Akh, and NPF, on sugar sensitivity (Marella et al., 2012; Inagaki et al., 2014; Yao et al., 2022), as well as other potentially parallel glucose-sensing systems, including Gr43a-expressing cells (Miyamoto et al., 2012) and sNPF-expressing CN neurons (Oh et al., 2019). While the neuropeptides initially tested (Figure 1) are not poor choices, it is a missed opportunity that so many other neuromodulators were excluded from the initial search.

      We appreciate the reviewer’s detailed analysis and agree that the magnitude of behavioral effects produced by manipulating the hugin–AstA pathway is smaller than the dramatic shift in PER observed under starvation conditions. This comparison is important and highlights a central conceptual point of our study.

      Starvation represents a compound physiological state that simultaneously engages multiple hunger-promoting neuromodulatory systems—most prominently dopaminergic and NPF pathways—while also releasing satiety-associated inhibitory signals. As shown previously and confirmed here (Figure supplementary 1), manipulation of dopamine synthesis produces a broad and robust reduction in PER across sucrose concentrations, consistent with its role as a powerful hunger-driven modulator.

      By contrast, our genetic manipulations specifically target a satiety-associated inhibitory circuit—the hugin–AstA pathway—that is selectively engaged by high internal glucose levels. Manipulating this pathway alone therefore isolates a single “brake” component of feeding regulation, rather than recapitulating the full physiological state of starvation, which combines both accelerator activation and brake release. Accordingly, the more modest behavioral effects we observe are an expected consequence of dissecting one defined regulatory module from a larger, cooperative network.

      We agree that multiple neuromodulators, including dopamine, serotonin, Akh, NPF, and others, as well as parallel glucose-sensing systems such as Gr43a-expressing cells and sNPF-expressing CN neurons, contribute to the regulation of sugar sensitivity. Rather than aiming to exhaustively screen all neuromodulators, our study was designed to identify and mechanistically define a central, glucose-responsive satiety sensor that directly links internal energy state to sweet taste modulation. In the revised discussion, we now explicitly position the hugin–AstA circuit as one essential, satiety-specific component within this broader regulatory landscape and discuss how it functionally complements previously characterized hunger-driven pathways.

      I am also confused by the results of Shibire<sup>ts1</sup>-mediated silencing of Hugin and AstA neurons (Figure Supplement 13B, C). It is unclear to me why a feeding assay was used instead of PER, like the activation experiments. Feeding (ingestion) and PER are qualitatively different types of behavior, which cannot be directly compared. Moreover, the definition of "fold change" is not provided either in the figure legend or in the Materials and Methods section, making it difficult to understand what the figure means.

      We thank the reviewer for pointing out this important issue regarding the interpretation of the Shibire<sup>ts1</sup>-mediated silencing experiments. We agree that proboscis extension reflex (PER) and feeding/ingestion assays reflect qualitatively different behavioral processes and should not be directly compared.

      In the original submission, feeding assays were used to assess the effect of neuronal silencing, which led to ambiguity when comparing these results with PER-based activation experiments. To directly address this concern and ensure consistency across behavioral readouts, we have now performed additional PER experiments under the same Shibire<sup>ts1</sup>-mediated silencing conditions.

      These new data demonstrate that acute silencing of hugin neurons significantly enhances PER responses to sucrose (Figure supplementary 13B), indicating increased sweet sensitivity. This result is fully consistent with our activation experiments and supports the conclusion that the hugin–AstA pathway suppresses sweet taste perception under satiety conditions.

      In addition, we have revised the figure legend to explicitly define the “fold change” metric used in the behavioral analysis, clarifying how the values were calculated and normalized. Together, these changes resolve the ambiguity raised by the reviewer and strengthen the behavioral consistency of our conclusions.

      Of note, Marella et al. (2012) reported that silencing of Hugin-releasing neurons did not affect PER. It is therefore possible that the Hugin system is sufficient, but not necessary, for modulating PER under food deprivation.

      We agree that their observation—that silencing Hugin-releasing neurons does not alter PER in starved flies—is consistent with a state-dependent role of the Hugin system in feeding regulation.

      In starved animals, dopaminergic TH<sup>+</sup> neurons are strongly activated and promote high PER responsiveness, while circulating glucose levels are low, placing Hugin neurons in a relatively inactive state. Under such conditions, further silencing of Hugin neurons would be expected to produce minimal additional effects on PER, which likely explains the results reported by Marella et al.

      Importantly, our data show that preventing the starvation-associated reduction in Hugin neuronal activity—by thermogenetic activation of Hugin<sup>+</sup> neurons (Hugin–TrpA1; Figure 1D)—significantly suppresses the hunger-induced enhancement of PER. These results indicate that dynamic downregulation of Hugin neuronal activity is a critical component of the normal behavioral shift in sweet sensitivity in response to food deprivation. Thus, while Hugin neurons may not be required to further modulate PER once animals are already in a strongly starved state, their regulated activity change is essential for mediating state-dependent modulation of sweet taste behavior. We have added discussion in the revised manuscript.

      While no new experiments are requested, it is important for authors to acknowledge the limited effect size of Hugin/AstA manipulation. In the current manuscript, the authors briefly mention the previous works (lines 460-462, 472-474), which is insufficient. Discussions must include how the Hugin/AstA system may "complement these established mechanisms (line 460)" (described in the references listed above), under what situations this novel Hugin/AstA system can be relevant for controlling PER, and why the fly is equipped with seemingly redundant systems for sensing internal glucose levels and controlling feeding behavior. Without these discussions, it is difficult to recognize the novelty of the presented work. The data appear largely to represent minor, incremental progress in an already mature field.

      In the revised manuscript, we have substantially expanded the Discussion to explicitly acknowledge this limited effect size and to clarify the functional role of the Hugin–AstA pathway within the broader energy-regulatory network. We now emphasize that this circuit represents a satiety-specific inhibitory branch that complements, rather than replaces, previously described hunger-promoting systems such as dopaminergic, NPF, and AKH circuits.

      Importantly, we discuss the specific physiological conditions under which the Hugin–AstA system is most relevant—namely, post-feeding and high-glucose states. Unlike hunger circuits that amplify sweet sensitivity during starvation, the Hugin–AstA pathway directly senses circulating glucose and rapidly suppresses sweet taste perception when energy is sufficient, thereby acting as a brake to prevent overconsumption.

      We further address the apparent redundancy among internal sugar-sensing systems. Rather than being redundant, these pathways form a coordinated and layered network with distinct sugar specificities, temporal dynamics, and functional roles. For example, Gr43a<sup>+</sup> neurons primarily detect fructose, whereas hemolymph glucose represents the principal energetic currency in Drosophila. The use of multiple internal sugar sensors allows flies to fine-tune feeding decisions across different nutritional contexts and timescales.

      Finally, we expand the Discussion to highlight that although the Hugin–AstA circuit constitutes only one branch of the energy-sensing network, its disruption leads to excessive energy intake (Figure supplementary 13C-E, G) and increased fat accumulation (Figure supplementary 13F), underscoring its physiological relevance. We also discuss how this pathway likely interacts with other neuromodulatory systems, including TH<sup>+</sup> dopaminergic and NPF<sup>+</sup> neurons, to collectively orchestrate adaptive feeding behavior and energy homeostasis.

      Together, these additions clarify that our work does not simply add another neuromodulator to an already mature field, but instead identifies a distinct glucose-sensing, satiety-linked mechanism that fills a conceptual gap between internal energy state detection and sensory modulation.

      Another perceived weakness is the lack of subtype-level dissection among Hugin- and AstA-releasing neurons. I do not request that the authors narrow down the behaviorally relevant neuron to one (or one type); such a request would rest on a widespread but unreasonable and dangerous assumption that every behavior must be controlled by one neuron. However, the authors present very interesting data that only a subset of Hugin- and AstA-releasing neurons responds to higher levels of sucrose (Figure 1H, Figure Supplement 7A, B), which leads to a hypothesis that a specific subtype within each peptidergic neuronal group is responsible for starvation-induced behavioral change. The authors only briefly touch upon this (lines 217-218), but this is an important hypothesis that requires further discussion.

      We thank the reviewer for highlighting the importance of neuronal heterogeneity within the Hugin- and AstA-releasing populations. We fully agree that the observation that only a subset of Hugin<sup>+</sup> and AstA<sup>+</sup> neurons responds to elevated sucrose levels (Figure 1H; Figure Supplement 7A, B) strongly suggests functional specialization within these peptidergic groups.

      In the revised Discussion, we now explicitly propose that distinct subtypes of Hugin and AstA neurons differentially contribute to energy sensing and feeding modulation. We suggest that glucose-responsive subpopulations may be specifically engaged in satiety signaling, whereas other neurons within the same genetic classes may participate in additional physiological or behavioral processes. This heterogeneity provides a plausible explanation for the partial behavioral effects observed following population-level manipulations. Although we did not perform subtype-specific perturbations in this study, our findings provide a foundation for identifying these subtypes in future work using split-GAL4 lines and connectomic datasets.

      These issues are more important than the sprawling and unfocused review of various hunger and satiety-controlling systems across species in the Introduction. Lines 53-108 contain only tangential information to the main conclusion of the paper. Both the Introduction and Discussion sections must be completely restructured so that readers understand what is already known about hunger-induced changes in feeding-related behavior, what is a missing gap of knowledge in neural mechanisms controlling behavioral adaptation under starvation, and why Hugin/NMU is an interesting target in this context.

      We thank the reviewer for this important structural critique. We agree that, in the original manuscript, the Introduction placed disproportionate emphasis on a broad survey of hunger- and satiety-regulating systems across species, which may have obscured the central conceptual advance of this study.

      In the revised manuscript, we have substantially restructured both the Introduction and the Discussion to sharpen the narrative focus and clarify the specific knowledge gap addressed by our work.

      First, the Introduction has been streamlined to focus on what is already known about hunger-induced modulation of feeding-related behaviors, particularly sweet taste sensitivity and PER in Drosophila. We now emphasize that prior studies have predominantly characterized hunger-activated, feeding-promoting pathways (e.g., dopaminergic, NPF, AKH systems) that act as accelerators of food-seeking behavior.

      Second, we explicitly define the missing gap in knowledge: while hunger-driven mechanisms are well studied, it remains unclear how satiety states—specifically elevated internal glucose levels—are directly sensed by central neurons and translated into suppression of sensory gain and feeding behavior.

      Third, we reposition Hugin/NMU as an attractive and conceptually distinct target because of its peptidergic nature, evolutionary conservation, and previously reported but mechanistically unresolved links to feeding regulation. This framing motivates our central question: whether Hugin/NMU neurons function as a direct internal energy sensor that actively implements a satiety-specific inhibitory control over taste perception.

      In parallel, the Discussion has been reorganized to avoid an unfocused review of feeding circuits across species and instead to interpret our findings within a clear conceptual framework. We now emphasize that the Hugin–AstA (and NMU) pathway represents a satiety-driven “brake” that complements, rather than duplicates, established hunger-driven “accelerator” circuits. This restructuring clarifies both the novelty of our findings and their relevance within the existing literature.

      Reviewer #2 (Recommendations for the authors):

      When discussing the results of Figure 1, such as lines 203-204, "These results demonstrate that sugar intake inhibits sweet sensation, probably via increasing circulating sugar levels" it may be worth discussing the known impact of sweet sensation experience on future sweet taste responses. With the data shown here, it is difficult to conclusively separate blood glucose levels from the sweet sensation that happens during the re-feeding. The "normal diet minus sucrose" does not blunt the starved PER effect, but that could potentially be impacted by either/both sugar intake or sweet taste.

      We thank the reviewer for this thoughtful and important point. We agree that sweet taste experience itself can influence subsequent sweet sensitivity, and that separating the contribution of sensory experience from nutrient-derived internal energy is non-trivial.

      In the revised manuscript, we have clarified the experimental timing by explicitly stating that PER was assessed 15 minutes after refeeding. At this time point, hemolymph glucose levels have returned to baseline (Figure supplementary 5), supporting the physiological relevance of glucose-dependent activation of Hugin neurons under our experimental conditions.

      We also acknowledge that sweet taste exposure can induce sensory adaptation and modulate future taste responses. To directly address this potential confound, we performed additional control experiments during revision (Figure supplementary 4B) in which starved flies were refed with sorbitol (caloric but not sweet) or arabinose (sweet but non-nutritive). We found that both manipulations partially reduced PER, but neither recapitulated the full suppressive effect of sucrose refeeding.

      These results indicate that sweet taste experience and metabolic energy contribute in parallel to the regulation of sweet sensitivity. Importantly, the incomplete effects of sorbitol or arabinose alone suggest that neither sensory adaptation nor caloric value is sufficient by itself to fully account for the observed PER suppression.

      Accordingly, we have revised the Discussion to clarify that the Hugin–AstA pathway likely operates within a broader, multi-layered regulatory framework, integrating internal metabolic state with sensory experience, rather than acting as a sole determinant of post-feeding sweet sensitivity. This clarification avoids over-attribution of the behavioral effect to circulating glucose alone while preserving the central conclusion that internal energy state is a key modulator of sweet perception.

      Blocking cellular sugar intake or metabolism could be impacting the ability of neurons to function, distinct from any specific intracellular regulatory mechanism that glucose or its derivatives might be involved with. That may be a caveat worth mentioning in the results or discussion.

      We thank the reviewer for raising this important caveat. We agree that blocking cellular sugar uptake or metabolism could, in principle, impair neuronal function in a nonspecific manner, independent of any dedicated intracellular glucose-sensing mechanism.

      In the revised manuscript, we now explicitly acknowledge this possibility and clarify the scope of our interpretation. Several features of our data argue against a generalized loss of neuronal function as the primary explanation. First, the behavioral and physiological effects observed upon manipulation of glucose transport or K<sub>ATP</sub> channel activity are rapid and reversible, consistent with state-dependent modulation rather than chronic metabolic failure. Second, these manipulations selectively affect sweet sensitivity and feeding-related behaviors, without causing gross deficits in proboscis extension or neuronal responsiveness.

      Accordingly, we have revised the Results to emphasize that while intracellular glucose metabolism is required for normal neuronal activity, our findings specifically support a role for glucose-dependent modulation of neuronal excitability in satiety signaling, rather than a nonspecific energetic impairment.

      Minor suggestions:

      (1) Figure 2G: "Pryuvate" -> "Pyruvate."

      We have corrected “Pryuvate” to “Pyruvate.”

      (2) "Fly" methods section: it says that flies were kept on 2% agar for 12 hours for starvation, but in the Figure 1A description, it says 24 hours.

      We have corrected the description in Figure 1A.

      Reviewer #3 (Recommendations for the authors):

      (1) SEZ Hugin+ and AstA+ neurons were activated by glucose (Figures 1G, 1I), yet hemolymph also contains trehalose and fructose. For instance, DH44 neurons respond broadly to all hemolymph sugars (Dus et al., 2015), while Gr43a neurons specifically detect fructose (Miyamoto et al., 2012). The present study does not clarify whether Hugin+ or AstA+ neurons are similarly sugar-specific or more broadly tuned. A systematic analysis is needed to determine whether these circuits are selective for glucose.

      We thank the reviewer for raising this important question regarding sugar specificity. We agree that hemolymph contains multiple sugars, including trehalose and fructose, and that distinct neural systems have been shown to differ in their tuning breadth. To address this issue, we performed additional experiments during revision in which starved wild-type flies were refed with different sugars (sucrose, fructose, trehalose, or sorbitol), followed by PER measurements. We found that sucrose refeeding produced the strongest suppression of PER, whereas fructose, trehalose, and sorbitol induced weaker effects (Figure supplementary 4A).

      We interpret these results as suggesting a preferential sensitivity of the Hugin/AstA pathway to glucose availability rather than a broad responsiveness to all circulating sugars. One plausible explanation is that fructose, trehalose, and sorbitol require peripheral metabolic conversion before contributing to intracellular glucose levels in neurons, whereas sucrose feeding rapidly restores hemolymph glucose within the 15-minute time window used in our experiments (Figure supplementary 5).

      Importantly, we now clarify in the revised Results and Discussion that our data support a functional preference for glucose under physiological conditions, rather than excluding the possibility that other sugars may influence this circuit indirectly or on longer timescales.

      (2) The authors state that SEZ, but not VNC, Hugin+ neurons regulate AstA activity (lines 318-319). However, comparison of Figure Supplement 8B with the severing sample in Figure Supplement 11B shows a more pronounced reduction of sweet sensation under hug>TrpA1 activation. Although the absolute response in Figure 3F (in vivo) is higher than that in the cut-off preparation (Figure S11), comparison of Figure S11C with Figure 3F indicates that hug+ neurons drive an AstA+ calcium transient more than fourfold greater in the presence of VNC neurons. Thus, the contribution of Hugin+ VNC neurons cannot be dismissed, and the conclusion should be revised accordingly.

      We thank the reviewer for this careful and quantitative comparison. We agree that our original wording overstated the exclusivity of SEZ Hugin<sup>+</sup> neurons in regulating AstA activity.

      Upon closer examination of the data, we now acknowledge that VNC Hugin<sup>+</sup> neurons likely contribute to AstA activation. As the reviewer points out, the AstA<sup>+</sup> calcium response evoked by Hugin activation is substantially larger when VNC neurons are intact (Figure 3F) than in the cut preparation (Figure supplementary 11C), indicating that descending inputs from the VNC can potentiate AstA neuronal activity.

      Accordingly, we have revised the manuscript to state that SEZ Hugin<sup>+</sup> neurons play a predominant role in driving AstA responses relevant to sweet sensation, while VNC Hugin<sup>+</sup> neurons provide additional modulatory input that enhances the overall magnitude of Hugin signaling. These revisions have been made in the Results to more accurately reflect the contributions of distinct Hugin subpopulations.

      (3) In Figure 4D, you show AstA-R1 co-localized with Gr5a-expressing cells. However, Gr5a-expressing cells also co-express Gr64f in labellum (Fuji et al., 2015, Current Biology). Are the authors sure that the sweet sensation they described is Gr5a-specific? Testing Gr64f is essential. Moreover, Fuji et al. demonstrated that Gr5a loss-of-function mutation impairs not only sucrose but also maltose, fructose, and trehalose sensation. This raises a question of whether the Hug+ and AstA+ neurons identified in the current study contribute to sensing sugars beyond sucrose. Additional experiments are required to clarify this point.

      Please see our responses to the Reviewing Editor Comments (4).

      (4) While nutritive sugar sensors such as Dh44 neurons have been directly implicated in sugar preference (Dus et al., 2015, Neuron), this study examines the hug+, AstA+, Gr5a neuronal circuit only in the context of PER responses. Why is sugar preference not assessed here, especially given that in mice, the comparison was made using preference tests?

      We thank the reviewer for this insightful question. We agree that sugar preference assays provide important information about feeding decisions and reward-based behavior. In the present study, however, we deliberately focused on the proboscis extension reflex (PER) because it offers a direct, quantitative, and temporally precise readout of sweet sensory sensitivity at the sensory–motor level.

      PER allows us to isolate changes in taste perception itself, largely independent of post-ingestive reinforcement, learning, or motivational state, all of which strongly influence preference-based assays. This distinction is particularly important given our central goal of identifying a circuit that directly links internal energy sensing to modulation of peripheral sweet-sensing neurons.

      By contrast, sugar preference reflects an integrated behavioral outcome combining sensory input, internal state, and post-ingestive reward signals, including those mediated by DH44 neurons and other nutritive sensing pathways. We therefore chose PER as the most mechanistically specific assay to dissect the Hugin–AstA–Gr5a pathway. We now explicitly acknowledge in the revised Discussion that determining how this satiety-linked sensory modulation interacts with reward and post-ingestive circuits to shape long-term sugar preference will be an important direction for future studies.

      Several other concerns:

      (5) The intraperitoneal injection of NMU is interpreted as reflecting a brain-specific NMU effect, but such systemic delivery cannot exclude peripheral actions. In Figure 5D, the use of whole-body KO mice is insufficient; targeted manipulations (e.g., NMU-Cre-driven inactivation) are required to establish circuit-specific behavioral roles.

      Please see our responses to the Reviewing Editor Comments (Low priority).

      (6) In Figure 5F and 5M, neural activity is measured under different conditions: gastric glucose infusion in 5F versus glucose licking in 5M. To establish that NMU VMH neurons and Calb2 rNST neurons belong to the same circuit, this discrepancy in stimulation timing must be resolved to support the conclusions.

      We thank the reviewer for pointing out this important issue regarding stimulation paradigms in Figures 5F and 5M. We agree that the difference between gastric glucose infusion and glucose licking requires explicit clarification.

      In the revised manuscript, we now clearly state that these two paradigms were intentionally designed to probe complementary levels of the same NMU–Calb2 circuit. In Figure 5F, gastric glucose infusion was used to isolate the internal energy-sensing property of VMH NMU<sup>+</sup> neurons, independent of oral sensory input, motor behavior, or reward expectation. This experiment establishes that NMU<sup>+</sup> neurons are directly activated by elevated circulating glucose.

      By contrast, Figure 5M examined how activation of this NMU pathway modulates downstream Calb2<sup>+</sup> rNST neurons under physiologically relevant feeding conditions, in which sweet taste signals are naturally evoked by licking. This design allows us to test the functional consequence of NMU signaling on sweet-responsive rNST neurons during normal sensory processing.

      Although the route and timing of glucose delivery differ, both paradigms converge on a unified circuit model: internal glucose elevation activates VMH NMU<sup>+</sup> neurons, and NMU signaling suppresses sweet-driven activity in Calb2<sup>+</sup> rNST neurons. We have revised the Results and figure legends to explicitly describe this layered experimental logic and to clarify that Figures 5F and 5M together establish distinct but connected nodes of the same circuit.

      (7) Figure 5I-J. The glucose concentration used appears excessively high. In mammals, blood glucose in the sated state is ~7-8 mM. It is unclear whether the observed responses represent physiological effects or artifacts of supraphysiological stimulation. Additional experiments with lower glucose concentrations would strengthen the study.

      We thank the reviewer for raising this important concern regarding the glucose concentration used in Figure 5I–J. We agree that the concentration applied in ex vivo slice experiments exceeds the typical physiological range of circulating glucose.

      This higher concentration was intentionally chosen to ensure reliable neuronal activation in acute brain slices, where glucose diffusion, uptake, and metabolic access are substantially slower than in vivo. Similar approaches have been widely used in studies of glucose-sensitive hypothalamic neurons to overcome these technical limitations (e.g., Kim et al., 2025, Neuron).

      Importantly, the physiological relevance of our findings is supported by in vivo fiber photometry experiments, which demonstrate that VMH NMU<sup>+</sup> neurons are robustly activated following normal sugar ingestion under physiological conditions. Thus, while supraphysiological glucose was used to establish glucose responsiveness ex vivo, our in vivo data confirm that NMU<sup>+</sup> neurons respond to glucose elevations within the normal physiological range.

      (8) Figure 5K. The VMH images are inconsistently oriented compared with Figure 5E, lacking a 3v landmark. The NMU detection method (IHC or FISH) is not specified in the legend. The GFP-Calb2 signal is heavily saturated, making it difficult to distinguish true signals from artifacts. These issues undermine interpretability.

      We thank the reviewer for pointing out these issues. In the revised manuscript, VMH images in Figure 5K have been reoriented to match Figure 5E, and the third ventricle (3v) is now indicated as an anatomical landmark. The figure legend has been revised to clarify that NMU<sup>+</sup> neurons are identified by GFP expression from a Cre-dependent AAV2/1-DIO-GFP injected into NMU-Cre mice, rather than by NMU immunohistochemistry or FISH. In addition, GFP–Calb2 images have been reprocessed to clearly distinguish true signals from background and imaging artifacts.

      (9) Figure 5L-M. Details of the NMU injection method are absent (route, dose, delivery parameters). The number of animals (n) is also not reported. Furthermore, AUC reduction alone is not sufficient evidence of robust inhibition. To convincingly demonstrate causality, NMU-IRES-Cre mice should be combined with DREADD or optogenetic approaches to directly inhibit NMU neurons and test whether rNST Calb2 activity is reduced.

      We thank the reviewer for these helpful comments. We have revised the manuscript to include all missing methodological details. These details are now clearly described in the Methods section and figure legend.

      We fully acknowledge that cell-type–specific manipulations, such as DREADD or optogenetic inhibition of NMU neurons, would provide more definitive causal evidence. However, our main goal in the mouse experiments was to demonstrate that NMU<sup>+</sup> neurons can directly sense glucose and modulate sweet sensitivity, thereby supporting the evolutionary conservation of the Hugin mechanism identified in Drosophila. Detailed dissection of the downstream circuit architecture and behavioral consequences in mammals is indeed an important direction for future research, but it lies beyond the current study’s primary focus on cross-species conservation.

      (10) In Drosophila, hugin neurons respond selectively to nutritive glucose (Fig. 2H), but whether NMU neurons share this property is unknown. Notably, Calb2 neurons in the rNST respond to the artificial sweetener AceK (Hao Jin et al., 2021, Cell), leaving open whether the NMU-rNST circuit is calorie-dependent or calorie-independent.

      We have added a statement in the Discussion acknowledging this limitation and emphasizing that future work will be needed to test whether the NMU–Calb2 circuit is selectively engaged by metabolically active sugars or also by sweet taste signals independent of caloric value.

      Minor comments

      (11) All bar graphs should include individual data points.

      We have added individual data points to all bar graphs.

      (12) In Figures 3E, 4C, and 4D, it appears that a combination of GAL4 and LexA was used, but the information about the fly lines is missing.

      We have now included the complete list of fly lines used for these experiments, including their genotypes and sources.

      (13) The source for PK2-R1 KO, AstA-R1 KO fly lines and NMU-IRES-Cre, Calb2-IRES-Cre mice is missing.

      We have added the complete source information for all genetic lines mentioned.

      (14) Figure 5B-D, This is a sucrose preference test, so why is the y-axis labeled as glucose? Is this an error, or were the values converted to glucose equivalents?

      We thank the reviewer for catching this mistake. The assay shown in Figure 5B–D measured sucrose preference, not glucose preference. The inconsistency resulted from a typographical error in the Methods description. In the revised manuscript, we have corrected this error to clearly state that sucrose was used in the preference test.

      (15) Supplementary Figure 15. The NMU images are of poor quality and should be improved.

      The punctate appearance of NMU signals in Supplementary Figure 15 is not due to poor image quality but rather reflects the physiological distribution of the NMU neuropeptide. As NMU is stored in secretory vesicles within neuronal terminals and somata, its immunostaining typically appears as discrete puncta rather than diffuse cytoplasmic labeling.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.<br /> Readers would also benefit from noting that the mice were male and discussion of the exclusion of females.

      In the revised manuscript, we have included full statistical reporting for all key experiments in the source data. Regarding animal sex, we confirm that all mouse experiments were conducted using male mice. This choice was made to minimize variability caused by hormonal cycles in females, which can influence feeding behavior and glucose metabolism. We have now explicitly stated this information in the Methods section and included a brief discussion noting that sex-specific differences in NMU–Calb2 circuitry and feeding regulation represent an important question for future investigation.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-03341

      Corresponding author(s): Thomas, Leonard

      1. General Statements [optional]

      The reviews are positive, constructive, and balanced. The reviewers highlighted the novelty, scope, technical rigor, and strength of evidence of the study. The reviewers also noted the technological advance in modeling of multi-domain proteins that we report. In summary, there are two major advances reported in this study, both of which have important implications, both within the field of lipid signaling and in the broader field of in silico structural modeling.

      Lipid signaling. We have elucidated the mechanism by which a protein kinase is allosterically activated by a specific lipid second messenger (PIP3) at atomic resolution. To the best of our knowledge, this has not been achieved for any kinase to date. Our findings have implications for (a) the spatial and temporal confinement of Tec signaling in cells by PIP3, (b) the rationalization of disease-causing mutations in XLA, and (c) the development of novel therapeutics that could be of clinical value in the treatment of B-cell malignancies. As such, we believe that this study will be of interest to a wide spectrum of basic scientists in the cell signaling community, as well as translational and clinical scientists.

      In silico structural modeling. Whilst developed primarily to answer the biological question of PIP3-mediated activation of the Tec kinases (see above), the improvement in AlphaFold modeling that we report has significant implications for all scientists concerned with structural modeling in silico, specifically with respect to the modeling of both multi-domain proteins and protein complexes. Given the widespread adoption of AlphaFold as a hypothesis generator, the audience for which these developments are relevant is actually very large, transcending all fields of the biological sciences.

      2. Description of the planned revisions

      • The major suggestion made by reviewers #2 and #3 was the inclusion of a negative control in the lipid nanodisc assays (Figure 5) to confirm that it is PIP3 that specifically activates MbTEC. This is a constructive and valuable addition to our study, particularly in light of the fact that PI(4,5)P2 is present in cells at 2-4 orders of magnitude greater concentration than PIP3. This experiment will be combined with reviewer #2's suggestion to perform a PIP3 titration in the lipid nanodiscs.

      Reviewer #2

      Although the nanodisc experiments clearly show PIP3-dependent activation, titrating the PIP3 content in nanodiscs (e.g., 0.1%, 0.5%, 1%, 3%, 5% of PIP3) to determine whether MbTEC activation shows a graded response to lipid abundance would strengthen the conclusions. This would support the suggested allosteric mechanism and aid in differentiating between digital and analogue activation behaviour.

      • We thank the reviewer for the nice suggestion, which we will combine with the negative control suggested by the reviewer in the next comment.
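
      One common way to separate graded from switch-like behaviour in such a titration is a Hill fit. The following is only a sketch, under the assumption that a steady-state autophosphorylation signal is measured at each PIP3 mole fraction; the symbols $x$, $v_0$, $v_{\max}$, $K$ and $n$ are introduced here for illustration and are not taken from the manuscript.

```latex
v(x) = v_0 + \left(v_{\max} - v_0\right)\,\frac{x^{n}}{K^{n} + x^{n}}
```

      Here $x$ is the PIP3 mole fraction in the nanodisc, $v_0$ the basal activity without PIP3, $v_{\max}$ the saturating activity, $K$ the mole fraction giving half-maximal activation, and $n$ the Hill coefficient. A fitted $n$ close to 1 would indicate a graded (analogue) response, whereas $n \gg 1$ would indicate switch-like (digital) behaviour.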

      A good negative control for Figure 5C, would be a nanodisc containing another phosphoinositide. Given prior evidence that TEC-family PH domains display selectivity for PIP3, it would nevertheless be informative to test nanodiscs containing other phosphoinositides (e.g., PI(4,5)P2, PI(3,4)P2, and PI3P).

      • See response above.

      Reviewer #3

      Fig 5B/C: The nanodisc experiments lack some controls. In order to conclude that PIP3 is indeed critical for the observed enhanced autophosphorylation of MbTEC, nanodiscs with e.g. PI3P, PI4P or PI5P should be used that are not expected to bind the MbTEC PH domain with high affinity. Likewise, or alternatively, a mutant PH domain with largely reduced PIP3 binding affinity would support trust in this central result of the paper. (estimated time investment: 1-2 months).

      • We appreciate the reviewer's suggestion, which was also proposed by reviewer #2. These experiments are planned as the number one priority (see response above).

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1

      Major comment: We think the proposal is overall coherent and reasonable and found it interesting. It is not, however, conclusive. Modeling played a key role in supporting this proposal, but the modeling itself was dependent on choices of parameters made by the authors. The reported AlphaFold 3 model depended on a customized MSA strategy: the authors report divergent placement of the PH domain with respect to the kinase domain in their AlphaFold 3 runs. In light of this observation they used a manually curated TEC family MSA with taxonomic reweighting. This helped model convergence, but it introduced arbitrariness into the modeling step.

      • We believe that it is necessary to clarify what exactly our custom AF pipeline does so as to avoid confusion, but also to render our work more impactful for future studies that employ AlphaFold. The divergent placement of the PH domain in AF's standard configuration arises from the inclusion of sequences in the MSA that do not belong to the Tec gene family (Supplementary Figure 4C) but are structurally related at the individual domain level and therefore identified by the profile Hidden Markov Models used by AF to generate deep MSAs. These sequences are unrelated to Tec phylogenetically and therefore have evolved under different selection pressures. What our custom pipeline does is exclude these sequences from the MSA, such that the evolutionary covariance signature exploited by AF to guide inter-residue distance restraints comes only from bona fide Tec sequences. In a second step, we sample the sequences to ensure taxonomic balance (sequence databases are heavily biased in terms of taxonomic representation). This step increases sequence diversity and, with it, the strength of the co-variance signal. Therefore, rather than introducing "arbitrariness" in the modeling, we actually reduce it.
      • Since the advance that we report in modeling multi-domain proteins with AlphaFold is applicable to all multi-domain proteins and protein complexes, we believe that it is valuable to convey the significance of the input MSA in as clear a fashion as possible. To illustrate why AlphaFold fails in its standard configuration, we have therefore performed an in silico analysis of the MSA automatically generated by AF when it is prompted to predict the structure of MbTec. We now include this analysis as a new Supplementary Figure (Supplementary Figure 4C). As can be seen, of the 50,000 sequences in the AF3-generated MSA, only 1,898 contain the complete set of regulatory PH, SH3, SH2 and kinase domains that characterize the Tec kinases. The remaining 48,102 sequences, while containing one or more of the individual domains found in Tec, are phylogenetically unrelated. This means that the co-variance signature that AF relies upon for accurate prediction of inter-domain interactions is contained in
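The two steps described in this response (restricting the MSA to sequences with the full Tec domain architecture, then balancing taxonomic representation) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout, the domain labels, the function names, and the inverse-frequency weighting rule are all assumptions made for illustration, and the actual weighting formula is not given in this text.

```python
from collections import Counter

# Assumed domain labels for a bona fide Tec-family sequence (hypothetical annotation scheme).
REQUIRED_DOMAINS = {"PH", "SH3", "SH2", "kinase"}


def filter_full_architecture(records):
    """Keep only records annotated with every required domain."""
    return [r for r in records if REQUIRED_DOMAINS <= set(r["domains"])]


def taxonomic_weights(records):
    """Assumed rule: raw weight = 1 / (sequences in the same taxon), normalised to sum to 1."""
    counts = Counter(r["taxon"] for r in records)
    raw = [1.0 / counts[r["taxon"]] for r in records]
    total = sum(raw)
    return [w / total for w in raw]


# Toy records; identifiers and taxa are invented.
records = [
    {"id": "seqA", "taxon": "Mammalia", "domains": ["PH", "SH3", "SH2", "kinase"]},
    {"id": "seqB", "taxon": "Mammalia", "domains": ["PH", "SH3", "SH2", "kinase"]},
    {"id": "seqC", "taxon": "Choanoflagellata", "domains": ["PH", "SH3", "SH2", "kinase"]},
    {"id": "seqD", "taxon": "Mammalia", "domains": ["SH3", "SH2", "kinase"]},  # lacks a PH domain
]

kept = filter_full_architecture(records)
weights = taxonomic_weights(kept)
for record, weight in zip(kept, weights):
    print(record["id"], record["taxon"], round(weight, 3))
```

In this toy input the sequence without a PH domain is dropped, and the single choanoflagellate sequence carries as much total weight as the two mammalian sequences combined. For scale, the 1,898 full-architecture sequences reported above are about 3.8% of the 50,000 sequences in the automatically generated MSA.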

        Minor comment: In two places the authors wrote "PIP3 is necessary and sufficient for both MbTEC activation and inactivation." This seems logically impossible. Revision is required.

      • We appreciate the reviewer's confusion here. This conclusion stemmed from the observation that PIP3 engagement is sufficient to promote full activation of MbTec on lipid nanodiscs in vitro (the synergistic effect of the hydrophobic stack mutation is lost in this context due to the presence of the polyproline motif in the PH-SH3 linker). However, in vivo, the SH2 domain is essential for BTK activation (by mediating its recruitment to activated receptors) and therefore it is incorrect to state that PIP3 is necessary and sufficient. It is necessary, but not sufficient - this is, again, analogous to an AND gate in an electronic circuit. We have revised the manuscript accordingly.

      Significance

      It attempted to clarify the role of the PH domain in TEC activation from a mechanistic perspective. If confirmed, it can potentially lead to novel approaches of drug discovery targeting TEC kinases.

      • Whilst we shied away from a discussion of therapeutic potential in our discussion to avoid unnecessary hype, the reviewer raises an important point, especially in light of the recent clinical success of BTK inhibitors in treating B-cell malignancies. As such, we have used the request made by Reviewer #2 to compare MbTec with Akt to highlight the potential for a new therapeutic modality in Tec kinase inhibition. The recent FDA approval of capivasertib (November 2023), an allosteric inhibitor of Akt, for the treatment of hormone-receptor (HR) positive, HER2-negative advanced or metastatic breast cancer provides a nice proof-of-concept. This discussion can be found in the response to Reviewer #2. Reviewer #3 also alluded to the "blockbuster drugs" used to treat B-cell malignancies, so we felt it appropriate to at least comment on the potential implications of our findings for the development of novel therapeutics.

      Reviewer #2

      The inference for Figure 3 that the PH domain exerts a strong autoinhibitory influence on kinase activity that cannot be overcome by disruption of the SH3-kinase interaction would benefit from further clarification. It is not immediately clear from the data that PH-domain-mediated inhibition should be seen as dominant rather than synergistic with SH3-kinase linker interactions. Although the autophosphorylation stoichiometry was measured for MbTEC32K L396A and MbTECFL L396A, a more thorough quantitative evaluation of the relative contributions of PH-domain removal versus SH3-linker disruption would be possible if this analysis were extended to MbTEC32K. The authors are urged to explain thoroughly the reasoning behind their conclusion, discussing whether these inhibitory components work together cooperatively to limit kinase activity or whether one is dominant over the other.

      • The reviewer raises an interesting question regarding the relative contributions of the various regulatory domains to autoinhibition. Ultimately, what our data show, both for MbTec autophosphorylation and substrate phosphorylation, is that disruption of the SH3-kinase interface results in kinase activation. The amplitude of the activation, however, is dependent on whether the PH domain is present or not. In the presence of the PH domain, the activation is very modest, whereas when it is removed, the amplitude is an order of magnitude greater. This reflects the fact that SH3 domain displacement without PH domain displacement does not permit acquisition of a conformation compatible with activation loop autophosphorylation. This implies that PIP3-dependent allosteric activation is a prerequisite for complete activation of Tec. PH domain deletion is also not permissive for complete activation, which requires SH3 domain displacement on top to drive autophosphorylation, an observation consistent with previous experimental data on Src. As the reviewer indicates, these are synergistic with one another - Tec is a coincidence detector of multiple signals, all of which are required for full activation. Our conclusion that the inhibitory influence of the PH domain cannot be overcome by displacement of the SH3 and SH2 domain, however, is important, since it strongly implies that PIP3 is necessary for Tec activation (i.e. that Tec is an AND gate and not an OR gate). We have revised our description of these results to better reflect the relative contributions of the various regulatory domains:

      "These observations indicate that the PH and SH3 domains exert synergistic inhibitory effects on the kinase domain and that disengagement of both domains by ligand binding is required for complete activation of MbTec. This is the equivalent of an AND gate in an electronic circuit, as opposed to an OR gate."
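The AND-gate analogy in the revised text can be made concrete with a small truth table. The Python sketch below is a deliberately simplified Boolean abstraction: the function and argument names are hypothetical, and the measured activity is graded rather than binary (disrupting the SH3-kinase interface alone still gives a modest activation).

```python
from itertools import product


def and_gate(ph_released: bool, sh3_released: bool) -> bool:
    """Full activation only when both inhibitory domains are disengaged."""
    return ph_released and sh3_released


def or_gate(ph_released: bool, sh3_released: bool) -> bool:
    """Alternative model in which disengaging either domain would suffice."""
    return ph_released or sh3_released


# Enumerate all four input combinations and compare the two models.
truth_table = [
    (ph, sh3, and_gate(ph, sh3), or_gate(ph, sh3))
    for ph, sh3 in product([False, True], repeat=2)
]

print("PH released | SH3 released | AND | OR")
for ph, sh3, and_out, or_out in truth_table:
    print(f"{ph!s:11} | {sh3!s:12} | {and_out!s:5} | {or_out}")
```

The two models differ only in the rows where exactly one domain is disengaged, which is why constructs that perturb one interface at a time are the informative ones for distinguishing them.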

      It would also be valuable if the authors could, in the discussion section, draw a contrast with the PIP3-dependent activation mechanism of AKT. This would help to highlight the uniqueness of PIP3-dependent TEC activation.

      • We thank the reviewer for highlighting the value of comparing MbTec to Akt, for which the activation mechanism has been intensively studied, both in our lab and in many others. There are, indeed, some interesting similarities, which we now comment on in the following paragraph, which has been incorporated into our discussion section: "It is worth noting that the regulation of MbTec by PIP3 is analogous, although not entirely homologous, to the regulation of the Ser/Thr kinases Akt and PDK1. Like Tec, Akt and PDK1 contain PIP3-sensing PH domains which mediate autoinhibition of their respective kinase domains (PMIDs: 28157504 and 35387990). Although the autoinhibitory interfaces of Tec and Akt are structurally different, both interfaces impair activation loop phosphorylation and substrate binding, as well as PIP3 binding (PMIDs: 28157504, 29632185, 3438531). The specific autoinhibitory conformation of Akt has been exploited in the development of allosteric inhibitors, which exhibit significantly improved on-target specificity and have recently been approved for the treatment of cancer (PMID: 38592948). As such, our findings open a new potential therapeutic modality for the development of selective Tec kinase inhibitors. Given the recent success of ATP-competitive BTK inhibitors in treating B-cell malignancies (PMIDs: 26639149, 36511784), there is enormous therapeutic potential."

      Minor Comments

      Y579 and R581 are introduced without much context. Can the authors elaborate on these residues a bit?

      • We have tried to better introduce the rationale behind mutation of these residues by rephrasing this part of the results. The changes from the previous version are underlined:

      "Consistent with the loss of an energetically favorable interface, deletion of the PH domain resulted in a 6 °C reduction in thermal stability (Figure 2F, Supplementary Figure 6C). We next tested the specificity of the predicted PH-kinase interaction by mutating Y579 and R581, which are conserved residues in the interface (Figure 2G). Mutation of Y579 and R581 to alanine reduced thermal stability by 3 °C, while their mutation to aspartate and glutamate respectively resulted in the same thermal stability as MbTEC32K lacking its PH domain (Figure 2F, Supplementary Figure 6D). These observations indicate that substitution of Y579 and R581 with alanine weakens the autoinhibitory conformation by reducing van der Waals contacts, but substitution with charged residues that introduce unfavorable interactions is sufficient to completely disrupt the interface. Consistently, MbTEC32K bound to the PH domain with an affinity of 4.0 mM, but binding of MbTEC32K Y579D R581E was barely detected (Figure 2H)."

      Figure 2H - In the legend make wt as WT so that it matches the figure panel

      • Fixed.
      • Supplementary Figure 1J - Adjust the orientation of intensity on y axis

      • Fixed (now Supplementary Figure 2J).

      • Supplementary Figure 1H - In the figure it should be Y579 and R581

      • Fixed (now Supplementary Figure 2H).

      • Can the authors add that 5C is the representative autoradiographs for each construct from panel 5B. Make it clear.

      • Fixed.

      • Write the units for intensity on the y axis for the entire supplementary figure 1
      • Supplementary Figure 2J and 2K - Make the 6 subscript in the legend for Gly 6.

      • Fixed (now Supplementary Figure 3J-K).

      • Can the authors include RRID wherever applicable in the methods section.

      • We have added in the RRID reference for the cell line employed in this study.

      • Include a space between i and was in the sentence "Each sequence iwas assigned a raw weight".

      • Fixed.

      • I think MSA is coming twice in the line above structure inference in the methods section. MSAs is repeating after balanced MSA. Kindly look into it.

      • Fixed.

      The work has been done using the TEC kinase from the choanoflagellate M. brevicollis, presumably for practical reasons of expression and purification. PIP3 signalling, to my knowledge, has not formally been demonstrated in choanoflagellates. This remains a concern with respect to the relevance of these findings to true metazoans, which is the setting in which Class I PI3-kinase-generated PIP3 signalling is seen.

      • We appreciate the reviewer's concerns regarding the relevance of our findings to PIP3 signaling in metazoans. Whilst the production and sensing of PIP3 has not formally been demonstrated in a choanoflagellate, we believe that sufficient circumstantial evidence exists that should allay these concerns. Specifically:

      • Evolutionary evidence exists for the presence of the PI3K machinery in the last eukaryotic common ancestor (LECA) (PMID: 26482564), approximately 1.2-1.8 billion years ago. Choanoflagellates are, by comparison, quite young (600-650 My).
      • Choanoflagellates have an extensive tyrosine kinase signaling network, including RTKs (PMID: 18621719)
      • PI3K/PIP3/PTEN signaling has been robustly demonstrated in organisms that predate choanoflagellates by hundreds of millions of years, including Amoebozoa e.g. D. discoideum and E. histolytica (PMIDs: 9778249, 11352940, 12062103, 12062104, 12802064).
      • Monosiga brevicollis encodes:
        • class I PI3K p110 and p85 homologs (Manning et al, PNAS 2008)
        • a PTEN homolog
        • note that class I PI3Kδ is responsible for the plasma membrane PIP3 signal in metazoan immune cells, meaning that a homolog of this enzyme is present in choanoflagellates
      • Choanoflagellates encode homologs of metazoan proteins that are known to respond specifically to PIP3, including:
        • MbTec
        • PDK1 (NCBI Reference Sequence: XP_004995400.1)
        • Akt (NCBI Reference Sequence: XP_001743446.1)
      • A recent kinase inhibitor screen in the choanoflagellate S. rosetta revealed the activity of known PI3K inhibitors (regulation of growth, phosphotyrosine signaling etc) (PMID: 40226336)
      • Conclusion: choanoflagellates inherited an ancient lipid-signaling toolkit.
      • Nevertheless, we believe that the reviewer raises a point that is important to clarify for the uninitiated reader. We therefore propose adding the following paragraph to our discussion section, which deals explicitly with these concerns:

      "Although PIP3 signaling has not been explicitly demonstrated in a choanoflagellate, the machineries for its production predate choanoflagellates by at least 500 My (PMID: 26482564). PI3K-mediated production, PH domain-mediated sensing, and PTEN-mediated degradation of PIP3 have all been robustly demonstrated to control chemotaxis in the slime mold Dictyostelium discoideum (PMIDs: 9778249, 11352940, 11389841, 12062103, 12062104, 12802064). While the Tec kinases emerged more recently (PMID: 30183386), PI3K, PTEN, PDK1, and Akt are all found in choanoflagellates, suggesting that choanoflagellates inherited an ancient lipid signaling toolkit and that the Tec kinases were a novel evolutionary addition to the toolbox."

      Reviewer #3

      Points to be addressed:

      Fig 1B: For the sequence alignment, a few more residues before/after the four critical selected residues should be shown. This allows the reader to evaluate how conserved these residues really are. (estimated time investment: ~1 day max.)

      • Figure 1B is not actually a conventional sequence alignment, since it shows four residues that are structurally related, but not found in a contiguous sequence. However, we have added a new Supplementary Figure panel (Supplementary Figure 1A) to show the sequence motifs for each residue.

      Fig. 2 I/J/K: It is more customary to show HDX-MS results mapped on a structural cartoon representation (and not surface representation). The current representation makes it impossible to see which functional areas of the different domains show increased/decreased HDX. In addition, mapping HDX changes on a linear sequence/sec structure plot (as also commonly used to represent HDX-MS data) should be shown in SI. (estimated time investment: <1 week)

      Reviewer #1

      This is important because the whole thesis of this manuscript rests on the model's suggestion that the kinase domain sequesters the PIP3 binding site of the PH domain. The authors found that in cells full-length MbTEC transiently associated with the membrane but the isolated PH domain showed more prolonged membrane association. The authors interpreted this difference in membrane association in terms of sequestration of the PIP3-binding PH domain by the kinase domain, but the PH-kinase interaction is based on a model and it needs further validation.

      • Model validation, particularly in the era of AlphaFold, is critical, as the reviewer correctly notes. However, we dispute the reviewer's assertion that the PH-kinase interface derived from our model needs further validation. The following is a summary of all the orthogonal ways in which we validated the model. In terms of publishing standards, we believe we have exceeded what is widely accepted as robust evidence for a specific interface.
      • The predicted aligned error (PAE) plot (Figure 1H) exhibits prediction errors in the PH-kinase interface which are (a) extremely low and (b) comparable with those in the SH3-kinase, SH2-kinase, and SH3-SH2 interfaces, all of which are superimposable with experimental structures.
      • Comparison of the model with experimental small-angle X-ray scattering (SAXS) in solution revealed a near-perfect fit (Figure 2A). This demonstrates that the global conformation of the model is an accurate reflection of the conformation of MbTEC in solution.
      • Mutation of the interface on the kinase side leads to a loss of thermal stability equivalent to deletion of the PH domain (Figure 2F-G) and a failure to bind the PH domain in trans (Figure 2H).
      • Changes in HDX-MS of the interface-mutated protein (Figure 2I-L) are comparable to those in the PH domain-deleted construct (Supplementary Figure 6E-J).
      • Reciprocal mutation of the interface on the PH domain leads to a reduction in binding affinity for the SH3-SH2-kinase (32K) protein (Figure 4C).

      While autophosphorylation is dramatically enhanced by PIP3-containing nanodiscs, the interpretation can be complicated: as the manuscript itself acknowledges, membrane-based experiments cannot readily deconvolute local concentration effects from allosteric effects, because concentrating proteins on a membrane can promote dimerization-dependent autophosphorylation.

      • It is precisely for these reasons that we conducted the experiments detailed in Figure 3, since they do not convolute allosteric activation with local concentration on a membrane. These experiments underpin our conclusions that MbTec is specifically activated by dissociation of its PH domain from the kinase domain and not just by local concentration on a PIP3-containing membrane. Whilst the experiments in Figure 3 do not say anything about the specificity of the PH-kinase interface (which we addressed with other experiments), they unambiguously confirm the inhibitory effect of the PH domain that other studies have reported previously.

      Reviewer #2

      To elaborate on the point of sufficiency, can the authors utilise the FRB-FKBP system to synthesize PIP3 ectopically and see whether it leads to the recruitment of FL and PH, in addition to PDGF stimulation? It would also be valuable if the authors could use PI3K inhibitors after PDGF stimulation to validate this point further. Colocalization with a PIP3 biosensor after PDGF stimulation would also be a great control.

      • The reviewer's suggestion to use the FKBP-FRB system to synthesize PIP3 ectopically is elegant but, in our opinion, not necessary. The specific recruitment of Tec kinases to the plasma membrane in response to growth factor-stimulated production of PIP3 is well established (e.g. Varnai et al, JBC 1999). As such, a PIP3 biosensor is not necessary, since the Tec kinases are well established PIP3 sensors in cells.
    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The 5 human TEC kinases are cytoplasmic tyrosine kinases containing a prototypic (Src-like) SH3-SH2-kinase domain module with critical roles in various signaling pathways, in particular T- and B-cell signaling. One Tec kinase family member, Btk, is the target of major blockbuster drugs that revolutionized outcomes for patients with a variety of B-cell malignancies. While major insights into the structure and regulation of Tec kinases by phosphorylation, as well as intra- and inter-molecular protein-protein interactions, have been obtained over the past 3 decades, the role and precise mechanisms of TEC kinase regulation by binding to phosphoinositides at membranes, via their PH-TH unit, are much less clear.

      In this manuscript, the authors study the structure and regulation of an ancestral TEC kinase from the choanoflagellate M. brevicollis (MbTEC), which has a largely reduced set of tyrosine kinases (as compared to mammals), and therefore might offer a focused look at conserved essential kinase regulation that diversified and acquired cell-type specific fine-tuning during evolution. The manuscript first provides a nice workflow to obtain an accurate model of MbTEC using AlphaFold 3 modeling. SAXS supports the predicted compact conformation of MbTEC in solution. Removal of the PH domain resulted in lower thermal stability, indicating an energetically favorable intramolecular interface, which was subsequently supported by HDX-MS measurements using full-length and 3-domain core (SH3-SH2-kinase domain: 32K) constructs and possible PH-KD interface mutations of MbTEC. Kinase activity (autophosphorylation and substrate phosphorylation) assays support an autoinhibitory effect of the PH domain on MbTEC activation. A gain-of-function mutant, initially identified in the human Btk PH domain, and in vitro experiments with nanodiscs containing PIP3 show strong activation of MbTEC autophosphorylation. Overall, the manuscript supports a model of PIP3-stimulated relief from PH domain-mediated autoinhibition of MbTEC resulting in full activation, also involving disruption of the SH2-kinase linker interaction with the SH3 domain and displacement of the SH2 domain. The authors have used several different structural biology and biochemical assays, all of which allow for relatively precise (semi-)quantitative answers to the underlying research questions. Hence, the claims and conclusions are very well supported and leave (very) little to be desired (see points below). This is a nice and clean structural biochemistry paper with generally well controlled experiments and appropriate choice of research methods. The manuscript text is well written, previous work is appropriately mentioned/discussed, and results are carefully interpreted and gauged towards a final model.

      Points to be addressed:

      Fig 1B: For the sequence alignment, a few more residues before/after the four critical selected residues should be shown. This allows the reader to evaluate how conserved these residues really are. (estimated time investment: ~1 day max.)

      Fig. 2 I/J/K: It is more customary to show HDX-MS results mapped on a structural cartoon representation (and not surface representation). The current representation makes it impossible to see which functional areas of the different domains show increased/decreased HDX. In addition, mapping HDX changes on a linear sequence/sec structure plot (as also commonly used to represent HDX-MS data) should be shown in SI. (estimated time investment: <1 week)

      Fig 5B/C: The nanodisc experiments lack some controls. In order to conclude that PIP3 is indeed critical for the observed enhanced autophosphorylation of MbTEC, nanodiscs containing e.g. PI3P, PI4P or PI5P, which are not expected to bind the MbTEC PH domain with high affinity, should be used. Likewise, or alternatively, a mutant PH domain with largely reduced PIP3 binding affinity would support trust in this central result of the paper. (estimated time investment: 1-2 months).

      Significance

      MbTEC kinase structure and regulation have not previously been studied. Hence, novelty is very good. Given the overall conservation of the structure and regulatory mechanisms of TEC kinases (and related SRC and ABL kinases), as well as the large number of prior studies on these kinases (by Prof. Leonard and several others in the field), many aspects of this study are not overly surprising and more confirmatory than groundbreaking. On the other hand, it is a well-controlled and experimentally "clean" study. It uses a state-of-the-art combination of modern structural/biochemical methods and provides an important technological advance for reliably modeling multi-domain signaling proteins using AlphaFold 3. The detailed dissection of MbTEC regulation provides some novel aspects and offers a convincing model for TEC kinase activation with implications for the human/mouse protein. I am convinced that the manuscript will be of interest to the broader signaling community, as well as basic scientists in the fields of structural biology and membrane cell biology.

      I do have sufficient expertise to evaluate this manuscript, as I have used essentially all of the described methods in my lab before and have been working on proteins related to MbTEC for my entire scientific career.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Overall Summary

      The authors put forward a comprehensive structural and biochemical analysis of an ancestral TEC kinase from Monosiga brevicollis (MbTEC). They have used a wide array of state-of-the-art approaches, such as protein biochemistry, mutagenesis, thermal stability assays, SAXS, AlphaFold 3 modeling with curated MSAs, HDX-MS, kinase assays, lipid nanodiscs, mass spectrometry, and cell-based imaging, to propose a detailed mechanism for autoinhibition of MbTEC and PIP3-dependent allosteric activation.

      Overall Comment

      The study is scientifically challenging and conceptually ambitious. The authors are to be commended for the impressive technical rigor, the experimental thoroughness, and the care with which the manuscript is written and presented.

      Major Comments

      1. The inference for Figure 3 that the PH domain exerts a strong autoinhibitory influence on kinase activity that cannot be overcome by disruption of the SH3-kinase interaction would benefit from further clarification. It is not immediately clear from the data that PH-domain-mediated inhibition should be seen as dominant rather than synergistic with SH3-kinase linker interactions. Although the autophosphorylation stoichiometry was measured for MbTEC32K L396A and MbTECFL L396A, a more thorough quantitative evaluation of the relative contributions of PH-domain removal versus SH3-linker disruption would be possible if this analysis were extended to MbTEC32K. The authors are urged to explain thoroughly the reasoning behind their conclusion, discussing whether these inhibitory components work together cooperatively to limit kinase activity or whether one is dominant over the other.
      2. To elaborate on the point of sufficiency, can the authors utilise the FRB-FKBP system to synthesize PIP3 ectopically and see whether it leads to the recruitment of FL and PH, in addition to PDGF stimulation? It would also be valuable if the authors could use PI3K inhibitors after PDGF stimulation to validate this point further. Colocalization with a PIP3 biosensor after PDGF stimulation would also be a great control.
      3. Although the nanodisc experiments clearly show PIP3-dependent activation, titrating the PIP3 content in nanodiscs (e.g., 0.1%, 0.5%, 1%, 3%, 5% of PIP3) to determine whether MbTEC activation shows a graded response to lipid abundance would strengthen the conclusions. This would support the suggested allosteric mechanism and aid in differentiating between digital and analogue activation behaviour.
      4. A good negative control for Figure 5C would be a nanodisc containing another phosphoinositide. Given prior evidence that TEC-family PH domains display selectivity for PIP3, it would nevertheless be informative to test nanodiscs containing other phosphoinositides (e.g., PI(4,5)P2, PI(3,4)P2, and PI3P).
      5. It would also be valuable if the authors could, in the discussion section, draw a contrast with the PIP3-dependent activation mechanism of AKT. This would help to highlight the uniqueness of PIP3-dependent TEC activation.

      Minor Comments

      • Y579 and R581 are introduced without much context. Can the authors elaborate on these residues a bit?
      • Figure 2H - In the legend make wt as WT so that it matches the figure panel
      • Supplementary Figure 1J - Adjust the orientation of intensity on y axis
      • Supplementary Figure 1H - In the figure it should be Y579 and R581
      • Can the authors add that 5C is the representative autoradiographs for each construct from panel 5B. Make it clear.
      • Write the units for intensity on the y axis for the entire supplementary figure 1
      • Supplementary Figure 2J and 2K - Make the 6 subscript in the legend for Gly 6.
      • Can the authors include RRID wherever applicable in the methods section.
      • Include a space between i and was in the sentence "Each sequence iwas assigned a raw weight".
      • I think MSA is coming twice in the line above structure inference in the methods section. MSAs is repeating after balanced MSA. Kindly look into it.

      Significance

      General assessment

      This is a study of the TEC family of kinases, which have an important role in immune cells. Alterations in their function are linked to both primary immunodeficiency and hematological malignancies. Understanding their mechanism of activation is therefore of fundamental importance for understanding protein kinase regulation as well as for developing potential therapies for blood cell disorders.

      The work has been done using the TEC kinase from the choanoflagellate M. brevicollis, presumably for practical reasons of expression and purification. PIP3 signalling, to my knowledge, has not formally been demonstrated in choanoflagellates. This remains a concern with respect to the relevance of these findings to true metazoans, which is the setting in which Class I PI3-kinase-generated PIP3 signalling is seen.

      Advance

      This study advances details of the molecular mechanism by which PIP3 interacts with and regulates TEC kinase function. This is a study in basic structural biology.

      Audience

      This study will be of interest to structural biologists and those with an interest in understanding phosphoinositide regulated protein function.

      My expertise

      Biochemistry and cell biology, phosphoinositide signalling

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary: This manuscript focuses on PIP3-dependent activation in TEC family kinases. It uses the choanoflagellate Monosiga brevicollis TEC kinase MbTEC as the model system. The authors propose that MbTEC adopts a compact solution conformation in its autoinhibited state, with the PH domain bound to the kinase domain. Based on an AlphaFold 3 model, the PH-kinase interaction is proposed to sequester the PIP3 binding site of the PH domain, hindering the membrane interaction of the PH domain and in turn preventing TEC activation. The study brings together data from many different experiments, including structure modeling, solution measurements, mutagenesis, activity assays, in vitro membrane reconstitution, and cellular localization assays.

      Major comment: We think the proposal is overall coherent and reasonable and found it interesting. It is not, however, conclusive. Modeling played a key role in supporting this proposal, but the modeling itself was dependent on choices of parameters made by the authors. The reported AlphaFold 3 model depended on a customized MSA strategy: the authors report divergent placement of the PH domain with respect to the kinase domain in their AlphaFold 3 runs. In light of this observation they used a manually curated TEC family MSA with taxonomic reweighting. This helped the model converge but it introduced arbitrariness in the modeling step. This is important because the whole thesis of this manuscript rests on the model's suggestion that the kinase domain sequesters the PIP3 binding site of the PH domain. The authors found that in cells full-length MbTEC transiently associated with the membrane but the isolated PH domain showed more prolonged membrane association. The authors interpreted this difference in membrane association in terms of sequestration of the PIP3-binding PH domain by the kinase domain, but the PH-kinase interaction is based on a model and it needs further validation. While autophosphorylation is dramatically enhanced by PIP3-containing nanodiscs, the interpretation can be complicated: as the manuscript itself acknowledges, membrane-based experiments cannot readily deconvolute local concentration effects from allosteric effects, because concentrating proteins on a membrane can promote dimerization-dependent autophosphorylation.

      Minor comment: In two places the authors wrote "PIP3 is necessary and sufficient for both MbTEC activation and inactivation." This seems logically impossible. Revision is required.

      Significance

      The study attempts to clarify the role of the PH domain in TEC activation from a mechanistic perspective. If confirmed, it could lead to novel drug discovery approaches targeting TEC kinases.

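    1. Listing exposure restriction

      When a seller member's account is restricted, their listing is hidden from the listing index, but it must remain visible in the offer list of any buyer who has already made an offer (tapping the detail view should, however, show "This listing is hidden").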
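
      The visibility rule in this comment can be sketched in code. This is only an illustration of the stated behavior: the `Listing` class, its fields, the function names, and the message constant are all hypothetical and do not come from the policy document.

```python
from dataclasses import dataclass, field

# Hypothetical notice text for the detail view of a hidden listing.
HIDDEN_MESSAGE = "This listing is hidden"


@dataclass
class Listing:
    listing_id: int
    seller_restricted: bool                       # seller account is under a usage restriction
    offer_buyer_ids: set = field(default_factory=set)  # buyers who already made an offer


def visible_in_listing_index(listing: Listing) -> bool:
    """Listings of restricted sellers are hidden from the general listing index."""
    return not listing.seller_restricted


def visible_in_offer_list(listing: Listing, buyer_id: int) -> bool:
    """A buyer who made an offer keeps seeing the listing, restricted or not."""
    return buyer_id in listing.offer_buyer_ids


def detail_view(listing: Listing) -> str:
    """The detail view shows a hidden notice when the seller is restricted."""
    if listing.seller_restricted:
        return HIDDEN_MESSAGE
    return f"Listing {listing.listing_id}"
```

      Keeping the three checks separate mirrors the comment: the index, the buyer's offer list, and the detail view each follow their own rule.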


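    2. A broker's change of office takes effect after the business registration certificate and the brokerage office registration certificate have been re-verified.

      A broker's change of office follows the same process as the initial broker registration (practicing brokers must likewise submit supporting documents).

      *Information that does not need to change and has no impact is excluded.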

    3. If a user signs up with the same identifier as an account that is currently suspended, the sign-up is rejected.

      Specify the condition 'during the suspension period'.

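    4. 7.5.1 Listing registration restriction

      We cover the cost of viewing the certified copy of the property register for the first viewing only; the customer bears the cost of any later viewings.

      Accordingly, please remove the listing registration restriction policy.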

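    5. Trigger condition: owner verification mismatch 3 or more times (cumulative, no reset)

      We cover the cost of viewing the certified copy of the property register for the first viewing only; the customer bears the cost of any later viewings.

      Accordingly, please remove the separate policy on owner verification mismatches.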

    1. But Deanna D’Amore, the health department director for the city, said she so far has not seen much public complaints about the air quality.

      Since the air pollution from the wildfires was so widespread when it occurred, why do you think people weren't complaining? Do you think it is because they believed nothing could be done, since the problem came from so far away in another country? Or because it did not really disturb their day-to-day life, since in the summer months we are often inside most of the time anyway because of the heat?

    2. Stratford’s ozone received an F rating by the American Lung Association. But the town’s air wasn’t too bad by the time the smoke started to make its way into the state.

      Why does Stratford have such a poor ozone rating? Why is the air pollution there so bad? I don't remember ever being there, so I was wondering whether there are a lot of factories, for example.

    3. We also sent out a Spanish version since Norwalk has a lot of Spanish speaking community members to share a message about how they could protect themselves,” Matthews said.

      I think it is great that they are being inclusive and keeping in mind that not everyone who lives here in the US speaks and understands English fluently. It's important that these alerts are in multiple languages so that more people can understand them and stay safe in the face of air pollution.

    4. Air quality alerts remain in effect across Connecticut due to air pollution from Canadian wildfire smoke

      I was wondering if you know who pushed out those air quality alerts when this occurred? I know they appeared specifically in the weather app. Also, can you speak to the importance of understanding how air pollution in one region can affect nearby regions, since air pollution can travel and affect other states?

    1. PFAS chemicals, often referred to as “forever chemicals” because they are so hard to break down, have been linked to cancer risk, reproductive complications and other health issues.

      I'm curious what other health issues PFAS have been linked to. The article lists a couple, but I was wondering if you looked into that further. Do you know of any other forever chemicals? I had never heard that term before this article.

    2. Saint-Gobain did not admit to wrong-doing but in 2022, the company agreed in court as part of a consent decree to provide clean drinking water to approximately 1,000 homes whose water was contaminated.

      This is them trying to take responsibility; however, it does not address the source of the problem. They have not halted the production of their PFAS, and thus they are still producing more pollution.

    3. blamed for contaminating southern New Hampshire’s air and water with dangerous levels of PFAS chemicals, has finished demolishing the Merrimack manufacturing facility at the center of the controversy, the company announced Thursday.

      What are PFAS chemicals? Why are they so dangerous, and what do they do to the human body or to animals? The article states that PFAS are also an air pollutant, so how would that work?

    1. eLife Assessment

      This study introduces a novel method for estimating spatial spectra from irregularly sampled intracranial EEG data, revealing cortical activity across all spatial frequencies, which supports the global and integrated nature of cortical dynamics. It showcases important technical innovations and rigorous analyses, including tests to rule out potential confounds. However, further direct evaluation of the model, for example by using simulated cortical activity with a known spatial spectrum (e.g., an iEEG volume-conductor model that describes the mapping from cortical current source density to iEEG signals, and that incorporates the reference electrodes and the particular montage used), would even further strengthen the solid evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The paper uses rigorous methods to determine phase dynamics from human cortical stereotactic EEGs. It finds that the power of the phase is higher at the lowest spatial phase. The application to data illustrates the solidity of the method and its potential for discovery.

      Comments on revisions:

      The authors have provided responses to the previous recommendations. The paper does not seem to contain further significant improvements. I am thus not inclined to change my judgement.

    3. Reviewer #3 (Public review):

      Summary:

      The authors propose a method for estimating the spatial power spectrum of cortical activity from irregularly sampled data and apply it to iEEG data from human patients during a delayed free recall task. The main findings are that the spatial spectra of cortical activity peak at low spatial frequencies and decrease with increasing spatial frequency. This is observed over a broad range of temporal frequencies (2-100 Hz).

      Strengths:

      A strength of the study is the type of data that is used. As pointed out by the authors, spatial spectra of cortical activity are difficult to estimate from non-invasive measurements (EEG and MEG) and from commonly used intracranial measurements (i.e. electrocorticography or Utah arrays) due to their limited spatial extent. In contrast, iEEG measurements are easier to interpret than EEG/MEG measurements and typically have larger spatial coverage than Utah arrays. However, iEEG is irregularly sampled within the three-dimensional brain volume and this poses a methodological problem that the proposed method aims to address.

      Weaknesses:

      Although the proposed method is evaluated in several indirect ways, a direct evaluation is lacking. This would entail simulating cortical current source density (CSD) with known spatial spectrum and using a realistic iEEG volume-conductor model to generate iEEG signals.

      Comments on revisions:

      I would like to clarify two points:

      (1) In their response, the authors frame the role of simulations primarily as a means of assessing the effects of volume conduction. However, the purpose of evaluating a proposed estimation method through simulations extends beyond this specific issue. More generally, simulations are essential for establishing that the proposed method (particularly given the multiple non-trivial transformations applied to the observed data) produces accurate and reliable estimates under controlled conditions.

      (2) The authors seem to interpret my use of the term current source density as referring to the current source density (CSD) method, which is an approach to mitigating volume conduction by inverting Poisson's equation. This was not my intention: current source density refers to the physical quantity (i.e., the spatial density of current sources) underlying macroscopic brain activity, and is independent of any specific estimation or inversion technique.
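
      The kind of controlled test described in point (1) can be illustrated with a toy example. The sketch below is not the authors' estimation method and not an iEEG volume-conductor model. It assumes a one-dimensional, noise-free signal built from a few cosines with known spatial frequencies and amplitudes, sampled at irregular positions, and recovers the amplitudes by ordinary least squares. All names and numerical values are illustrative.

```python
import numpy as np


def simulate_and_recover(freqs_c_per_m, amps, n_sensors=200, length_m=0.5, seed=0):
    """Return least-squares amplitude estimates at the known spatial frequencies."""
    rng = np.random.default_rng(seed)
    # Irregular sensor positions along a 1-D line (hypothetical geometry).
    x = np.sort(rng.uniform(0.0, length_m, n_sensors))
    phases = rng.uniform(0.0, 2 * np.pi, len(freqs_c_per_m))
    # Ground-truth signal: sum of cosines with known amplitudes and random phases.
    signal = sum(a * np.cos(2 * np.pi * f * x + p)
                 for f, a, p in zip(freqs_c_per_m, amps, phases))
    # Design matrix with one cosine and one sine column per frequency.
    cols = []
    for f in freqs_c_per_m:
        cols.append(np.cos(2 * np.pi * f * x))
        cols.append(np.sin(2 * np.pi * f * x))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, signal, rcond=None)
    # The amplitude at each frequency is the norm of its cosine/sine coefficient pair.
    return np.hypot(coef[0::2], coef[1::2])


true_freqs = [4.0, 12.0, 30.0]   # cycles per metre (illustrative values)
true_amps = [1.0, 0.5, 0.25]     # amplitudes falling with spatial frequency
estimated = simulate_and_recover(true_freqs, true_amps)
```

      Comparing `estimated` with `true_amps` is the ground-truth check. A realistic version would replace the cosine signal with simulated cortical sources passed through a forward model and would run the full estimation pipeline in place of the least-squares fit.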

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper uses rigorous methods to determine phase dynamics from human cortical stereotactic EEGs. It finds that the power of the phase is higher at the lowest spatial phase. The application to data illustrates the solidity of the method and its potential for discovery.

      Comments on revised submission:

      The authors have provided responses to the previous recommendations.

      We thank the reviewer for reviewing our manuscript again, and for their positive evaluation.

      Reviewer #3 (Public review):

      Summary:

      The authors propose a method for estimating the spatial power spectrum of cortical activity from irregularly sampled data and apply it to iEEG data from human patients during a delayed free recall task. The main findings are that the spatial spectra of cortical activity peak at low spatial frequencies and decrease with increasing spatial frequency. This is observed over a broad range of temporal frequencies (2-100 Hz).

      Strengths:

      A strength of the study is the type of data that is used. As pointed out by the authors, spatial spectra of cortical activity are difficult to estimate from non-invasive measurements (EEG and MEG) and from commonly used intracranial measurements (i.e. electrocorticography or Utah arrays) due to their limited spatial extent. In contrast, iEEG measurements are easier to interpret than EEG/MEG measurements and typically have larger spatial coverage than Utah arrays. However, iEEG is irregularly sampled within the three-dimensional brain volume and this poses a methodological problem that the proposed method aims to address.

      Weaknesses:

      Although the proposed method is evaluated in several indirect ways, a direct evaluation is lacking. This would entail simulating cortical current source density (CSD) with known spatial spectrum and using a realistic iEEG volume-conductor model to generate iEEG signals.

      Comments on revised version:

      In my original review, I raised the following issue:

      "The proposed method of estimating wavelength from irregularly sampled three-dimensional iEEG data involves several steps (phase-extraction, singular value-decomposition, triangle definition, dimension reduction, etc.) and it is not at all clear that the concatenation of all these steps actually yields accurate estimates. Did the authors use more realistic simulations of cortical activity (i.e. on the convoluted cortical sheet) to verify that the method indeed yields accurate estimates of phase spectra?"

      And the authors' response was:

      "We now included detailed surrogate testing, in which varying combinations of sEEG phase data and veridical surrogate wavelengths are added together. See our reply from the public reviewer comments. We assess that real neurophysiological data (here, sEEG plus surrogate and MEG manipulated in various ways) is a more accurate way to address these issues. In our experience, large scale TWs appear spontaneously in realistic cortical simulations, and we now cite the relevant papers in the manuscript (line 53)."

      The point that I wanted to make is not that traveling waves appear in computational models of cortical activity, as the authors seem to think. My point was that the only direct way to evaluate the proposed method for estimating spatial spectra is to use simulated cortical activity with known spatial spectrum. In particular, with "realistic simulations" I refer to the iEEG volume-conductor model that describes the mapping from cortical current source density (CSD) to iEEG signals, and that incorporates the reference electrodes and the particular montage used.

      Although in the revised manuscript the authors have provided indirect evidence for the soundness of the proposed estimation method, the lack of a direct evaluation using realistic simulations with ground truth, as described above, means that I remain sceptical about the soundness of the method.

      We thank the reviewer for reviewing our manuscript again.

      We have reviewed the literature again on volume conduction effects in LFP measures of cortical activity. In all publications we reviewed, the conclusion is that the range of the effect is <1 cm. We now mention the range of volume conduction in the Methods section dealing with the surrogate models (lines 1054-9) and have added emphasis in the Discussion (lines 594-9).

      The highest spatial frequency we consider in the present research is 50 c/m, which corresponds to a cortical distance of 2 cm. This is well outside the range of volume conduction effects in LFPs. Mathematically speaking, blurring (e.g. Gaussian) acts as a low-pass filter, attenuating higher spatial frequency components, but only for components within the spatial range of the Gaussian blurring, i.e. for LFPs, spatial frequencies higher than 100 c/m. There will therefore be negligible effects (mathematically speaking, zero effect) of volume conduction in the results reported by us. If the veracity of these studies on volume conduction with LFPs is accepted, then the reviewer’s requested simulation reduces to “estimating spatial spectra [using] simulated cortical activity with known spatial spectrum.” This is what we have done, in a direct and simple manner.
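
      The unit conversion used in this paragraph can be written out explicitly. The symbols $k$ (spatial frequency in cycles per metre), $\lambda$ (wavelength), and $\sigma$ (standard deviation of the blur) are notation introduced here for illustration. The second line gives the standard transfer function of a Gaussian kernel, on the assumption that the blur is Gaussian; the text does not state a value for $\sigma$.

```latex
\lambda = \frac{1}{k}, \qquad
k = 50\ \text{c/m} \;\Rightarrow\; \lambda = 2\ \text{cm}, \qquad
k = 100\ \text{c/m} \;\Rightarrow\; \lambda = 1\ \text{cm}

g(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^{2}/(2\sigma^{2})}
\quad\Longrightarrow\quad
H(k) = e^{-2\pi^{2}\sigma^{2}k^{2}}
```

      Under this assumed kernel, the attenuation at a given $k$ is set by the product $\sigma k$, so the stated <1 cm range translates into a specific attenuation only once $\sigma$ is fixed.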

      If the ubiquity and importance of spatio-temporal dynamics in cortex is accepted, then it is insufficient to describe “the mapping from cortical current source density (CSD) to iEEG signals”, since this presumes a model of cortical activity that does not capture the correlations in space and time that we assume are critical to cortical function. We are aware the CSD approach has a long and successful history of unravelling brain mechanisms. However, an emphasis on traveling waves (and spatio-temporal dynamics in general) is in part a challenge to this approach (and the idea of localized sources in general). CSD approaches carry similar assumptions (but at a smaller scale, <1cm) as those elaborated in Zhigalov and Jensen (2023) for extra-cranial measures. In both cases, removal of volume conduction effects emphasizes standing wave activity (localized static, oscillatory sources) over traveling wave activity. In this manner, these methods tend to confirm their starting assumptions (as does our own approach, of course). What is required is external empirical validation to break any circular confirmation of initial theoretical choice of basis. All this is a way of saying that CSD approaches are not the unproblematic, direct methods that the reviewer asserts.

      We did understand the reviewer’s request to model the effects of volume conduction. Our own view of realistic cortical simulations differs from the reviewer’s, setting aside the final step in the forward modeling pipeline which would add the effects of volume conduction in the grey matter. By simulating real-time dynamics, it should be possible to untangle the effects of volume conduction from true spatio-temporal correlations. This is because the volume conduction effects are essentially instantaneous, compared to the relatively slow motion of traveling waves. So, the measurement of purely spatial phase vectors is prone to smearing artefact, but following the trajectory of a wave over one cycle can more accurately determine the range of true interactions. One could, for example, compare the usual CSD forward modelling with TWs in simulations, see which is the best predictor of future activity, and compare these to empirical measurements. Here, the CSD analysis would remove the volume conduction effects but also emphasize standing activity over motion, even where the motion was veridical in the simulation.

      Even so, these tests are only relevant in <1cm range.

      Another issue is ephaptic coupling, which we mention in the discussion. This means that some of the local volume conduction effects are not merely artefacts from the point of view of cortical function, but have a real causal effect. The strength of the word ‘some’ has yet to be completely resolved in the literature, and it would be technically challenging to include these effects in any simulation.

      Finally, simulation should be an adjunct to empirical studies, or used when empirical studies are not possible. We do not think that, in this case, simulations are the ‘only direct’ way to evaluate our method. Rather, we rely on the converging evidence from empirical studies of volume conduction in LFPs, which show that this effect is outside the range of our reported results.

    1. eLife Assessment

      In this important work, the authors present a new transformer-based neural network designed to isolate and quantify higher-order epistasis in protein sequences. They provide solid evidence that higher-order epistasis can play key roles in protein function. This work will be of interest to the communities interested in modeling biological sequence data and understanding mutational effects.

    2. Reviewer #1 (Public review):

      The authors present an approach that uses the transformer architecture to model epistasis in deep mutational scanning datasets. This is an original and very interesting idea. Applying the approach to 10 datasets, they quantify the contribution of higher-order epistasis, showing that it varies quite extensively.

      Comments on revisions:

      The authors have addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare baseline methods such as "DANGO: Predicting higher-order genetic interactions". There are many works related to pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled non-additive interaction discovery in machine learning models".

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should clarify terminology consistently. Including a visual mapping or table linking a number of layers to the maximum modeled interaction order could be helpful.
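
      The table suggested in point (4) could be generated along the following lines. This is a minimal sketch that assumes only the 2^M relationship mentioned in point (2), i.e. that M multi-head attention (MHA) layers cap the modeled interaction order at 2^M. The function name is hypothetical and this is not the authors' code.

```python
def max_interaction_order(num_mha_layers: int) -> int:
    """Maximum epistatic interaction order for M attention layers, assuming 2**M scaling."""
    if num_mha_layers < 1:
        raise ValueError("need at least one attention layer")
    return 2 ** num_mha_layers


# Print a small layers-to-order table.
print("MHA layers | max interaction order")
for m in range(1, 4):
    print(f"{m:>10} | {max_interaction_order(m)}")
```

      Under this assumption, one layer covers pairwise terms, two layers cover interactions up to fourth order, and three layers cover interactions up to eighth order.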

      Comments for the revision:

      I want to thank the authors for their efforts in revising the manuscript. Most of the concerns raised in the initial review have been adequately addressed.

      However, one important issue remains. I previously asked the authors to benchmark their method against stronger baselines. The authors declined, arguing that these alternatives are "not directly applicable to the types of analyses." I am not persuaded by this rationale. In my view, these baseline methods target essentially the same underlying problem, and at least some, if not all, should be included in a comparative evaluation (or the manuscript should provide a clearer, more technically grounded explanation of why such comparisons are not feasible or not meaningful).

    4. Reviewer #3 (Public review):

      Summary:

      Sethi and Zou present a new neural network to study the importance of epistatic interactions in pairs and groups of amino acids to the function of proteins. Their new model is validated on a small simulated data set, and then applied to 10 empirical data sets. Results show that epistatic interactions in groups of amino acids can be important to predict the phenotype of a protein, especially for sequences that are not very similar to the training data.

      Strengths:

      The manuscript relies on a novel neural network architecture that makes it easy to study specifically the contribution of interactions between 2, 3, 4 or more amino acids. The novel network architecture achieves such a level of interpretability without noticeable performance penalty. The study of 10 different protein families shows that there is variation among protein families in the importance of these interactions, and that higher order interactions are particularly important to predict the phenotypes of distant proteins.

      Weaknesses:

      The Github repository provides a README file to run a standard pipeline, but a user will need to go through the code to actually know what that pipeline is doing.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present an approach that uses the transformer architecture to model epistasis in deep mutational scanning datasets. This is an original and very interesting idea. Applying the approach to 10 datasets, they quantify the contribution of higher-order epistasis, showing that it varies quite extensively.

      Suggestions:

      (1) The approach taken is very interesting, but it is not particularly well placed in the context of recent related work. MAVE-NN, LANTERN, and MoCHI are all approaches that different labs have developed for inferring and fitting global epistasis functions to DMS datasets. MoCHI can also be used to infer multidimensional global epistasis (for example, folding and binding energies) and also pairwise (and higher order) specific interaction terms (see 10.1186/s13059-024-03444-y and 10.1371/journal.pcbi.1012132). It doesn't distract from the current work to better introduce these recent approaches in the introduction. A comparison of the different capabilities of the methods may also be helpful. It may also be interesting to compare the contributions to variance of 1st, 2nd, and higher-order interaction terms estimated by the Epistatic transformer and MoCHI.

      We thank the reviewer for the very thoughtful suggestion.

      Although these methods are conceptually related to our method, none of them can realistically be used to perform the type of inference we have done in the paper on most of the datasets we used, as they all require explicitly enumerating the large number of interaction terms.

      We have included new text (Line 65-74) in the introduction to discuss the advantages and disadvantages of these models. We believe this has made our contribution better placed in the broader context of the field.

      (2) https://doi.org/10.1371/journal.pcbi.1004771 is another useful reference that relates different metrics of epistasis, including the useful distinction between biochemical/background-relative and background-averaged epistasis.

      We have included this very relevant reference in the introduction. We also pointed out that a limitation of this class of methods is that they typically require near combinatorially complete datasets and often have to rely on regularized regression to infer the parameters, making the inferred model parameters disconnected from their theoretical expectations (Lines 49-56).

      (3) Which higher-order interactions are more important? Are there any mechanistic/structural insights?

      We thank the reviewer for pointing out this potential improvement. We have now included a detailed analysis of the GRB2-SH3 abundance landscape in the final section of the results. In particular, we estimated the contribution of individual amino acid sites to different orders (pairwise, 3-4th order, 4-8th order) of epistasis and discussed our findings in the context of the 3D structure of this domain. We also analyzed the sparsity of specific interactions among subsets of sites.

      Please see Results section “Architecture of specific epistasis for GRB2-SH3 abundance.”

      Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      We thank the reviewer for the positive feedback.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare baseline methods such as "DANGO: Predicting higher-order genetic interactions." There are many works related to pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled non-additive interaction discovery in machine learning models."

      We thank the reviewer for this very helpful comment. These references are indeed conceptually quite similar to our framework. Although they are not directly applicable to the types of analyses we performed in this paper (partitioning contribution of epistasis into different interaction orders in terms of variance components), we have included a discussion of these methods in the introduction (Line 70-74). We believe this helps better situate our method within the broader conceptual context of interpreting machine learning models for epistatic interactions.

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      Again, we thank the reviewer for the thoughtful comment. We have addressed this comment together with a related comment by Reviewer1 by including a detailed analysis of the GRB2-SH3 landscape using a marginal epistasis framework, where we quantified the contribution of individual sites to different orders of epistasis as well as the sparsity of epistatic interactions. We also present these results in the context of the structure of this protein. Please see Results section “Architecture of specific epistasis for GRB2-SH3 abundance.”

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      We agree that the underparameterization of the simple sigmoid function could potentially be confounding. We did compare different choices of functional forms for modeling global epistasis. Overall, we found that there is no difference between a simple sigmoid function with four trainable parameters and the more complex version (sum of multiple sigmoid functions, used by popular methods such as MAVE-NN). Therefore, all results we presented in the paper were based on the model with a single scalable sigmoid function.

      We have added relevant text; line 153-158. We have also included side-by-side comparisons of the model performance for the GRB-abundance and the AAV2 dataset to corroborate this claim (Supplemental Figure 1).
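
      For concreteness, one common way to write a sigmoid with four trainable parameters is sketched below. The parameterization (lower asymptote, upper asymptote, slope, midpoint) and all names are assumptions made for illustration; the response does not state the exact functional form used in the paper.

```python
import numpy as np


def four_param_sigmoid(phi, lower, upper, slope, midpoint):
    """Map a latent phenotype phi to a measured phenotype via a scaled sigmoid."""
    phi = np.asarray(phi, dtype=float)
    # lower/upper set the plateaus; slope and midpoint shape the transition.
    return lower + (upper - lower) / (1.0 + np.exp(-slope * (phi - midpoint)))


# Example: a latent score saturating between 0 and 1.
latent = np.array([-10.0, 0.0, 10.0])
measured = four_param_sigmoid(latent, lower=0.0, upper=1.0, slope=1.0, midpoint=0.0)
```

      In a trained model the four parameters would be fitted jointly with the network that produces the latent phenotype; here they are fixed by hand only to show the saturating shape.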

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should clarify terminology consistently. Including a visual mapping or table linking a number of layers to the maximum modeled interaction order could be helpful.

      We thank the reviewer for the thoughtful suggestion. We have rewritten the description of our metrics for measuring the importance of "pairwise", "3-4-way", and ">4-way" interactions; Line 232-239.

      We have also added a table to improve clarity, as suggested; Table 2.

      Reviewer #3 (Public review):

      Summary:

      Sethi and Zou present a new neural network to study the importance of epistatic interactions in pairs and groups of amino acids to the function of proteins. Their new model is validated on a small simulated data set and then applied to 10 empirical data sets. Results show that epistatic interactions in groups of amino acids can be important to predict the function of a protein, especially for sequences that are not very similar to the training data.

      Strengths:

      The manuscript relies on a novel neural network architecture that makes it easy to study specifically the contribution of interactions between 2, 3, 4, or more amino acids. The study of 10 different protein families shows that there is variation among protein families.

      Weaknesses:

      The manuscript is good overall, but could have gone a bit deeper by comparing the new architecture to standard transformers, and by investigating whether differences between protein families explain some of the differences in the importance of interactions between amino acids. Finally, the GitHub repository needs some more information to be usable.

      We thank the reviewer for the thoughtful comments. We have listed our response below in the “Recommendations for the authors” section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some of the dataset labels are confusing. For example, GRB is actually the protein GRB2 and more specifically just one of the two SH3 domains from GRB2 (called GRB2-SH3 in Faure et al.).

      We thank the reviewer for catching this. Our original dataset names followed the library numbering in the Faure et al. paper (which constructed 3 variant libraries and performed different assays on them). To avoid confusion (and to save space in the figure titles), we have now renamed the datasets using this mapping:

      Author response table 1.

      Reviewer #3 (Recommendations for the authors):

      (1) What is the cost of the interpretability of the model? It would be interesting to evaluate how a standard transformer, complete with its many non-linearities, performs on the simulated 13-position data, using the r2 metric. This is important as the last sentence of the discussion seems to suggest that the model proposed by the authors could be used in other contexts, where perhaps interpretability would be less important.

      We thank the reviewer for this suggestion. We have run a generic transformer model on the GRB-abundance and AAV2 datasets. Overall, we found minimal difference between the generic model and our interpretable model, suggesting that fitting the interpretable transformer does not incur a significant cost in performance.

      We have included a side-by-side comparison of the performance of the generic transformer and our three-layer model in Supplemental Figure 5 and a discussion of this finding in lines 256-259.

      (2) The 10 data sets analyzed by the authors differ in their behaviour. I was wondering whether the proteins have different characteristics, beyond the number and distribution of mutants in the data sets. For instance, do high-order interactions play a bigger role in longer proteins, in proteins with more secondary structures, in more hydrophobic proteins?

      We fully agree that this is a highly relevant question. Unfortunately, the paucity of datasets suitable for the type of analyses we performed in the paper limits our ability to draw general conclusions. Furthermore, the differences in genotype distribution among the 10 datasets may be the main factor driving the behavior of the models.

      We have included our thoughts on this issue in the discussion (lines 477-481).

      We will definitely revisit this question if this type of high-order combinatorial DMS data becomes more available in the (hopefully) near future.

      (3) Although the code appears to be available in the repository, there is no information about the content of the different folders, about what the different scripts do, or about how to reproduce the article's results. More work should be done to clarify it all.

      Thank you for pointing this out. We have substantially improved our GitHub repository and included many annotations for reproducibility.

      (4) Typos and minor comments:

      (a) p3 "a multi-peak fitness landscapes": landscape.

      (b) p3 "Here instead of directly fitting the the regression coefficients in Eq. 2": remove 'the'.

      (c) p3 "neural network architectures do not allow us to control the highest order of specific epistasis": a word is missing.

      (d) p6 "up to 1,926, 3,014, and 4,102 parameters, respectively-all smaller than the size of the training dataset": it's not very clear what size of the dataset means: number of example sequences?

      (e) p6 "This results confirm": This result confirms.

      (f) p6 "to the convergence of of the variance components of the model landscape to the ground truth.": remove 'of'.

      (g) p7 "to characterize the importance higher-order interactions": the importance of.

      (h) p7 "The improvement varies across datasets and range": and ranges.

      (i) p9 "over the pairwise model is due to the its ability": remove 'the'.

      (j) p13 "This results suggest that pairwise": result suggests.

      (k) p13 "although the role assessed by prediction for randomly sampled genotypes seems moderate": sampled. Also, I'm not sure I understand this part of the sentence: what results are used to support this claim? It's not 6b, which is only based on the mutational model.

      This is in Supplemental Figure 7.

      (l) p13 "potentially by modeling how the these local effects": remove the.

      (m) p13 "We first note that the the higher-order models": remove the.

      (n) p15 "M layers of MHA leads to a models that strictly": lead to a model.

      (o) Supp Figure 1: "Solid lines shows the inverse": show.

      (p) Supp p 10 "on 90% of randomly sample data": sampled.

      (q) Supp p11 "Next, assume that Eq. 5 is true for m > 0. We need to show that Eq. 5 is also true for m + 1.": shouldn't it be m>=0 ? It seems important to start the recursive argument.

      Good catch.

      (r) Supp p11 "Since the sum in line 9 run through subsets": runs.

      (s) Supp p11 "we can further simplify Eq. 11 it to": remove it.

      We have fixed all these problems. We very much appreciate the reviewer’s attention.