8 Matching Annotations
  1. Jun 2025
    1. Very cool study! One suggestion: the pseudo-perplexity values still seem pretty high even after fine-tuning, which may indicate some degree of underfitting. This could be due to the relatively small size of the 35M ESM2 model. Have you considered trying a larger model (150M or 650M)? If fine-tuning a larger ESM2 model is computationally prohibitive, it might still be informative to compare against the zero-shot performance of a larger model to assess whether fine-tuning is necessary, or whether a larger baseline alone achieves comparable predictive results.
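
For concreteness, pseudo-perplexity is usually computed by masking each position in turn and exponentiating the negative mean log-probability of the true residue. A minimal sketch of that calculation, with a stand-in scoring function rather than the actual ESM2 masked-LM head (the `log_prob` callable and the toy uniform model are assumptions for illustration):

```python
import math

def pseudo_perplexity(seq, log_prob):
    """Mask each position in turn, sum the model's log-probability of the
    true residue, and exponentiate the negative mean (lower is better)."""
    total = 0.0
    for i, aa in enumerate(seq):
        masked = seq[:i] + "<mask>" + seq[i + 1:]
        total += log_prob(masked, i, aa)
    return math.exp(-total / len(seq))

# Toy stand-in model: uniform over the 20 standard amino acids.
# A real run would query an ESM2 masked-LM head here instead.
uniform = lambda masked, i, aa: math.log(1 / 20)

print(pseudo_perplexity("MKTAYIAKQR", uniform))  # ≈ 20 for a uniform model
```

Swapping the toy model for per-position log-probabilities from a 150M or 650M ESM2 checkpoint would give the zero-shot comparison suggested above.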

  2. May 2025
    1. Very cool study! It's great to see so many tools stitched together in such a purpose-built way. Have you thought about running your pipeline on other natural KSI homologs? It’d be interesting to see if, like in directed or natural evolution, certain starting points make it easier to explore sequence space or lead to better outcomes. This kind of pipeline seems like a great way to test that idea without requiring tons of experimental screening.

    2. To visually examine the sequence-function relationship of the characterized antibody variants, both a network plot and a phylogenetic tree were generated

      Given that your results clearly show a strong relationship between sequence similarity and binding affinity (in both the phylogenetic tree and the network analysis), did you consider alternative strategies for sequence encoding, in particular ones that might capture some of this evolutionary signal? For example, additional features derived from the phylogenetic tree, network-based distances, or embeddings from protein language models (like ESM)?

      These kinds of features might be especially valuable in a small-sample setting like this one and could further boost the predictive power of your models. Very nice study! Great to see creative and effective ways to leverage the power of small experimental datasets for protein function prediction.
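
The feature-combination idea above can be sketched end to end. The arrays here are random placeholders standing in for one-hot sequence encodings, mean-pooled pLM embeddings, and tree/network distances, and the closed-form ridge fit is just one reasonable choice of regressor for a small-sample setting:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40  # characterized variants (a small-sample setting)

# Random placeholders; in practice these blocks would come from one-hot
# sequence encodings, mean-pooled ESM embeddings, and phylogenetic or
# network-based distances to reference sequences.
onehot = rng.integers(0, 2, size=(n, 60)).astype(float)
plm_embed = rng.normal(size=(n, 32))
tree_dist = rng.normal(size=(n, 5))
y = rng.normal(size=n)  # measured binding affinities (placeholder)

# Concatenate the feature blocks and fit ridge regression in closed form:
# w = (X^T X + alpha * I)^{-1} X^T y
X = np.hstack([onehot, plm_embed, tree_dist])
alpha = 1.0
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
preds = X @ w
print(X.shape, preds.shape)  # (40, 97) (40,)
```

Cross-validating with and without each feature block would show whether the evolutionary features add predictive power beyond the sequence encoding alone.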

  3. Mar 2025
    1. To compare the catalytic activity, designed monomers were expressed in BHK21 cells together with a tetracycline inducible green fluorescent protein (GFP) and a synthetic protein consisting of tetracycline-controlled transactivator (tTA) tethered via a linker containing the TEV endogenous catalytic site (ENLYFQ’S) to a transmembrane domain protein. The transmembrane domain protein fused tTA is localised to the plasma membrane, and thus the GFP signal is low in the absence of an active TEV protease, but an active protease cleaves tTA enabling its translocation to the nucleus and induction of GFP expression (Figure 4A).

      Very cool paper! Really great to see a (rare) comparison between all these different methods. I’m very interested in the experimental readout: do you have any thoughts on how the in-cell GFP assay might be influenced by factors like expression level, stability, or translational efficiency? Just curious whether you think those could affect the comparisons at all.

  4. Feb 2025
    1. This is a really cool approach to bringing biophysical information into protein mutation prediction. It does seem worth exploring whether including the evolutionary information gleaned from protein language models like ESM2 improves the performance of METL. Combining these two approaches seems like it has real potential to leverage different types of information. Have you thought about ways to use embeddings from models like ESM2 in the METL pretraining to try to improve generalizability? It would be cool to see whether these embeddings actually improve performance, especially with small experimental training sets. Great work!

    2. Very interesting work! I’m curious about the effects of using training data from multiple expression systems (bacteria, fungi, mammalian cells), particularly since expression requirements can vary slightly between organisms. Have you explored whether expression system-specific models perform better when predicting expression within a given system? Or, is the training data biased toward one particular expression system, potentially leading to worse predictions for others? Or has the model really learned general features of expression across these organisms? Great work!

  5. Dec 2024
    1. The green, blue and red lines highlights the same specific choices of ancestor used in Fig. 1.

      It would be helpful to define these colors in one or more of the figure captions. Currently they are defined only in the main text, not in the Fig. 1 caption, even though this passage points back to Figure 1.

    2. Our analysis shows that the amount of diversity at a given evolutionary time depends strongly on the ancestor, due to highly non-trivial epistatic dynamical correlations. More epistatically constrained ancestors give rise to less diversity, thus allowing for reconstruction over longer evolutionary times (Fig. 8a). Yet, at comparable amount of diversity, more epistatic ancestors are more difficult to reconstruct (Fig. 8b), at least using the FastML algorithm that neglects correlations between sites

      This is a really intriguing finding. It would be interesting to look at the posterior probabilities of the FastML-reconstructed ancestors for each level of epistasis. I am wondering whether the posterior probabilities reflect the uncertainty you are observing here, or whether ASR algorithms are blind to it (because they are blind to epistasis). In other words, do the more epistatic ancestors produce ML ASR sequences with lower posterior probabilities, or are the probabilities misleadingly high? If the latter, this work could have implications for the validity of ASR on sequences with high levels of epistasis, since the posterior probabilities are generally used as a measure of confidence in the reconstructions.
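
      That check could be run directly on FastML-style per-site marginal posteriors; a minimal sketch with a made-up posterior matrix (the numbers are illustrative, not real FastML output):

```python
import numpy as np

# Made-up per-site marginal posteriors over four candidate residues at
# three ancestral sites (each row sums to 1), FastML-style.
posterior = np.array([
    [0.90, 0.05, 0.03, 0.02],
    [0.40, 0.35, 0.15, 0.10],  # an uncertain (e.g. epistatically coupled) site
    [0.70, 0.20, 0.05, 0.05],
])

ml_probs = posterior.max(axis=1)   # posterior of the ML residue at each site
mean_confidence = ml_probs.mean()  # a common summary of reconstruction confidence
print(ml_probs, round(mean_confidence, 3))  # [0.9 0.4 0.7] 0.667
```

Comparing this mean ML posterior across ancestors with different levels of epistatic constraint would show directly whether the reported confidence tracks the true reconstruction difficulty.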