3,118 Matching Annotations
  1. Aug 2025
  2. resu-bot-bucket.s3.ca-central-1.amazonaws.com
  3. resu-bot-bucket.s3.ca-central-1.amazonaws.com
  4. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Developed a full-stack web application using with Flask serving a REST API with React as the frontend

      Remove 'using with' for clarity. Add impact metrics, such as user adoption rates or performance improvements.

  5. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Created LLM extension tools to help translate complex internal wikipedia pages to hyperlinked code snippets to help internal customers use the project at low-level logic, increasing efficiency by 300%.

      Provide context on what 'efficiency' means here. What specific tasks were made easier or faster?

    2. Automated robust CI/CD by building custom pipelines to unit, load, and integration test the code with 100% code coverage, enhancing safety in deployment into production waves.

      Specify how this automation improved deployment frequency or reduced errors in production.

    3. Designed a highly efficient system flow in integration and canary testing, decreasing latency by 70% and cost per API invocation by 2000%.

      Clarify the baseline metrics for latency and cost to provide context for the improvements made.

    4. Streamlined session management across internal teams by consolidating different types of sessions into a single master session, simplifying workflows between upstream and downstream callers.

      Quantify the efficiency gained or time saved through this consolidation to better illustrate the impact.

    5. Developed portable Model Context Protocol (MCP) servers for the team, extending knowledge for AI tools such as Amazon Q and Kiro IDE to study internal data and automate self-service tools, saving $240,000 every year.

      Explain how the $240,000 savings was calculated and what specific processes were improved to achieve this.

    6. Engineered solutions to operational problems involving cache validations and cyclic calls to raise the business availability to 99.998% and lower latency in customer federation by 60% in the busiest availability zones.

      Break down the specific operational problems solved and how they directly impacted user experience or system reliability.

    7. Addressed security challenges in serving device authentication and authorization flows to extremely reduce the chance of phishing attacks for customers.

      Quantify the reduction in phishing incidents or security breaches to highlight the effectiveness of your solutions.

    8. Led the creation of user background sessions to enable AI services such as AWS SageMaker run long-running tasks without user interactivity, creating a new paradigm in model training on AWS.

      Clarify how this paradigm shift benefited AWS users or reduced costs. Provide measurable outcomes.

    9. Took ownership of maintaining OIDC and SAML services for customer federation and integration with native and third-party applications across AWS.

      Specify the impact of maintaining these services. How did it improve customer experience or system performance?

  6. resu-bot-bucket.s3.ca-central-1.amazonaws.com
  7. resu-bot-bucket.s3.ca-central-1.amazonaws.com
  8. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. driving fast and iterative improvements and integrating AI-powered feedback directly within Discord.

      Provide specific outcomes from the feedback integration, such as user adoption rates or satisfaction scores.

  9. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Developed a full-stack web application to help students locate nearby study spots, track study sessions, and create study groups.

      Add metrics on user engagement or feedback to showcase the app's impact on student productivity.

    2. Participated in daily scrum meetings with a team of 5 developers to discuss new ideas and strategies in line with the agile workflow.

      Highlight any specific contributions or outcomes from these meetings to show leadership or initiative.

    3. eliminating the need for 100+ complex spreadsheets and enabling 30+ executives to securely access operational, financial, and customer data.

      Quantify the time saved for executives or any decision-making improvements resulting from this change.

  10. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Developing an AI agent that monitors stablecoin flows in real time and infers intent behind large movements such as panic selling or emerging depeg risks, triggering proactive alerts and automated treasury actions for DAOs and crypto funds.

      Consider shortening for clarity; e.g., 'Developing an AI agent to monitor stablecoin flows and trigger alerts for large movements.'

    2. Implemented in-line PDF annotations through integration with Hypothes.is and AWS S3, automated change detection for resume updates, and version tracking with DynamoDB.

      Break into two sentences for clarity; consider rephrasing 'automated change detection' to 'automated detection of changes'.

    3. Built a Discord bot to streamline collaborative resume reviews, driving fast and iterative resume improvements for a community of 2000+ students.

      Specify 'driving fast and iterative improvements' with measurable outcomes, e.g., 'resulting in 30% faster review times'.

    4. Participated in daily scrum meetings with a team of 5 developers to discuss new ideas and strategies in line with the agile workflow.

      Use active voice: 'Collaborated in daily scrum meetings with a team of 5 developers...' for a stronger impact.

    5. Redesigned layout and fixed critical responsiveness issues on 10+ web pages using Bootstrap, restoring broken mobile views and ensuring consistent, functional interfaces across devices.

      Quantify 'critical responsiveness issues' with specifics to enhance impact; e.g., 'fixed 5 critical responsiveness issues'.

    6. Developed dashboards for an internal portal with .NET Core MVC, eliminating the need for 100+ complex spreadsheets and enabling 30+ executives to securely access operational, financial, and customer data.

      Consider rephrasing 'eliminating the need for 100+ complex spreadsheets' to 'replacing 100+ complex spreadsheets' for stronger impact.

    7. Led backend unit testing automation for the shift bidding platform using xUnit, SQLite, and Azure Pipelines, contributing 40+ tests, identifying logic errors, and increasing overall coverage by 15%.

      Break into two sentences for clarity; consider rephrasing 'increasing overall coverage by 15%' to 'increasing test coverage by 15%'.

  11. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Built an NLP-powered Telegram Bot that parses natural language commands to allow expense-splitting directly in your group chat

      Specify user engagement metrics or feedback to illustrate the bot's effectiveness and popularity.

    2. Built a Discord bot to streamline collaborative resume reviews, driving fast and iterative resume improvements for a community of 2000+ students.

      Add specific metrics on how many resumes were improved or how quickly to demonstrate impact.

    3. Participated in daily scrum meetings with a team of 5 developers to discuss new ideas and strategies in line with the agile workflow.

      Focus on your contributions or outcomes from these meetings to highlight your role more effectively.

    4. eliminating the need for 100+ complex spreadsheets and enabling 30+ executives to securely access operational, financial, and customer data.

      Clarify how this change improved decision-making or efficiency for the executives.

  12. Jul 2025
  13. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Built an NLP-powered Telegram Bot that parses natural language commands to allow expense-splitting directly in your group chat with fast, secure, on-chain expense records.

      Include user adoption rates or feedback to illustrate the bot's effectiveness and popularity.

    2. Developing an AI agent that monitors stablecoin flows in real time and infers intent behind large movements such as panic selling or emerging depeg risks, triggering proactive alerts and automated treasury actions for DAOs and crypto funds.

      Clarify the potential financial impact or risk reduction achieved through this AI agent's alerts.

    3. Built a Discord bot to streamline collaborative resume reviews, driving fast and iterative resume improvements for a community of 2000+ students.

      Add metrics on how many resumes were improved or user satisfaction ratings to demonstrate impact.

    4. Participated in daily scrum meetings with a team of 5 developers to discuss new ideas and strategies in line with the agile workflow.

      Highlight a specific contribution or idea that led to a significant improvement in team performance.

    5. Redesigned layout and fixed critical responsiveness issues on 10+ web pages using Bootstrap, restoring broken mobile views and ensuring consistent, functional interfaces across devices.

      Specify the user engagement metrics or feedback received post-redesign to showcase impact.

    6. Developed dashboards for an internal portal with .NET Core MVC, eliminating the need for 100+ complex spreadsheets and enabling 30+ executives to securely access operational, financial, and customer data.

      Quantify the decision-making improvements or time saved for executives due to the dashboards.

    7. Built a React/.NET impersonation tool enabling admins to emulate employee sessions for support and troubleshooting, cutting developer testing setup time by 86% by eliminating the need for test accounts.

      Consider rephrasing to emphasize how this tool improved support response times or user experience.

    8. Led backend unit testing automation for the shift bidding platform using xUnit, SQLite, and Azure Pipelines, contributing 40+ tests, identifying logic errors, and increasing overall coverage by 15%.

      Add a specific example of a critical bug found to highlight the importance of your contributions.

    9. Developed an end-to-end shift bid publishing feature using Azure Functions (C#), SQL, and Azure Logic Apps, automating shift imports into the HR system for 700+ employees and saving 50+ hr/month of manual entry.

      Clarify the impact by stating how this improved efficiency or employee satisfaction beyond just time saved.

  14. resu-bot-bucket.s3.ca-central-1.amazonaws.com
  15. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Developed a full-stack web application to help students locate nearby study spots, track study sessions, and create study groups.

      Mention any user adoption rates or feedback to highlight the application's success and relevance.

    2. Participated in daily scrum meetings with a team of 5 developers to discuss new ideas and strategies in line with the agile workflow.

      Highlight any specific contributions or outcomes from these meetings to demonstrate leadership.

    3. eliminating the need for 100+ complex spreadsheets and enabling 30+ executives to securely access operational, financial, and customer data.

      Quantify the time saved for executives to highlight the efficiency gained through your work.

  16. resu-bot-bucket.s3.ca-central-1.amazonaws.com
  17. resu-bot-bucket.s3.ca-central-1.amazonaws.com
  18. resu-bot-bucket.s3.ca-central-1.amazonaws.com
    1. Oh yeah. If you're generating text that could burn anywhere from 0.17 watt hours to 2 watt hours, equal to running this grill for about four seconds. Generating an image add 1.7 watt hours. All that, less than 10 seconds on the grill. But short videos can use far more power. In tests of various open source models, videos took anywhere between 20 watt hours and 110 watt hours. At 110 watt hours, one steamed electric grill steak, about equal to one video generation. I wouldn't eat it, but my dog would. At 220 watt hours, it was looking much more edible. So two video generations equals one pretty good looking steak.

      Comparisons of text versus image versus video generation
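
      The quoted figures imply a grill power of roughly 1800 W ("2 Wh ≈ 4 s on the grill"). A quick back-of-the-envelope sketch; the energy figures are from the quote, the grill wattage is inferred:

```python
# Grill power implied by the quote: 2 Wh of energy equals ~4 s of grilling.
GRILL_W = 2 * 3600 / 4  # = 1800 W

def grill_seconds(watt_hours):
    """Convert an energy cost in watt-hours to equivalent grill time in seconds."""
    return watt_hours * 3600 / GRILL_W

# Figures quoted above: text 0.17-2 Wh, image 1.7 Wh, video 20-110 Wh.
for task, wh in [("text (high end)", 2), ("image", 1.7), ("video (high end)", 110)]:
    print(f"{task}: {grill_seconds(wh):.0f} s on the grill")
```

      This reproduces the quote's comparisons: an image lands under 10 grill-seconds, while a high-end video generation matches the ~220 s it takes to grill a steak at 1800 W.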

    1. Navigating Failures in Pods With Devices

      Summary: Navigating Failures in Pods With Devices

      This article examines the unique challenges Kubernetes faces in managing specialized hardware (e.g., GPUs, accelerators) within AI/ML workloads, and explores current pain points, DIY solutions, and the future roadmap for more robust device failure handling.

      Why AI/ML Workloads Are Different

      • Heavy Dependence on Specialized Hardware: AI/ML jobs require devices like GPUs, with hardware failures causing significant disruptions.
      • Complex Scheduling: Tasks may consume entire machines or need coordinated scheduling across nodes due to device interconnects.
      • High Running Costs: Specialized nodes are expensive; idle time is wasteful.
      • Non-Traditional Failure Models: Standard Kubernetes assumptions (like treating nodes as fungible, or pods as easily replaceable) don’t apply well; failures can trigger large-scale restarts or job aborts.

      Major Failure Modes in Kubernetes With Devices

      1. Kubernetes Infrastructure Failures

        • Multiple actors (device plugin, kubelet, scheduler) must work together; failures can occur at any stage.
        • Issues include pods failing admission, poor scheduling, or pods unable to run despite healthy hardware.
        • Best Practices: Early restarts, close monitoring, canary deployments, use of verified device plugins and drivers.
      2. Device Failures

        • Kubernetes has limited built-in ability to handle device failures—unhealthy devices simply reduce the allocatable count.
        • Lacks correlation between device failure and pod/container failure.
        • DIY Solutions:
          • Node Health Controllers: Restart nodes if device capacity drops, but these can be slow and blunt.
          • Pod Failure Policies: Pods exit with special codes for device errors, but support is limited and mostly for batch jobs.
          • Custom Pod Watchers: Scripts or controllers watch pod/device status, forcibly delete pods attached to failed devices, prompting rescheduling.
      3. Container Code Failures

        • Kubernetes can only restart containers or reschedule pods, with limited expressiveness about what counts as failure.
        • For large AI/ML jobs: Orchestration wrappers restart failed main executables, aiming to avoid expensive full job restart cycles.
      4. Device Degradation

        • Not all device issues result in outright failure; degraded performance now occurs more frequently (e.g., one slow GPU dragging down training).
        • Detection and remediation are largely DIY; Kubernetes does not yet natively express "degraded" status.
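
      The pod failure policy workaround described above can be sketched as a Job spec. A minimal illustration, not taken from the article: the exit code 42, container name, and image are assumptions, and the container itself must be written to exit with that code on device errors.

```yaml
# Sketch: let a Job distinguish device failures (special exit code)
# from ordinary crashes, so device errors don't burn through backoffLimit.
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job
spec:
  backoffLimit: 3
  podFailurePolicy:
    rules:
      - action: Ignore          # don't count these failures against backoffLimit
        onExitCodes:
          containerName: trainer
          operator: In
          values: [42]          # assumed device-error exit code
  template:
    spec:
      restartPolicy: Never      # required when using podFailurePolicy
      containers:
        - name: trainer
          image: example.com/trainer:latest
```

      As the article notes, this mechanism is mostly useful for batch jobs; long-running serving pods still need one of the other DIY approaches.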

      Current Workarounds & Limitations

      • Most device-failure strategies are manual or require high privileges.
      • Workarounds are often fragile, costly, or disruptive.
      • Kubernetes lacks standardized abstractions for device health and device importance at pod or cluster level.

      Roadmap: What’s Next for Kubernetes

      SIG Node and Kubernetes community are focusing on:

      • Improving core reliability: Ensuring kubelet, device manager, and plugins handle failures gracefully.
      • Making Failure Signals Visible: Initiatives like KEP 4680 aim to expose device health at pod status level.
      • Integration With Pod Failure Policies: Plans to recognize device failures as first-class events for triggering recovery.
      • Pod Descheduling: Enabling pods to be rescheduled off failed/unhealthy devices, even with restartPolicy: Always.
      • Better Handling for Large-Scale AI/ML Workloads: More granular recovery, fast in-place restarts, state snapshotting.
      • Device Degradation Signals: Early discussions on tracking performance degradation, but no mature standard yet.

      Key Takeaway

      Kubernetes remains the platform of choice for AI/ML, but device- and hardware-aware failure handling is still evolving. Most robust solutions are still "DIY," but community and upstream investment is underway to standardize and automate recovery and resilience for workloads depending on specialized hardware.

    1. Automating oral argument

      A Harvard Law graduate who argued before the Supreme Court fed his case briefs into Claude 4 Opus and had it answer the same questions the Justices posed to him. The AI delivered what he called an "outstanding oral argument" with coherent answers and clever responses he hadn't considered, leading him to conclude that AI lawyers could soon outperform even top human advocates at oral argument.

    1. Inter-node communication stalls: high batching is crucial to profitably serve millions of users, and in the context of SOTA reasoning models, many nodes are often required. Inference workloads then resemble more training.

      Oh, so to get the highest throughput, the inference servers also batch operations, making inference look a bit like training too

  19. Jun 2025
    1. https://web.archive.org/web/20250630134724/https://www.theregister.com/2025/06/29/ai_agents_fail_a_lot/

      'agent washing' Agentic AI underperforms, getting at most 30% of tasks right (Gemini 2.5 Pro), but mostly under 10%.

      Article contains examples of what I think we should call agentic hallucination, where, not finding a solution, it takes steps to alter reality to fit the solution (e.g. renaming a user so it was the right user to send a message to, as the right user could not be found). Meredith Whittaker is mentioned, but from her statement I saw a key element is missing: most of that access will be in clear text, as models can't do encryption. Meaning not just the model, but the fact of the access existing, is a major vulnerability.

    1. 'It turns out the company had no AI and instead was just a group of Indian developers pretending to write code as AI,

      'AI' software dev company is actually a pool of 700 India-based coders. Exposed because they couldn't meet payroll...

    1. 1000x Increase in AI Demand
      • NVIDIA’s latest earnings highlight a dramatic surge in AI demand, driven by a shift from simple one-shot inference to more complex, compute-intensive reasoning tasks.
      • Reasoning models require hundreds to thousands of times more computational resources and tokens per task, significantly increasing GPU usage, especially for AI coding agents and advanced applications.
      • Major hyperscalers like Microsoft, Google, and OpenAI are experiencing exponential growth in token generation, with Microsoft alone processing over 100 trillion tokens in Q1—a fivefold year-over-year increase.
      • Hyperscalers are deploying nearly 1,000 NVL72 racks (72,000 Blackwell GPUs) per week, and NVIDIA-powered “AI factories” have doubled year-over-year to nearly 100, with the average GPU count per factory also doubling.
      • To meet this unprecedented demand, more than $300 billion in capital expenditure is being invested this year in data centers (rebranded by NVIDIA as “AI factories”), signaling a new industrial revolution in AI infrastructure.
  20. May 2025
    1. advanced AI (but not “superintelligent” AI,

      wish there was a clear cut definition or at least advertisement of authors' stakes, stances, and definitions of the following terms

      technological determinism; agent; intelligence; control; progress; alignment

    1. Anthropic researchers said this was not an isolated incident, and that Claude had a tendency to “bulk-email media and law-enforcement figures to surface evidence of wrongdoing.”

      for - question - progress trap - open source AI models - for blackmail and ransom - Could a bad actor take an open source codebase and twist it to do harm, like find out about a rogue AI creator's adversary, enemy or victim and blackmail them? - progress trap - open source AI - criminals - exploit to identify and blackmail victims

    1. anthropic's new AI model shows ability to deceive and blackmail

      for - progress trap - AI - blackmail - AI - autonomy - progress trap - AI - Anthropic - Claude Opus 4 - to - article - Anthropic Claude 4 blackmail and news leak - progress trap - AI - article - Anthropic Claude 4 - blackmail - rare behavior - Anthropic’s new AI model didn’t just “blackmail” researchers in tests — it tried to leak information to news outlets

    1. An IBM survey of 2,000 CEOs revealed that just 25% of AI projects deliver on their promised return on investment. The main driver of adoption, it seems, is corporate FOMO, with nearly two-thirds of CEOs agreeing that “the risk of falling behind drives them to invest in some technologies before they have a clear understanding of the value they bring to the organization,” according to the study.

      New stat from IBM? This is similar to the RAND figure from before?

    1. for - natural language acquisition - Automatic Language Growth - ALG - youtube - interview - David Long - Automatic Language Growth - from - youtube - The Language School that Teaches Adults like Babies - https://hyp.is/Ls_IbCpbEfCEqEfjBlJ8hw/www.youtube.com/watch?v=984rkMbvp-w

      summary - The key takeaway is that even as adults, we retain our innate language learning skill, which requires simply treating a new language as a new, novel experience that we can apprehend naturally by experiencing it the way we did when we were exposed to our first, native language - We didn't know what a "language" was theoretically when we were infants; we simply fell into the experience and played with it, and our primary caretakers guided us - We didn't know grammar and rules of language, we just learned innately

    1. Once multiple accurate students enter the same tag for a new image, the system would be confident that the tag is correct. In this manner, image tagging and vocabulary learning can be combined into a single activity.

      is this not how CAPTCHA is evaluated too?
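
      The consensus mechanism described in the quote (and used similarly by reCAPTCHA) can be sketched in a few lines. A toy illustration, with the agreement threshold as a free parameter:

```python
from collections import Counter

def accepted_tags(submissions, min_agree=2):
    """Accept a tag once at least `min_agree` users have entered it.

    `submissions` is a list of (user, tag) pairs; identical duplicate
    entries by the same user are counted once.
    """
    counts = Counter(tag.lower() for user, tag in set(submissions))
    return {tag for tag, n in counts.items() if n >= min_agree}

print(accepted_tags([("alice", "cat"), ("bob", "Cat"), ("carol", "dog")]))
# -> {'cat'}
```

      Independent agreement is doing the verification work in both settings: a tag (or CAPTCHA answer) is trusted only once enough users converge on it.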

    1. "a man who understands Chinese is not a man who has a firm grasp of the statistical probabilities for the occurrence of the various words in the Chinese language" (p. 108).

      cf./viz. classical statistical machine learning and language models

    2. Gottfried Leibniz made a similar argument in 1714 against mechanism (the idea that everything that makes up a human being could, in principle, be explained in mechanical terms. In other words, that a person, including their mind, is merely a very complex machine).

      anatomy of a landscape / atrocity exhibition

  21. Apr 2025
    1. for - report - America's Superintelligence Project - definition - ASI - Artificial Super Intelligence

      summary - What is the cost of mistrust between nation states? - The mistrust between the US and China is reaching an all-time high, and it has disastrous consequences for an AI arms race - It is driving each country to move fast and break things, which will become an existential threat to all humanity - Deep Humanity, with its important dimension of progress traps, can help us navigate ASI

    2. To this day, if you know the right people, the Silicon Valley gossip mill is a surprisingly reliable source of information if you want to anticipate the next beat in frontier AI – and that’s a problem. You can’t have your most critical national security technology built in labs that are almost certainly CCP-penetrated

      for - high security risk - US AI labs

    1. https://web.archive.org/web/20250423134653/https://www.rijksoverheid.nl/documenten/publicaties/2025/04/22/het-overheidsbrede-standpunt-voor-de-inzet-van-generatieve-ai Dutch government-wide position on genAI, partly based on the IEC advice to the IPO. Still, it seems to me that quite a lot of room is given here, and the wording sounds very loose. This will cause problems, because a policy officer who plays with genAI while drafting a document and labels it for themselves as an 'experiment' or 'innovation' has thereby rationalized it. Never mind that experiments require controlled conditions, and that innovation must have a shared intention within the org. This feels very soft, though the right things are in it nonetheless.

      [[When Will the GenAI Bubble Burst]]

    1. misled investors by exploiting the promise and allure of AI technology to build a false narrative about innovation that never existed. This type of deception not only victimizes innocent investors

      The crime was misleading investors, not anyone else, which is very telling. The hype around "AI" - and actually hiring remote workers to do the job - and misleading customers/users doesn't matter.

    2. In truth, nate relied heavily on teams of human workers—primarily located overseas—to manually process transactions in secret, mimicking what users believed was being done by automation

      Yet another example of "AI" being neither artificial nor intelligent.

    1. This change means many data centers built in central, western, and rural China—where electricity and land are cheaper—are losing their allure to AI companies. In Zhengzhou, a city in Li’s home province of Henan, a newly built data center is even distributing free computing vouchers to local tech firms but still struggles to attract clients.

      Interesting cautionary tale about building out DCs in the sticks, where energy is cheap but latency is high

    1. Instead of drafting a first version with pen and paper (my preferred writing tools), I spent an entire hour walking outside, talking to ChatGPT in Advanced Voice Mode. We went through all the fuzzy ideas in my head, clarified and organized them, explored some additional talking points, and eventually pulled everything together into a first outline.

      Need to try this out.

    1. Review coordinated by Life Science Editors Foundation Reviewed by: Dr. Angela Andersen, Life Science Editors Foundation & Life Science Editors Potential Conflicts of Interest: None

      PUNCHLINE Evo 2 is a biological foundation model trained on 9.3 trillion DNA bases across all domains of life. It predicts the impact of genetic variation—including in noncoding and clinically relevant regions—without requiring task-specific fine-tuning. Evo 2 also generates genome-scale sequences and epigenomic architectures guided by predictive models. By interpreting its internal representations using sparse autoencoders, the model is shown to rediscover known biological features and uncover previously unannotated patterns with potential functional significance. These capabilities establish Evo 2 as a generalist model for prediction, annotation, and biological design.

      BACKGROUND A foundation model is a large-scale machine learning model trained on massive and diverse datasets to learn general features that can be reused across tasks. Evo 2 is such a model for genomics: it learns from raw DNA sequence alone—across bacteria, archaea, eukaryotes, and bacteriophage—without explicit labels or training on specific tasks. This enables it to generalize to a wide range of biological questions, including predicting the effects of genetic variants, identifying regulatory elements, and generating genome-scale sequences or chromatin features.

      Evo 2 comes in two versions: one with 7 billion parameters (7B) and a larger version with 40 billion parameters (40B). These numbers reflect the number of trainable weights in the model and influence its capacity to learn complex patterns. Both models were trained using a context window of up to 1 million tokens—where each token is a nucleotide—allowing the model to capture long-range dependencies across entire genomic regions.

      Evo 2 learns via self-supervised learning, a method in which the model learns to predict masked or missing DNA bases in a sequence. Through this simple but powerful objective, the model discovers statistical patterns that correspond to biological structure and function, without being told what those patterns mean.
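
      The masked-prediction objective described above can be illustrated with a toy sketch. The masking rate and the 'N' placeholder are illustrative choices for this example, not Evo 2's actual training recipe:

```python
import random

def mask_sequence(seq, rate=0.15, seed=0):
    """Hide a fraction of bases behind 'N'; a model trained with this
    objective must recover each hidden base from surrounding context."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, base in enumerate(seq):
        if rng.random() < rate:
            masked.append("N")
            targets[i] = base   # what the model must predict at position i
        else:
            masked.append(base)
    return "".join(masked), targets

masked, targets = mask_sequence("ACGTACGTACGTACGT")
print(masked, targets)
```

      Because the prediction targets come from the sequence itself, no labels are needed; the statistical regularities the model must learn to do well at this task are what end up encoding biological structure.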

      QUESTION ADDRESSED Can a large-scale foundation model trained solely on genomic sequences generalize across biological tasks—such as predicting mutational effects, modeling gene regulation, and generating realistic genomic sequences—without supervision or task-specific tuning?

      SUMMARY The authors introduce Evo 2, a foundational model for genomics that generalizes across DNA, RNA, and protein tasks. Without seeing any biological labels, Evo 2 learns the sequence rules governing coding and noncoding function, predicts variant effects—including in BRCA1/2 and splicing regions—and generates full-length genomes and epigenome profiles. It also enables epigenome-aware sequence design by coupling sequence generation with predictive models of chromatin accessibility.

      To probe what the model has learned internally, the authors use sparse autoencoders (SAEs)—a technique that compresses the model’s internal activations into a smaller set of interpretable features. These features often correspond to known biological elements, but importantly, some appear to capture novel, uncharacterized patterns that do not match existing annotations but are consistently associated with genomic regions of potential functional importance. This combination of rediscovery and novelty makes Evo 2 a uniquely powerful tool for exploring both the known and the unknown genome.

      KEY RESULTS Evo 2 trains on vast genomic data using a novel architecture to handle long DNA sequences Figures 1 + S1 Goal: Build a model capable of representing entire genomic regions (up to 1 million bases) from any organism. Outcome: Evo 2 was trained on 9.3 trillion bases using a hybrid convolution-attention architecture (StripedHyena 2). The model achieves long-context recall and strong perplexity scaling with increasing sequence length and model size.

      Evo 2 predicts the impact of mutations across DNA, RNA, and protein fitness Figures 2A–J + S2–S3 Goal: Assess whether Evo 2 can identify deleterious mutations without supervision across diverse organisms and molecules. Outcome: Evo 2 assigns lower likelihoods to biologically disruptive mutations—e.g., frameshifts, premature stops, and non-synonymous changes—mirroring evolutionary constraint. Predictions correlate with deep mutational scanning data and gene essentiality assays. Evo 2 embeddings also support highly accurate exon-intron classifiers.

      Clarification: “Generalist performance across DNA, RNA, and protein tasks” means that Evo 2 can simultaneously make accurate predictions about the functional impact of genetic variants on transcription, splicing, RNA stability, translation, and protein structure—without being specifically trained on any of these tasks.

      Evo 2 achieves state-of-the-art performance in clinical variant effect prediction Figures 3A–I + S4 Goal: Evaluate Evo 2's ability to predict pathogenicity of human genetic variants. Outcome: Evo 2 matches or outperforms specialized models on coding, noncoding, splicing, and indel variants. It accurately classifies BRCA1/2 mutations and generalizes to novel variant types. When paired with supervised classifiers using its embeddings, it achieves state-of-the-art accuracy on BRCA1 variant interpretation.

      Evo 2 representations reveal both known and novel biological features through sparse autoencoders (Figures 4A–G + S5–S7)
      Goal: Understand what Evo 2 has learned internally.
      Outcome: Sparse autoencoders decompose Evo 2’s internal representations into distinct features—many of which align with well-known biological elements such as exon-intron boundaries, transcription factor motifs, protein secondary structure, CRISPR spacers, and mobile elements. Importantly, a subset of features does not correspond to any known annotations, yet appears repeatedly in biologically plausible contexts. These unannotated features may represent novel regulatory sequences, structural motifs, or other functional elements that remain to be characterized experimentally.

      Note: Sparse autoencoders are neural networks that reduce high-dimensional representations to a smaller set of features, enforcing sparsity so that each feature ideally captures a distinct biological signal. This approach enables mechanistic insight into what the model “knows” about sequence biology.
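      As a toy illustration of the idea (not the paper's implementation), a sparse autoencoder widens activations into a larger feature space and relies on a ReLU plus an L1 penalty to push most features toward zero; the dimensions and weights below are arbitrary placeholders.

      ```python
      # Minimal sparse-autoencoder sketch in NumPy: encode model
      # activations into a wider, nonnegative feature space, then
      # linearly decode back to reconstruct the input.
      import numpy as np

      rng = np.random.default_rng(0)
      d_model, d_feat = 16, 64          # activation dim, feature dim

      W_enc = rng.normal(0, 0.1, (d_model, d_feat))
      b_enc = np.zeros(d_feat)
      W_dec = rng.normal(0, 0.1, (d_feat, d_model))

      def encode(x):
          # ReLU keeps features nonnegative; training with the L1 term
          # below drives most of them to exactly zero (sparsity).
          return np.maximum(x @ W_enc + b_enc, 0.0)

      def decode(f):
          return f @ W_dec

      x = rng.normal(size=d_model)      # stand-in for a model activation
      f = encode(x)                     # sparse, interpretable features
      x_hat = decode(f)
      loss = np.mean((x - x_hat) ** 2) + 1e-3 * np.abs(f).sum()
      ```

      After training, individual entries of `f` are the candidate "features" that are then compared against known annotations such as exon-intron boundaries or CRISPR spacers.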

      Evo 2 generates genome-scale sequences with realistic structure and content (Figures 5A–L + S8)
      Goal: Assess whether Evo 2 can generate complete genome sequences that resemble natural ones.
      Outcome: Evo 2 successfully generates mitochondrial genomes, minimal bacterial genomes, and yeast chromosomes. These sequences contain realistic coding regions, tRNAs, promoters, and structural features. Predicted proteins fold correctly and recapitulate functional domains.

      Evo 2 enables design of DNA with targeted epigenomic features (Figures 6A–G + S9)
      Goal: Use Evo 2 to generate DNA sequences with user-defined chromatin accessibility profiles.
      Outcome: By coupling Evo 2 with predictors like Enformer and Borzoi, the authors guide generation to match desired ATAC-seq profiles. Using a beam search strategy—where the model explores and ranks multiple possible output sequences—it generates synthetic DNA that encodes specific chromatin accessibility patterns, such as writing “EVO2” in open/closed chromatin space.
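      Scorer-guided beam search can be sketched generically. The toy scorer below (matching a target string, a placeholder of my own) merely stands in for an accessibility predictor like Enformer or Borzoi, and the base-by-base extension stands in for sampling from Evo 2.

      ```python
      # Generic beam search guided by an external scorer: at each step,
      # every surviving sequence is extended by each base, candidates
      # are ranked by the scorer, and only the top `beam_width` survive.
      TARGET = "AAGG"                   # toy target "profile"

      def score(seq):
          # Toy scorer: +1 per position matching the target; a real
          # guided design would score predicted chromatin accessibility.
          return sum(a == b for a, b in zip(seq, TARGET))

      def beam_search(length=4, beam_width=2):
          beams = [""]
          for _ in range(length):
              candidates = [s + b for s in beams for b in "ACGT"]
              candidates.sort(key=score, reverse=True)
              beams = candidates[:beam_width]   # prune to the best few
          return beams[0]

      best = beam_search()              # converges on "AAGG"
      ```

      Keeping several beams rather than only the single best candidate lets the search recover when an early extension that looks mediocre leads to a better-scoring full sequence later.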

      STRENGTHS

      First large-scale, open-source biological foundation model trained across all domains of life

      Performs well across variant effect prediction, genome annotation, and generative biology

      Demonstrates mechanistic interpretability via sparse autoencoders

      Learns both known and novel biological features directly from raw sequence

      Unsupervised learning generalizes to clinical and functional genomics

      Robust evaluation across species, sequence types, and biological scales

      FUTURE WORK & EXPERIMENTAL DIRECTIONS

      Expand training to include viruses that infect eukaryotic hosts: Evo 2 currently excludes these sequences, in part to reduce potential for misuse and due to their unusual nucleotide structure and compact coding. As a result, Evo 2 performs poorly on eukaryotic viral sequence prediction and generation. Including these genomes could expand its applications in virology and public health.

      Empirical validation of novel features: Use CRISPR perturbation, reporter assays, or conservation analysis to test Evo 2-derived features that don’t align with existing annotations.

      Targeted mutagenesis: Use Evo 2 to identify high-impact or compensatory variants in disease-linked loci, and validate using genome editing or saturation mutagenesis.

      Epigenomic editing: Validate Evo 2-designed sequences for chromatin accessibility using ATAC-seq or synthetic enhancer assays.

      Clinical applications: Fine-tune Evo 2 embeddings to improve rare disease variant interpretation or personalized genome annotation.

      Synthetic evolution: Explore whether Evo 2 can generate synthetic genomes with tunable ecological or evolutionary features, enabling testing of evolutionary hypotheses.

      AUTHORSHIP NOTE

      This review was drafted with support from ChatGPT (OpenAI) to help organize and articulate key ideas clearly and concisely. I provided detailed prompts, interpretations, and edits to ensure the review reflects an expert understanding of the biology and the paper’s contributions. The final version has been reviewed and approved by me.

      FINAL TAKEAWAY

      Evo 2 is a breakthrough in foundation models for biology—offering accurate prediction, functional annotation, and genome-scale generation, all learned from raw DNA sequence. By capturing universal patterns across life, and identifying both well-characterized and unknown sequence features, Evo 2 opens powerful new directions in evolutionary biology, genomics, and biological design. Its open release invites widespread use and innovation across the life sciences.

  22. Mar 2025
    1. I asked our friend Dr. Oblivion: why is it better to refer to AI hallucinations as AI mirages? His response.

      I'm assuming this is some kind of ✨sparkling intelligence✨ and given that Dr. Oblivion seems to miss the point of the paper and our discussion here, I found it more illustrative than helpful ;)

      Reasoning model (deepseek-reasoner)
      deepseek-reasoner is DeepSeek's reasoning model. Before producing its final answer, the model first outputs a chain of thought, which improves the accuracy of the final answer. The API exposes the deepseek-reasoner chain-of-thought content so that users can view, display, or distill it.
      Before using deepseek-reasoner, upgrade the OpenAI SDK to support the new parameters: pip3 install -U openai

      API parameters
      Input parameter max_tokens: the maximum length of the final answer (excluding the chain-of-thought output); default 4K, maximum 8K. Note that the chain-of-thought output can reach up to 32K tokens; a parameter to control its length (reasoning_effort) will be released soon.
      Output field reasoning_content: the chain-of-thought content, at the same level as content (see the access example for how to read it).
      Output field content: the final answer.
      Context length: the API supports up to 64K of context; the length of the reasoning_content output does not count toward the 64K limit.
      Supported features: chat completion, chat prefix completion (Beta).
      Unsupported features: Function Calling, JSON Output, FIM completion (Beta).
      Unsupported parameters: temperature, top_p, presence_penalty, frequency_penalty, logprobs, top_logprobs. Note that for compatibility with existing software, setting temperature, top_p, presence_penalty, or frequency_penalty does not raise an error but has no effect, while setting logprobs or top_logprobs raises an error.

      Context concatenation
      In each round of conversation, the model outputs chain-of-thought content (reasoning_content) and a final answer (content). In the next round, the chain-of-thought output from earlier rounds is not concatenated into the context, as shown in the diagram. Note that if you include reasoning_content in the input messages sequence, the API returns a 400 error; therefore, remove the reasoning_content field from the API response before issuing the next request, as shown in the access example.

      Access example
      The code below, using Python as an example, shows how to access the chain of thought and the final answer, and how to concatenate context across multiple rounds of conversation.
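      The access example itself did not survive in this excerpt. A minimal sketch of what such a multi-round call looks like with the OpenAI SDK, following the rules described above; the API-key placeholder and the prompts are illustrative, not taken from the docs:

      ```python
      # Sketch of two-round use of deepseek-reasoner via the OpenAI SDK.

      def to_history(message: dict) -> dict:
          # Keep only role/content when appending a response to the next
          # round's messages: sending reasoning_content back to the API
          # yields a 400 error, so it must be stripped here.
          return {"role": message["role"], "content": message["content"]}

      def chat_two_rounds(api_key: str) -> None:
          from openai import OpenAI  # requires: pip3 install -U openai
          client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com")

          messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
          resp = client.chat.completions.create(
              model="deepseek-reasoner", messages=messages)
          msg = resp.choices[0].message
          print("CoT:", msg.reasoning_content)   # chain of thought
          print("Answer:", msg.content)          # final answer

          # Round 2: append only the final answer, never the chain of thought.
          messages.append(to_history({"role": "assistant", "content": msg.content}))
          messages.append({"role": "user", "content": "Explain your reasoning."})
          resp = client.chat.completions.create(
              model="deepseek-reasoner", messages=messages)
          print("Answer 2:", resp.choices[0].message.content)
      ```

      The key design point is the stripping step: the model re-derives its reasoning fresh each round, so the history carries only the answers.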

      DeepSeek reasoning model #AI #LLM