- Apr 2025
-
-
Review coordinated by Life Science Editors Foundation
Reviewed by: Dr. Angela Andersen, Life Science Editors Foundation & Life Science Editors
Potential Conflicts of Interest: None
PUNCHLINE Evo 2 is a biological foundation model trained on 9.3 trillion DNA bases across all domains of life. It predicts the impact of genetic variation—including in noncoding and clinically relevant regions—without requiring task-specific fine-tuning. Evo 2 also generates genome-scale sequences and epigenomic architectures guided by predictive models. By interpreting its internal representations using sparse autoencoders, the model is shown to rediscover known biological features and uncover previously unannotated patterns with potential functional significance. These capabilities establish Evo 2 as a generalist model for prediction, annotation, and biological design.
BACKGROUND A foundation model is a large-scale machine learning model trained on massive and diverse datasets to learn general features that can be reused across tasks. Evo 2 is such a model for genomics: it learns from raw DNA sequence alone—across bacteria, archaea, eukaryotes, and bacteriophage—without explicit labels or training on specific tasks. This enables it to generalize to a wide range of biological questions, including predicting the effects of genetic variants, identifying regulatory elements, and generating genome-scale sequences or chromatin features.
Evo 2 comes in two versions: one with 7 billion parameters (7B) and a larger version with 40 billion parameters (40B). These numbers reflect the number of trainable weights in the model and influence its capacity to learn complex patterns. Both models were trained using a context window of up to 1 million tokens—where each token is a nucleotide—allowing the model to capture long-range dependencies across entire genomic regions.
Evo 2 learns via self-supervised learning: it is trained autoregressively to predict the next DNA base given the bases that precede it. Through this simple but powerful objective, the model discovers statistical patterns that correspond to biological structure and function, without being told what those patterns mean.
QUESTION ADDRESSED Can a large-scale foundation model trained solely on genomic sequences generalize across biological tasks—such as predicting mutational effects, modeling gene regulation, and generating realistic genomic sequences—without supervision or task-specific tuning?
SUMMARY The authors introduce Evo 2, a foundation model for genomics that generalizes across DNA, RNA, and protein tasks. Without seeing any biological labels, Evo 2 learns the sequence rules governing coding and noncoding function, predicts variant effects (including in BRCA1/2 and splicing regions), and generates genome-scale sequences. It also enables epigenome-aware sequence design by coupling sequence generation with predictive models of chromatin accessibility.
To probe what the model has learned internally, the authors use sparse autoencoders (SAEs), a technique that re-expresses the model’s internal activations as a set of features of which only a few are active at any position, making them easier to interpret. These features often correspond to known biological elements. Some, however, appear to capture uncharacterized patterns that do not match existing annotations but are consistently associated with genomic regions of potential functional importance. This combination of rediscovery and novelty makes Evo 2 a promising tool for exploring both the known and the unknown genome.
KEY RESULTS
Evo 2 trains on vast genomic data using a novel architecture to handle long DNA sequences
Figures 1 + S1
Goal: Build a model capable of representing entire genomic regions (up to 1 million bases) from any organism.
Outcome: Evo 2 was trained on 9.3 trillion bases using a hybrid convolution-attention architecture (StripedHyena 2). The model achieves long-context recall, and its perplexity improves as context length and model size increase.
Evo 2 predicts the impact of mutations across DNA, RNA, and protein fitness
Figures 2A–J + S2–S3
Goal: Assess whether Evo 2 can identify deleterious mutations without supervision across diverse organisms and molecules.
Outcome: Evo 2 assigns lower likelihoods to biologically disruptive mutations (e.g., frameshifts, premature stops, and non-synonymous changes), mirroring evolutionary constraint. Predictions correlate with deep mutational scanning data and gene essentiality assays. Evo 2 embeddings also support highly accurate exon-intron classifiers.
Clarification: “Generalist performance across DNA, RNA, and protein tasks” means that Evo 2 can simultaneously make accurate predictions about the functional impact of genetic variants on transcription, splicing, RNA stability, translation, and protein structure—without being specifically trained on any of these tasks.
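To make the idea of likelihood-based variant scoring concrete, here is a minimal sketch. It scores a variant as the difference in log-likelihood between the mutated and the reference sequence under an autoregressive sequence model. The model is a toy order-k Markov chain standing in for Evo 2, which is an assumption made purely for illustration; the function names (`fit_markov`, `log_likelihood`, `delta_score`) and the example sequences are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def fit_markov(sequences, k=3, pseudocount=1.0):
    """Count (k-base context -> next base) transitions in the training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1.0
    return {"k": k, "counts": counts, "pseudocount": pseudocount}

def log_likelihood(model, seq):
    """Sum of log P(base | previous k bases), with additive smoothing."""
    k, counts, pc = model["k"], model["counts"], model["pseudocount"]
    total = 0.0
    for i in range(k, len(seq)):
        ctx_counts = counts.get(seq[i - k:i], {})
        numerator = ctx_counts.get(seq[i], 0.0) + pc
        denominator = sum(ctx_counts.values()) + pc * len(ALPHABET)
        total += math.log(numerator / denominator)
    return total

def delta_score(model, reference, variant):
    """Negative values mean the variant is less likely than the reference."""
    return log_likelihood(model, variant) - log_likelihood(model, reference)

# Toy stand-in for a trained model: a repetitive "genome" (hypothetical data).
model = fit_markov(["ATGGCC" * 10], k=3)
reference = "ATGGCCATGGCC"
variant = "ATGGCCATTGCC"  # single substitution G -> T at position 8 (0-based)
print(delta_score(model, reference, variant))
```

In this toy setting the substitution breaks several learned contexts, so the score is negative. With a real genomic language model, the same comparison would use the model's own per-base log-probabilities instead of k-mer counts.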
Evo 2 achieves state-of-the-art performance in clinical variant effect prediction
Figures 3A–I + S4
Goal: Evaluate Evo 2's ability to predict pathogenicity of human genetic variants.
Outcome: Evo 2 matches or outperforms specialized models on coding, noncoding, splicing, and indel variants. It accurately classifies BRCA1/2 mutations and generalizes to novel variant types. When paired with supervised classifiers using its embeddings, it achieves state-of-the-art accuracy on BRCA1 variant interpretation.
Evo 2 representations reveal both known and novel biological features through sparse autoencoders
Figures 4A–G + S5–S7
Goal: Understand what Evo 2 has learned internally.
Outcome: Sparse autoencoders decompose Evo 2’s internal representations into distinct features, many of which align with well-known biological elements such as exon-intron boundaries, transcription factor motifs, protein secondary structure, CRISPR spacers, and mobile elements. A subset of features does not correspond to any known annotations, yet appears repeatedly in biologically plausible contexts. These unannotated features may represent novel regulatory sequences, structural motifs, or other functional elements that remain to be characterized experimentally.
Note: Sparse autoencoders are neural networks trained to reconstruct a model's internal activations from a set of learned features while enforcing sparsity, so that only a few features are active at once and each ideally captures a distinct biological signal. This approach offers mechanistic insight into what the model “knows” about sequence biology.
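The note above can be illustrated with a small forward-pass sketch. It assumes a top-k sparsity rule (keep only the k largest feature activations), which is one common SAE variant and not necessarily the one used in the paper. The weights are random and untrained, and the class name, dimensions, and input vector are all hypothetical.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class SparseAutoencoder:
    """Untrained sketch: activations -> sparse features -> reconstruction."""

    def __init__(self, d_model, d_features, seed=0):
        rng = random.Random(seed)
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_features)]
        self.b_enc = [0.0] * d_features
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_features)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x, k):
        # ReLU pre-activations, then zero everything except the k largest.
        pre = [max(0.0, a + b) for a, b in zip(matvec(self.W_enc, x), self.b_enc)]
        order = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)
        keep = set(order[:k])
        return [a if i in keep else 0.0 for i, a in enumerate(pre)]

    def decode(self, features):
        return [a + b for a, b in zip(matvec(self.W_dec, features), self.b_dec)]

    def reconstruction_loss(self, x, k):
        # Mean squared error between the input and its sparse reconstruction.
        x_hat = self.decode(self.encode(x, k))
        return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

sae = SparseAutoencoder(d_model=8, d_features=32)
activation = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 0.1, 0.9]  # hypothetical
features = sae.encode(activation, k=4)
print(sum(1 for f in features if f != 0.0), "active features of", len(features))
```

Training would adjust the weights to minimize the reconstruction loss; interpretation then asks which sequence positions switch each feature on.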
Evo 2 generates genome-scale sequences with realistic structure and content
Figures 5A–L + S8
Goal: Assess whether Evo 2 can generate complete genome sequences that resemble natural ones.
Outcome: Evo 2 generates mitochondrial genomes, minimal bacterial genomes, and yeast chromosomes. These sequences contain realistic coding regions, tRNAs, promoters, and structural features. Proteins encoded by the generated sequences are predicted to fold into plausible structures and to retain recognizable functional domains.
Evo 2 enables design of DNA with targeted epigenomic features
Figures 6A–G + S9
Goal: Use Evo 2 to generate DNA sequences with user-defined chromatin accessibility profiles.
Outcome: By coupling Evo 2 with predictors like Enformer and Borzoi, the authors guide generation to match desired ATAC-seq profiles. Using a beam search strategy, in which the model explores and ranks multiple possible output sequences, it generates synthetic DNA that encodes specific chromatin accessibility patterns, such as writing “EVO2” in open/closed chromatin space.
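The guided beam search described above can be sketched in a few lines. In this toy version every single-base extension is enumerated instead of being sampled from a generative model, and a GC-fraction function stands in for a chromatin accessibility predictor such as Enformer or Borzoi. Both are simplifying assumptions, and `toy_accessibility`, `score`, and `beam_search` are hypothetical names.

```python
import heapq

ALPHABET = "ACGT"

def toy_accessibility(seq):
    """Hypothetical stand-in for a learned predictor: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def score(seq, target):
    """Higher is better: negative distance between prediction and target."""
    return -abs(toy_accessibility(seq) - target)

def beam_search(length, target, beam_width=4):
    """Grow sequences one base at a time, keeping the best-scoring candidates."""
    beams = [""]
    for _ in range(length):
        # Propose every one-base extension of every sequence in the beam.
        candidates = [seq + base for seq in beams for base in ALPHABET]
        # Rank all candidates with the predictor and keep the top few.
        beams = heapq.nlargest(beam_width, candidates,
                               key=lambda s: score(s, target))
    return beams[0]

design = beam_search(length=20, target=0.5)
print(design, toy_accessibility(design))
```

The structure is the point here: a generator proposes candidates, an external predictor scores them against the desired profile, and only the top-ranked candidates are extended further.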
STRENGTHS
First large-scale, open-source biological foundation model trained across all domains of life
Performs well across variant effect prediction, genome annotation, and generative biology
Demonstrates mechanistic interpretability via sparse autoencoders
Learns both known and novel biological features directly from raw sequence
Unsupervised learning generalizes to clinical and functional genomics
Robust evaluation across species, sequence types, and biological scales
FUTURE WORK & EXPERIMENTAL DIRECTIONS
Expand training to include viruses that infect eukaryotic hosts: Evo 2 currently excludes these sequences, in part to reduce potential for misuse and due to their unusual nucleotide structure and compact coding. As a result, Evo 2 performs poorly on eukaryotic viral sequence prediction and generation. Including these genomes could expand its applications in virology and public health.
Empirical validation of novel features: Use CRISPR perturbation, reporter assays, or conservation analysis to test Evo 2-derived features that don’t align with existing annotations.
Targeted mutagenesis: Use Evo 2 to identify high-impact or compensatory variants in disease-linked loci, and validate using genome editing or saturation mutagenesis.
Epigenomic editing: Validate Evo 2-designed sequences for chromatin accessibility using ATAC-seq or synthetic enhancer assays.
Clinical applications: Fine-tune Evo 2 embeddings to improve rare disease variant interpretation or personalized genome annotation.
Synthetic evolution: Explore whether Evo 2 can generate synthetic genomes with tunable ecological or evolutionary features, enabling testing of evolutionary hypotheses.
AUTHORSHIP NOTE This review was drafted with support from ChatGPT (OpenAI) to help organize and articulate key ideas clearly and concisely. I provided detailed prompts, interpretations, and edits to ensure the review reflects an expert understanding of the biology and the paper’s contributions. The final version has been reviewed and approved by me.
FINAL TAKEAWAY Evo 2 is a breakthrough in foundation models for biology—offering accurate prediction, functional annotation, and genome-scale generation, all learned from raw DNA sequence. By capturing universal patterns across life, and identifying both well-characterized and unknown sequence features, Evo 2 opens powerful new directions in evolutionary biology, genomics, and biological design. Its open release invites widespread use and innovation across the life sciences.
-
- Jul 2024
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- May 2024
-
-
could physiology rescue genomics
for - follow up - paper - Could physiology rescue genomics?
to - Could physiology rescue genomics? - https://hyp.is/bslQ-BVVEe-hgF-rrNmjrA/academic.oup.com/biolinnean/article/139/4/357/6604006
-
- Apr 2024
-
archaeologymag.com archaeologymag.com
-
The genomic analysis also challenges earlier theories that suggested hunter-gatherer communities assimilated women from neighboring farming communities.
-
- Nov 2023
-
zoonomiaproject.org zoonomiaproject.org Zoonomia
- May 2023
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
-
workflowhub.eu workflowhub.eu
-
- Mar 2023
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Feb 2023
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Jan 2023
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Nov 2022
-
en.wikipedia.org en.wikipedia.org
-
ChIP
Workflow 1. Cross-linking 2. Chromatin fragmentation 3. Immunoprecipitation of chromatin 4. DNA recovery and purification 5. Sequencing of DNA
-
analyze protein interactions
ChIP-seq is concerned with testing for protein-DNA interactions.
-
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
-
www.biorxiv.org www.biorxiv.org
-
When we expanded this analysis to chromosome-wide gene-gene correlations, we discovered a striking ‘X-shaped’ pattern of gene expression covariance (Fig. 1D). Beyond the expected diagonal reflecting coordinated gene expression at the level of operons, the anti-diagonal reflected correlations between genes at a similar distance from the origin of replication, between the “arms” of the circular chromosome, as well as a correlation between genes at the origin and terminus
It would be really interesting to compare the patterns you observe here with the X-like genome inversion patterns that we and others have reported (e.g., see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC16139/). These patterns basically show that the distance a gene is from the origin of replication is conserved but the side of the origin it is on is not.
Those inversion patterns have been seen in some but not all comparisons of closely related bacterial and archaeal genomes. So it seems there are some taxa where the distance a gene is from the origin is not conserved over evolutionary time. It would be interesting to know if these taxa show the X-like pattern you report for gene expression.
-
- Oct 2022
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Jun 2022
-
www.biorxiv.org www.biorxiv.org
-
-
Among these, all VUS, variants with a zero-star review status, i.e., without any detailed review information, and those with conflicting classifications were excluded
Dodgy labs might mark a variant as pathogenic and link to assessment criteria but apply them wrongly. Such errors might be detected if other labs post conflicting assessments. But for ultrarare variants, no other lab might have submitted that variant.
-
- Apr 2022
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Mar 2022
-
twitter.com twitter.com
-
ReconfigBehSci. (2022, January 10). RT @GeraldGmboowa: Genomic epidemiology of SARS-CoV-2 in Africa focused on different 🌍regions @AfricaCDC https://bit.ly/3tcDuJl. Using A… [Tweet]. @SciBeh. https://twitter.com/SciBeh/status/1480595472834338827
-
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Jan 2022
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Dec 2021
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Oct 2021
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Sep 2021
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Jul 2021
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Jun 2021
-
osf.io osf.io
-
Singh, Urvashi B., Mercy Rophina, Dr Rama Chaudhry, Vigneshwar Senthivel, Kiran Bala, Rahul C. Bhoyar, Bani Jolly, et al. “Variants of Concern Responsible for SARS-CoV-2 Vaccine Breakthrough Infections from India.” OSF Preprints, June 3, 2021. https://doi.org/10.31219/osf.io/fgd4x.
-
- May 2021
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
-
science.sciencemag.org science.sciencemag.org
-
Faria, N. R., Mellan, T. A., Whittaker, C., Claro, I. M., Candido, D. da S., Mishra, S., Crispim, M. A. E., Sales, F. C. S., Hawryluk, I., McCrone, J. T., Hulswit, R. J. G., Franco, L. A. M., Ramundo, M. S., Jesus, J. G. de, Andrade, P. S., Coletti, T. M., Ferreira, G. M., Silva, C. A. M., Manuli, E. R., … Sabino, E. C. (2021). Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil. Science. https://doi.org/10.1126/science.abh2644
-
- Apr 2021
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
-
-
Merrick, J. (2021, April 20). Covid-19 variants: South African strain is causing the most concern for UK scientists. iNews. https://inews.co.uk/news/politics/covid-19-variants-south-african-strain-is-causing-the-most-concern-for-uk-scientists-965679?utm_term=Autofeed&ito=social_itw_theipaper&utm_medium=Social&utm_source=Twitter#Echobox=1618951521
-
- Feb 2021
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
-
www.youtube.com www.youtube.com
-
GDC purpose/summary: https://youtu.be/LY5SkHJplxc?t=118
-
-
www.youtube.com www.youtube.com
-
API endpoint filters are given as % encoded JSON: https://youtu.be/VT-chUoq-oo?t=221
Fields reference for endpoint:files: https://api.gdc.cancer.gov/files/_mapping
Fields reference for endpoint:genes: https://api.gdc.cancer.gov/genes/_mapping
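As a small sketch of what "percent-encoded JSON" means here, the snippet below serializes a filter object to JSON and percent-encodes it into a query string. It only builds the URL and makes no network request. The filter field and value are illustrative assumptions and should be checked against the `_mapping` references listed above; `build_gdc_url` is a hypothetical helper, not part of any GDC client.

```python
import json
from urllib.parse import quote, unquote

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

def build_gdc_url(endpoint, filters, fields=None, size=10):
    """Return a request URL whose `filters` parameter is percent-encoded JSON."""
    # safe="" forces characters such as { } " : and spaces to be encoded.
    query = "filters=" + quote(json.dumps(filters), safe="")
    if fields:
        query += "&fields=" + ",".join(fields)
    query += f"&size={size}"
    return f"{endpoint}?{query}"

# Illustrative filter; the field name is an assumption to verify in the mapping.
filters = {
    "op": "in",
    "content": {"field": "cases.project.primary_site", "value": ["Lung"]},
}
url = build_gdc_url(GDC_FILES_ENDPOINT, filters, fields=["file_id", "file_name"])
print(url)

# Decoding the parameter recovers the original JSON object.
encoded = url.split("filters=")[1].split("&")[0]
print(json.loads(unquote(encoded)) == filters)
```

Building the JSON as a Python dict and encoding it in one step avoids hand-escaping braces and quotes, which is where most malformed filter requests come from.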
-
- Jan 2021
-
academic.oup.com academic.oup.com
-
northern Europe
New migrations to the UK are being tracked by citizen science projects to see if this is related to climate change http://srs.britishspiders.org.uk/portal/p/Wasp+Spider
-
- Jul 2020
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Jun 2020
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Nov 2019
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Sep 2019
-
www.biorxiv.org www.biorxiv.org
-
Transparent Peer Review
Download the complete Review Process [PDF] including:
- reviews
- authors' reply
- editorial decisions
-
- Aug 2019
-
europepmc.org europepmc.org
-
forum
-
- Jul 2019
-
www.plantcell.org www.plantcell.org
-
Check out the peer review report for this article: http://www.plantcell.org/content/plantcell/suppl/2019/07/13/tpc.18.00606.DC2/tpc18.00606.PRR-Griffiths.pdf
-
- Nov 2017
-
www.genome.gov www.genome.gov
-
We do not have to worry about the bases in both strands. Either strand is enough, because A always pairs with T and C always pairs with G.
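A short sketch of why one strand is enough: given the pairing rules, the other strand can be computed. The function names are hypothetical, and the input is assumed to contain only uppercase A, C, G, and T.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Pair each base with its partner, position by position."""
    return strand.translate(COMPLEMENT)

def reverse_complement(strand):
    """The opposite strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

strand = "ATGC"
print(complement(strand))          # TACG
print(reverse_complement(strand))  # GCAT
```

Applying `reverse_complement` twice returns the original strand, which is another way of saying that the two strands carry the same information.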
-
- Sep 2017
-
bmcbioinformatics.biomedcentral.com bmcbioinformatics.biomedcentral.com
-
The projection score - an evaluation criterion for variable subset selection in PCA visualization
"variable" typically means gene or locus in the context of biological data.
-
- Apr 2017
-
science.sciencemag.org science.sciencemag.org
-
The Administration could also exercise its regulatory authority—most potently, to direct the Centers for Medicare and Medicaid Services (CMS) to allow reimbursement for molecular profiling of cancers
Perhaps the most important measure to keep the precision medicine initiative alive. The surge in genomic assays that predict risk and treatment response is of little value without a practical, affordable means of molecular profiling of a patient's tumor or, more importantly, a pre-diagnosis genomic screen.
-
- Mar 2017
-
gigadb.org gigadb.org
-
notorious
National Geographic and others have dubbed it "Fishzilla" http://natgeotv.com/asia/fishzilla/about
-
-
med.stanford.edu med.stanford.edu
-
Genome Sequence Archive (GSA)
Database URL is here: http://gsa.big.ac.cn/
Note: metadata is INSDC format, but this database isn't part of the INSDC, so you'll still need to submit your data to one of those databases to meet internationally recognised mandates
-
-
gigadb.org gigadb.org
-
wisent, also known as European bison
According to BMC Biology, 2016 was "the year of the Wisent": dx.doi.org/10.1186/s12915-016-0329-3
-
-
www.encodeproject.org www.encodeproject.org
-
The ENCODE portal is the official canonical source for ENCODE data and data from other related projects.
-
- Oct 2016
-
biocontainers.pro biocontainers.pro
-
blast
BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance.
-
- Jun 2016
-
gigadb.org gigadb.org
-
Additional information:
More information also in this blog interview with the first author: http://blogs.biomedcentral.com/gigablog/2016/06/08/introducing-gigwa-genotype-investigator-genome-wide-analyses/
-
- Apr 2016
-
gigadb.org gigadb.org
-
http://galaxy.cbiit.cuhk.edu.hk/
This has now migrated to: http://gigagalaxy.net/
-
IsPreviousVersionOf doi:10.5524/100148
This paper was examined in a case study that led to some corrections. See the paper here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127612
-
-
biosharing.org biosharing.org
-
Published in 2015
These policies have been built on the Fort Lauderdale principles, see the original policies from 2003 https://www.genome.gov/10506537/
-
-
gigadb.org gigadb.org
-
crab-eating macaque
This was sequenced at the same time and is available from http://dx.doi.org/10.5524/100003
-
-
gigadb.org gigadb.org
-
Chinese rhesus
This was sequenced at the same time and is available as dataset http://dx.doi.org/10.5524/100002
-
-
gigadb.org gigadb.org
-
To maximize its utility
The unusual data release strategy, which involved crowdsourcing on Twitter, is discussed in more detail in this blog: http://blogs.biomedcentral.com/gigablog/2011/08/03/notes-from-an-e-coli-tweenome-lessons-learned-from-our-first-data-doi/
-
-
gigadb.org gigadb.org
-
Workflow, Virtual-Machine
The dockerised workflows are discussed in more detail in this blog posting here: http://blogs.biomedcentral.com/gigablog/2015/07/30/fermenting-reproducible-research-revolution/
-
-
gigadb.org gigadb.org
-
Related manuscripts:
See also this population genomics study in Nature Genetics that uses this data: http://www.nature.com/ng/journal/v45/n1/full/ng.2494.html See also this blog posting on data citation of this data (and related problems): http://blogs.biomedcentral.com/gigablog/2012/12/21/promoting-datacitation-in-nature/
-
-
www.nature.com www.nature.com
-
Accession codes
The panda and polar bear datasets should have been included in the data section rather than hidden in the URLs section. Production removed the DOIs and used (now dead) URLs instead, but for the working links and insight see the following blog: http://blogs.biomedcentral.com/gigablog/2012/12/21/promoting-datacitation-in-nature/
-
-
gigadb.org gigadb.org
-
doi:10.1016/j.cell.2014.03.054
More on the backstory, and on other papers using and citing this data before the Cell publication, in this blog posting: http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
-
-
www.isb-sib.ch www.isb-sib.ch
-
Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes.
This is going to be a good talk. Get your coffee, open your eyes, and open your mind! A pattern that could actually scale up - worth a try! Disagree? reply here.
-
-
gigadb.org gigadb.org
-
genomically poorly explored
Poorly studied, in part because of its very high degree of heterozygosity. You can get some more insight here, where it was nominated as a "top ten genome": http://www.homolog.us/blogs/blog/2015/05/08/top-ten-genomes-ix-pacific-oyster/
-
-
gigadb.org gigadb.org
-
Long Fragment Read technology
See Nature doi:10.1038/nature11236 for more on how LFR works: http://www.nature.com/nature/journal/v487/n7406/full/nature11236.html
-