  1. Aug 2021
    1. Background

      This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Chao Bian

      1. Is the language of sufficient quality? No

      2. Are all data available and do they match the descriptions in the paper? Yes

      3. Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide Yes

      4. Is the data acquisition clear, complete and methodologically sound? Yes

      5. Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

      6. Is there sufficient data validation and statistical analyses of data quality? Yes

      7. Is the validation suitable for this type of data? Yes

      8. Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes

    2. Abstract

      This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Mile Šikić

      1. Is the language of sufficient quality? Yes

      2. Are all data available and do they match the descriptions in the paper? Yes

      3. Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide) Yes

      4. Is the data acquisition clear, complete and methodologically sound? Yes

      5. Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

      6. Is there sufficient data validation and statistical analyses of data quality? Yes

      7. Is the validation suitable for this type of data? Yes

      8. Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes

      Additional Comments: In their update to the previous study on the comparison of long read technologies for sequencing and assembly of plant genomes, Sharma et al. presented a follow-up analysis using a newer generation of base callers for nanopore reads and PacBio HiFi reads. I argue that this study is an important update, but it is not suitable for publication in the current form.

      My major comments are the following:

      1. It is not clear which version of the base caller the authors used in assemblies related to Table 1 and Table 3.
      2. For phased assemblies, it is important to provide information about the size of alternative contigs.
      3. In Table 1, it would be great to have results for methods that do not phase the assembly (e.g. Flye).
      4. There is no explanation of why the authors use IPA instead of other HiFi assemblers, e.g. hifiasm, which, in my experience, performs better than IPA.
      5. The sentence related to Table 3, “The quality of the assemblies was more contiguous with less data in each of these cases when HiFi reads were used instead of the earlier continuous long reads (Table 3).” is not clear. According to Table 3, assemblies achieved using long reads have similar or longer N50 and a higher BUSCO score. Also, it is not clear which assembler was used for the long reads.
    1. Abstract

      Reviewer 1. Wei Zhao

      Are all data available and do they match the descriptions in the paper? No

      The BioProject PRJNA667278 is currently not accessible.

      Is there sufficient data validation and statistical analyses of data quality? No

      The size of the final genome assembly is significantly larger than the estimated size, which is indicative of redundancy. I would suggest removing the potential haplotype redundancy further. I would also suggest a k-mer analysis to validate the genome size. For a chromosomal assembly, the ratio of properly paired reads is lower than expected.

      Additional comments annotated on the paper have been provided to the author.

      Major Revision

    2. Now published in Gigabyte doi: 10.46471/gigabyte.10

      Reviewer 2. Ramil Mauleon

      Are all data available and do they match the descriptions in the paper? No

      Additional Comments: BioProject PRJNA667278 in NCBI appears to be still embargoed; a reviewer link would be helpful.

      Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide No

      Additional Comments: Sample provenance / passport information is lacking for the Cannbio-2 material. Explicit mention of the source of the RNAseq + TSA information in the methods would be helpful. Same comment as above for the GenBank BioProject.

      Is the data acquisition clear, complete and methodologically sound? No

      Additional Comments: It's mostly clear for the DNA extraction, PacBio sequencing and primary assembly. The anchoring of the assembled contigs into pseudochromosomes using another published genome lacks detail, and the text only broadly mentions the software used (RaGOO). This is a very critical step that will determine whether the Cannbio-2 assembly is an improvement over the mentioned genome assemblies (esp. cs10, PK); it's a circular argument if the genome assembly is ascertained against existing assemblies from other cannabis accessions and declared improved. As noted by the authors, there are differences (rather than inconsistencies) between the compared published genomes, and these may be inherent in each genome; any analysis of an assembly based on these would cause ascertainment bias.

      Is there sufficient detail in the methods and data-processing steps to allow reproduction? No

      Additional Comments: The previous comment regarding anchoring of contigs to an existing genome applies to this as well. Regarding genome annotation, is there any basis for the choice of annotation method, i.e. the annotator software (Augustus), the consensus builder (EVM), and PASA? MAKER (MAKER-P) and BRAKER are available pipelines, both reported as good for plants, and GeneMark is a prediction software suite that excels in plant genome annotation. Regarding evidence for annotation, it appears that de novo transcript assemblies were used, but the RNAseq data were not incorporated in the prediction step. No orthologous protein databases appear to have been used as hints for gene prediction. These are just observations/suggestions to further improve the annotation quickly. In general, the annotation steps would benefit from a bit more detail for reproducibility, but I would say the annotation, if done at the contig level, would be very solid.

      Is there sufficient data validation and statistical analyses of data quality? No

      Additional Comments: On the assembly itself, since there was no mention of the method for anchoring contigs into chromosomes, there is no information on how scaffolds are spaced along the genome: is it padding by a fixed number of Ns? Are all assembled contigs anchored or are there unanchored ones? Again on the point of anchoring and ordering of contigs, ideally evidence from the same sequenced material would be the best to use (for example, a genetic linkage map with sequence-based markers). Plant genomes are notorious for rearrangements (inversions, insertions, translocations, tandem repeats, etc.) even within species, and this appears to be the weakest evidence in this paper (how the contigs were anchored into chromosomes). Regarding gene annotation, you can run BUSCO on the predicted genes and report those results as well. Again, results will reflect the outcome of the annotation method used. For BUSCO in general, I'd be cautious in comparing results across published genomes; it would be more informative during an optimization of the assembly methodology or when testing different assembly methods (checking whether you are improving the assembly of the same underlying dataset). On this same topic, are the unmapped contigs from other assemblies used? The same question applies to the assembly done by the authors.

      Is the validation suitable for this type of data? No

      Additional Comments: Mostly yes for the primary genome assembly. The data validation for the pseudochromosome assembly analysis is not convincing. If done at the contig level, the genome annotation would be solid.

      Is there sufficient information for others to reuse this dataset or integrate it with other data? No

      Additional Comments: To recap, the following are missing and would improve re-use and integration: the biomaterial information, information on the pseudochromosome assembly, and explicit mention of GenBank IDs for the transcript assembly and RNAseq data used in annotation (instead of being in the reference). On the chromosome nomenclature, I don't understand why the authors don't mention the ongoing nomenclature being used by the community as reported in the NCBI cs10 RefSeq release.

      Any Additional Overall Comments to the Author

      I believe reporting on results based on the main evidence generated by the authors (in this current work and the previous one on the transcriptome) would make this a stronger data release, i.e. contig/scaffold assemblies and the annotation of those based on your own RNAseq data. On a related note, have you tried using your short-read data during assembly? Could your assembly have been improved if you had used the Illumina data during assembly itself (hybrid assembly, scaffolding)? Cannabis genomes are known to be highly heterozygous; a report of this would be easy to produce from your assembly versus your reads dataset, especially the short reads, and would be an important finding.

      Recommendation: Major Revision

    1. Bone mass loss

      Reviewer 1. Levi Waldron

      Wang et al. present a shotgun metagenomics cross-sectional study of fecal specimens from 361 elderly women with the primary objective of identifying correlations between bone mass density and microbial taxa. The methods are reasonable and I have no major concerns about this manuscript, only some moderate suggestions to improve reporting and discussion.

      For items answered “Yes” it would help to provide line numbers in the manuscript, as done for some but not all checklist items.

      3.0 Participants:

      It’s stated that “Fecal samples of 361 post-menopause women were randomly collected at the People’s Hospital of Shenzhen” – I suspect the correct word here is “arbitrarily” rather than “randomly”, unless a random number generator was used to select a random sample of all eligible patients. Some statement of how the women were recruited and how representative they are of all patients at the hospital is warranted. E.g. were they recruited from emergency room, a cancer ward, all outpatients, all admitted patients, etc? See also later comment about generalizability.

      4.9 Batch Effects:

      This is left “NA”. Can the authors at least comment (in the manuscript) on the potential for batch effects affecting cases and controls differently, i.e. were they all prepared together or in separate libraries, and were they sequenced in the same runs or completely separated?

      8.0 Reproducible research:

      I appreciate that data have been posted at EBI and CNGB. Could the authors also comment on whether the metadata essential to the analysis are also provided, and that these can be linked to the sequence data? Although I’m glad to hear that “Others could reproduce the reported analysis from clean reads by the declared software and parameters” I do think that the code to reproduce the analysis should also be reported.

      8.1 Raw data access

      The checklist states “no raw reads for ethical” but the manuscript states “The sequencing reads from each sequencing library have been deposited at EBI with the accession number: PRJNA530339 and the China National Genebank (CNGB), accession number CNP0000398.” so there is a disconnect. Assuming human sequence reads are removed from the data, I’m not convinced of ethical reasons not to post microbial sequence reads, but it seems the authors have posted the microbial sequence reads.

      10.1 – 10.5 Taxonomy, differential abundance, other analysis, other data types, and other statistical analysis are all blank. Some should be “N/A” but others just seem to be overlooked.

      13.2 Generalizability: I think this is an important element to include in the discussion. How typical are your volunteers of all women that age?

      Minor:

      “Making these data potentially useful in studying the role the gut microbiota might play in bone mass loss and offering exploration into the bone mass loss process.” -> These data are potentially useful in studying the role the gut microbiota might play in bone mass loss and in exploring the bone mass loss process.

      The manuscript is well written, but there are a few other places that would benefit from some copy editing.

    2. Abstract

      Reviewer 2. Christopher Hunter

      Is the language of sufficient quality?

      Yes.

      Is the data all available and does it match the descriptions in the paper?

      No.

      Most of the data are provided as supplemental files in bioRxiv, but in Excel rather than CSV. These data files will need to be curated into a GigaDB dataset.

      Is the data and metadata consistent with relevant minimum information or reporting standards?

      Yes.

      Is the data acquisition clear, complete and methodologically sound?

      No.

      Comment. The consent by the patients to openly share all metadata is not clearly stated. Simply saying the study was approved by the bioethics review board does not mean consent was given to share the data, just that the institute consented to the study being done.

      Is there sufficient detail in the methods and data-processing steps to allow reproduction?

      No.

      Comments: Maybe to someone with a good understanding of statistics there is sufficient detail; this is an area that a statistician should look at. For me, the descriptions of the analysis and the methods do not give anywhere near enough detail to either understand what was done or replicate it. The concept of "Gut metabolic modules" is not defined here, with just a reference to another paper; a brief explanation of what is meant by the term here would be useful.

      Is there sufficient data validation and statistical analyses of data quality?

      Yes.

      Comments. The sequences were filtered for human contaminants and adapter seq, also low quality reads were removed.

      Is the validation suitable for this type of data?

      No.

      Comments: The metadata is extensive but there are some basic points that are missing; collection date, antibiotic use, relatedness of samples/patients. Other less important details are also missing, like why and how this cohort was selected.

      Is there sufficient information for others to reuse this dataset or integrate it with other data?

      Yes

      Any Additional Overall Comments to the Author

      Yes

      • I am concerned about the open sharing of patient metadata without evidence that consent was given prior to sharing.
      • A lot of metadata is collected and provided in the supplemental tables (which is great for reuse) but there are no explanations of what the values are. While some headers are self-explanatory, others are less so, e.g. what is CROSSL(pg/ml)? Or "Side crops"?
      • How were the various conditions diagnosed?
      • I see no indication of antibiotic usage in the cohort.
      • Are all the samples from different individuals? Was each sample a single bowel movement?
      • There is no background given as to how this cohort was selected or why.
      • There is no discussion of the bone mass density of a "normal" cohort. Does this cohort represent a normal cohort or is it already biased toward low or high density? Simply describing the cohort with respect to normal (T of -1 or above), low (-1 to -2.5) or osteoporosis (< -2.5) would be a help. I cannot see the T-scores included in the sTab1a file; are they computed from the L1-L4(z) values given?
      • There are a number of NA values in the table of sample metadata, but there is no explanation as to how these samples were handled in the analysis.
      • In general I feel that there is a lot of poorly described statistical analysis included that is not required as part of a data note; the focus should be on describing the data and ensuring the data and metadata are well explained.
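The T-score grouping the reviewer asks for can be sketched in a few lines. This is only an illustration of the three categories exactly as worded in the comment above (normal at -1 or above, low between -1 and -2.5, osteoporosis below -2.5); the function name and the example values are hypothetical, and the handling of a score of exactly -2.5 follows the reviewer's wording rather than any particular clinical guideline.

```python
def classify_t_score(t_score):
    """Assign a bone-density category using the thresholds quoted in the review.

    Assumption: a T-score of exactly -2.5 falls in the "low" group, because the
    reviewer writes osteoporosis as "< -2.5".
    """
    if t_score >= -1.0:
        return "normal"
    if t_score >= -2.5:
        return "low"
    return "osteoporosis"


# Hypothetical T-scores, not values from the study.
for value in (0.3, -1.0, -1.8, -2.5, -3.1):
    print(value, classify_t_score(value))
```

Tabulating the counts per category for the cohort would give the summary the reviewer suggests.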
    3. certified by peer review

      This work has now been published in GigaByte here: https://doi.org/10.46471/gigabyte.12

    1. Now published in Gigabyte doi: 10.46471/gigabyte.9

      Reviewer 2. Levi Waldron

      Chen et al. use 16S amplicon metagenomic sequencing to investigate urinary bacterial communities and their correlation to lifestyle and clinical factors, and reproductive tract (cervix, uterine cavity, vagina) microbiota in a cross-sectional study of 147 Chinese women of reproductive age. This is an important but challenging study, because of the threat of microbial contamination in low microbial biomass specimens such as the upper reproductive tract and urine.

      Checklist item 4.0

      The laboratory /center where laboratory work was done is not actually stated in lines 121-133.

      Negative controls and contamination

      Negative controls were generated for the 10 women undergoing surgery as sterile saline collected through the urine catheter. I assume this was done after the catheter was used for urine collection, but this should be stated.

      No negative controls were used for the self-collected urine specimens. However, it seems likely that mid-stream self-collection would be more prone to contamination than catheter sampling by a doctor during surgery. Some possibilities for negative controls in this setting exist, such as including a sample of sterile saline with the self-collection kit and asking participants to fill another vial with it immediately following urine collection. The lack of negative controls for self-collected specimens should be stated as a limitation.

      The authors identify the risk of contamination from vulvovaginal region (lines 192-193) but not of cross-contamination. Discussion of the risk of cross-contamination during collection and subsequent processing, steps to mitigate and identify it, and comparison of results to bacterial taxa identified as common contaminants (e.g. Eisenhofer et al, PMID 30497919), is warranted.

      Comparability of urine sampling methods

      Since no specimens were collected by both self-collection and catheter sampling during surgery, there is no way to directly assess the accuracy of self-collection using the catheter as a “gold standard”. This should be stated as a limitation.

      I could not find an analysis comparing the microbial composition of the catheter-collected and self-collected specimens. Some analysis comparing the two could help address the quality of self-collected specimens lacking negative controls.

      Discussion

      The authors do not include overall interpretation or limitations in the Discussion, saying under checklist items 12.0, 12.1, 13.0 “The discussion was suggested to focus on the potential uses according to the article format.” I think the editors should clarify to authors where these key discussion points belong. I think no article is complete without some discussion of limitations; see above for limitations noted of this study.

      Checklist item 13.2 Generalizability

      Authors state “The generalizability of the study is to women of reproductive age, and is shown in line 236-237” but on these lines I see description of statistical methods. This does deserve some discussion though, because the sample includes only women who underwent hysteroscopy and/or laparoscopy for conditions without infections, and has a number of exclusion criteria. This cannot be a representative cross-section of all women of reproductive age, so some discussion of how this sample may be different or similar to the population of all women of reproductive age is warranted. If the authors claim this sample should be generalizable to all women of reproductive age, that should be stated along with the intentional restrictions of the sampling and rationale of why these criteria are not expected to have any impact on the microbiota sampled.

      Clustering of patients

      Lines 212-213: cutting a hierarchical clustering into discrete groups can be done for any dataset, and without some analysis such as Prediction Strength (Tibshirani and Walther, J. Comput. Graph. Stat. 14, 511–528 (2005)) or another measure of cluster validation, this isn’t evidence of distinct patient groups and that should be stated clearly. It is OK to use the grouping to discuss general trends as long as care is made not to imply these are distinct patient subsets without further analysis. I am cautious about this because distinct subsets are intuitively appealing to many readers and the existence of distinct subsets can be harder to correct than to claim.

      Minor

      Line 241 “As the large-scale cohort” -> As a large-scale cohort

    2. Abstract

      Reviewer 1. Christopher Hunter

      Is the language of sufficient quality?

      Yes.

      Is the data all available and does it match the descriptions in the paper?

      No.

      Comment:

      Line 96-97: "In this study, a total of 147 reproductive age women (age 22-48) were recruited by Peking University Shenzhen Hospital (Supplementary Table 1)." But Sup. Table 1 has only 137 samples. Revise the text to explain that only 137 samples were used for the main analysis, with the 10 extra for validation.

      Line 103-104: "None of the subjects received any hormone treatments, antibiotics or vaginal medications within a month of sampling." Sup. Table 1 has a column for "Antibiotic use True/False", and 41 samples have "T". This needs explaining. It's possible the spreadsheet "True" is referring to a longer time period, but that's not explained anywhere.

      Line 110-112: "The samples from an additional 10 women were collected for validation purposes by a doctor during the surgery in July 2017." Where are these metadata? They are not included in Sup. Table 1.

      The data presented and discussed in "additional-findings.docx" are not included in the data files (yet). These should either be removed (as not included in the main article), or the methods should be expanded (to include negative control details) and this added to the main text.

      Is the data and metadata consistent with relevant minimum information or reporting standards?

      Yes.

      Comment. The supplemental tables need some better legends/descriptions to help readers understand what data is in them.

      Is the data acquisition clear, complete and methodologically sound?

      Yes.

      Comment. The wet and bioinformatics methods could benefit from being included in protocols.io

      Is there sufficient detail in the methods and data-processing steps to allow reproduction?

      Yes

      Is there sufficient data validation and statistical analyses of data quality?

      Yes

      Is the validation suitable for this type of data?

      Yes

      Is there sufficient information for others to reuse this dataset or integrate it with other data?

      Yes

      Any Additional Overall Comments to the Author

      Yes

      The figures appear to be mixed up: what's displayed as Figure 1 in the manuscript appears to relate to the legend given for Figure 2, Figure 2 relates to the legend of Figure 3, and Figure 3 relates to the legend of Figure 1.

      Line 69: Chen et al., no citation number link provided.

      Line 74: Thomas-White et al. (2018), no citation number link provided.

      Line 79: Gottschick et al. (2017), no citation number link provided.

      Line 246-248: "The initial results here indicate a close link between the urinary microbiota with the general and diseased physiological conditions,... " As this study is looking at "healthy" individuals, I do not believe there is sufficient evidence to back up this statement about the "diseased" physiological conditions.

      Line 274-275: "The sequences of bacterial isolates have been deposited in the European Nucleotide Archive with the accession number PRJEB36743". This accession is not public, so I am unable to see what's included here.

      If available, we would like to see the real-time PCR data from the experiments made available in Real-Time PCR Data Markup Language (RDML).

      The additional cohort of 10 women is almost a different study: it didn't have the same 16S rRNA amplicon sequencing done, and was only a validation that some live bacteria can be cultured from urine in a small number of cases (3/10). If it is to be included, Table S5 should be updated to include the specific INSDC accessions for the submitted sequences. (The title of Table S5 in the file currently says Table 1.)

    1. Now published in Gigabyte doi: 10.46471/gigabyte.8

      Reviewer 1: Wei Zhao

      Data Release Checklist

      Is the language of sufficient quality? Yes

      Are all data available and do they match the descriptions in the paper? Yes

      Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide Yes

      Is the data acquisition clear, complete and methodologically sound? Yes

      Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

      Additional Comments: See attached PDF file.

      Is there sufficient data validation and statistical analyses of data quality? No

      Additional Comments: Check and filter potential contamination of the raw assembly.

      Is the validation suitable for this type of data? Yes

      Additional Comments: But maybe no, see attached PDF.

      Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes

      Additional comments annotated on the paper and shared with the authors.

      Recommendation: Major Revision

      Reviewer 2: Daniel Lang

      Data Release Checklist

      Is the language of sufficient quality? Yes

      Are all data available and do they match the descriptions in the paper? Yes

      Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide Yes

      Is the data acquisition clear, complete and methodologically sound? Yes

      Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

      Is there sufficient data validation and statistical analyses of data quality? Yes

      Additional Comments: There is an exceptionally high number of scaffolds for 10x, a bad BUSCO score, and an unusual discrepancy between the k-mer-based genome size estimate on the one hand and the flow cytometry (FCM) estimate and assembly size on the other. That would have been worthy of discussion.

      Is the validation suitable for this type of data? Yes

      Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes

      Recommendation: Accept

    1. Now published in Gigabyte doi: 10.46471/gigabyte.7

      This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Qiye Li

      Since I am unable to access the data submitted to NCBI or GigaDB, I cannot judge this issue currently. Please make sure that the gene annotation, repeat annotation, transcriptome assembly, gene expression matrix, and genetic variant data have been uploaded somewhere in addition to the raw reads and genome assembly.

      While the bioinformatic tools used in all the steps are indicated clearly, the parameters for many tools are not defined.

      What is the gap ratio (i.e. % of unclosed gaps or Ns) of the genome assembly? As I know, the raw Supernova assembly may have a high proportion of gaps, although the scaffold N50 is pretty good. Additional gap closer steps (e.g. using GapCloser, RRID:SCR_015026) would improve the completeness of the assembly.
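The gap ratio asked about above is simply the share of N bases among all assembled bases. A minimal sketch, assuming the scaffold sequences are already loaded as plain strings (the function name and the toy sequences are hypothetical, and FASTA parsing is left out):

```python
def gap_ratio(sequences):
    """Return the fraction of bases that are N (upper or lower case)."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    gaps = sum(seq.upper().count("N") for seq in sequences)
    return gaps / total


# Toy scaffolds for illustration only: 6 N bases out of 16 in total.
scaffolds = ["ACGTNNNNAC", "GGnnCC"]
print(f"gap ratio: {gap_ratio(scaffolds):.2%}")
```

Reporting this figure before and after any gap-closing step would show how much the step actually helped.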

      BUSCO analysis is suitable for assessing the completeness of the protein-coding gene space of the genome assembly. But a good BUSCO score does not necessarily mean good assembly completeness. Another conventional way to demonstrate the completeness of the assembly is to show the metrics of DNA read mapping, such as the overall mapping rate, % in proper pair, % of covered bases, etc.

      How is the completeness of the gene set generated by the Fgenesh++ pipeline? I suggest that the authors provide BUSCO score for the Fgenesh++ gene set as they did for the transcriptome assembly.

      Methods related to Alzheimer’s Genes Analysis: The methods used to identify the Alzheimer’s disease (AD) related human genes in antechinus seem to be flawed, as the authors only performed unidirectional searches for homologs in the antechinus gene set. I think the authors should identify bona fide orthologs of these AD-related genes in antechinus. The conventional way to determine orthologs between two species is based on a reciprocal best hit (RBH) strategy (i.e. RBHs between the human and antechinus gene sets).
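The reciprocal best hit idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' or the reviewer's pipeline: hits are assumed to be available as (query, subject, score) tuples, for instance parsed from tabular similarity-search output, and all identifiers and scores below are made up.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits is an iterable of (query, subject, score) tuples. On a tied score the
    first hit seen is kept, which is a simplification.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hit tables; "ant1" etc. are placeholder gene IDs.
human_vs_antechinus = [
    ("APP", "ant1", 900), ("APP", "ant2", 400),
    ("PSEN1", "ant3", 800), ("MAPT", "ant2", 300),
]
antechinus_vs_human = [
    ("ant1", "APP", 880), ("ant2", "APOE", 500),
    ("ant2", "MAPT", 290), ("ant3", "PSEN1", 790),
]
print(reciprocal_best_hits(human_vs_antechinus, antechinus_vs_human))
```

In this toy data MAPT finds ant2 in the forward search, but ant2's own best human hit is APOE, so a unidirectional search would report the pair while the reciprocal filter drops it.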

      Reviewer 2: Walter Wolfsberger

      PRJNA664282 accession number is not found on NCBI. Is it scheduled to be released with the publication?

      Appropriate tools were used for appropriate analyses. The Y chromosome identification approach seems sound.

      The bioinformatic approaches the authors used are sound, with the right tools and approaches to the analysis.

      The preprint is well worded and easy to understand and follow. It provides a good amount of context that justifies the extra analyses done in the publication. The assembly quality is adequate, with relatively low N50 but good completeness scores, given that mammalian genomes have higher levels of low-complexity/repetitive content. The metrics presented adhere to the scope of GigaByte, and the data itself is valuable to the scientific community.

    1. Genome sequencing

      Reviewer 2: Mahul Chakraborty

      Reviewer Comments to Author: In "Two high-quality de novo genomes from single ethanol-preserved specimens of tiny metazoans (Collembola)", Schneider et al. describe de novo genome assemblies of two tiny field-collected Collembolan specimens. The authors collected high-quality genomic DNA from the specimens following a Pacific Biosciences recommended protocol for ultra-low input libraries, amplified it, and generated adequate sequence coverage to produce contiguous assemblies. This is a significant step forward in generating de novo genome assemblies from small amounts of tissues and cells and will therefore be a useful guide not only for people who are studying whole organisms but also for people who are studying variation between cell or tissue types within an individual. I have some minor comments:

      "They were preserved in 96% ethanol, kept at ambient-temperature for one day until they would be stored at -20°C for 1.5 months, until DNA extraction." - Was the preservation at -20 a deliberate step to see the effect of this treatment on sequencing or just a conscious choice for specimen preservation?

      The specific conditions used (e.g. the time and speed of centrifugation) for the g-Tube shearing need to be added in the Methods.

      "Circularity was validated manually, and nucleotide bases were called with a 75% threshold Consensus.?" - Please clarify what the 75% threshold consensus is.

      "We then performed another estimation of the genome size by dividing the number of mapped nucleotides by mode of the coverage distribution" - Why was this done? Did the authors suspect the Genomescope estimate to be incorrect?

      "We compared our new genomes sequenced to previous Collembola assemblies that were generated with long read and sometimes additional short read data." - This statement needs citations for the previous Collembola assemblies.

      The authors used blastn and megablast to search for the beta-lactam synthesis genes in the new assembly. Tblastx might be more appropriate.

      "For D. tigrina a total of 20,22 Gb HiFi data (Q>=20) was generated," - Do you mean 20.22?

      "For S. aquaticus a total of Gb HiFi data (Q>=20) was generated" - The number before Gb is missing.

      The authors report only one assembly from hifiasm, which I presume is the primary assembly. Given that the authors assembled diploid individuals, I am curious whether hifiasm assembled the alternate haplotype sequences.

      "The insect genomes have higher BUSCO scores (96.5 and 99.6%), but lower contiguity (Table 2, Fig. 3)."

      • This statement is incorrect. A number of insect genomes are more contiguous than the assemblies presented here, including Drosophila melanogaster (PMID: 31653862) and several other Drosophila species, Anopheles stephensi (DOI:10.1101/2020.05.24.113019), Anopheles albimanus (PMID: 32883756)
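      The coverage-based genome size estimate that the reviewer asks about (number of mapped nucleotides divided by the mode of the coverage distribution) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `estimate_genome_size` and the toy depth values are assumptions made for illustration, and a real analysis would take per-base depths from a read-mapping tool.

```python
from collections import Counter


def estimate_genome_size(per_base_depth):
    """Estimate genome size as (total mapped bases) / (modal depth).

    per_base_depth: iterable of integer read depths, one per assembly
    position (hypothetical input; zero-depth positions are ignored).
    """
    covered = [d for d in per_base_depth if d > 0]
    if not covered:
        raise ValueError("no covered positions")
    # Mode of the coverage distribution: the most frequent depth value.
    mode_depth = Counter(covered).most_common(1)[0][0]
    # Summing depth over positions gives the total number of mapped bases.
    total_mapped_bases = sum(covered)
    return total_mapped_bases / mode_depth


# Toy example: 8 positions at 30x, 2 positions at 60x (e.g. a collapsed
# repeat), and 1 uncovered position.
depths = [30] * 8 + [60] * 2 + [0]
print(estimate_genome_size(depths))  # 12.0
```

      In this toy case the estimate (12) exceeds the 10 covered assembly positions because the two positions at double depth are counted twice, which is one reason such an estimate can differ from the assembly length or from a k-mer based estimate.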
    2. ABSTRACT

      Reviewer 1. Arong Luo

      Reviewer Comments to Author: First, I'd like to commend the authors on attempting to sequence whole genomes of tiny metazoans, which account for a large part of biodiversity in nature and yet are difficult to sequence. Second, I am impressed by their use of ethanol-preserved specimens, which makes genome sequencing more applicable and attractive in practice. We must admit that sometimes we cannot use fresh specimens directly for genome sequencing. Thus, I think this manuscript is of real scientific significance for specific fields such as insect research. I found that the focal part of their sequencing protocol is the "whole genome amplification-based Ultra-Low DNA Input Workflow for SMRT Sequencing (PacBio)" throughout the text, which of course is very complex. So, I suggest the authors provide a flowchart showing the critical or main steps of their workflow, so that readers can easily understand it and refer to it in future projects. Finer points: Line 35: I suggest providing specific/important information for the 'novel' protocol herein. Lines 119-120: Were the specimens later used for DNA extraction also morphologically identified? Lines 130-131: Was the DNA extract selected randomly or based on certain measurements? Line 393: delete the dot '.'

    1. Gigabyte doi: 10.46471/gigabyte.6

      Reviewer 2. Yunyun Lv

      Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes

      Is the language of sufficient quality? Yes

      Are all data available and do they match the descriptions in the paper? No

      Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide No

      Is the data acquisition clear, complete and methodologically sound? Yes

      Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

      Is there sufficient data validation and statistical analyses of data quality? Yes

      Is the validation suitable for this type of data? Yes

      Is there sufficient information for others to reuse this dataset or integrate it with other data? No

      Any Additional Overall Comments to the Author: This study presents a chromosome-level genome assembly of the common dragonet. The Hi-C method was applied to generate the high-quality genomic assembly. The result is valuable for further genomic analysis. However, some basic questions should be resolved or answered in the article to give clearer insight.

      Line 35, findings section: The total number of annotated genes and their quality should be evaluated and presented in the findings section. Lines 73-75: This sentence contains much speculation. I feel it should be removed, or should just mention the sympatry of their living location. Line 220: The section mainly describes the method of gene annotation; however, the corresponding result is absent. These results are important for performing various comparative genomic analyses. Thus, a detailed description of the gene annotation result should be required in the revision. Line 238: Availability of supporting data: I searched for the project accession number in the NCBI database, but found no result. Thus, the supporting data are not currently available.

      Line 33, typing error: “syngnatiforms” should be syngnatiformes

      Recommendation Major Revision

    2. Now published

      Reviewer 1. Chao Bian

      Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.)

      Yes

      Is the language of sufficient quality? Yes

      Are all data available and do they match the descriptions in the paper? Yes

      Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide Yes

      Is the data acquisition clear, complete and methodologically sound? Yes

      Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

      Is there sufficient data validation and statistical analyses of data quality? Yes

      Is the validation suitable for this type of data? Yes

      Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes

      Any Additional Overall Comments to the Author?

      This paper, entitled ‘Chromosome-level genome assembly of a benthic associated Syngnathiformes species: the common dragonet, Callionymus lyra’, provides a reference genome of the common dragonet with high contig and scaffold N50 values. Genome size estimation and gene and repeat annotation were also performed in this study. The analysis approaches, such as genome assembly and annotation, are solid and well performed.

      However, the gene annotation did not include any homology-based annotation. In addition, why did the authors not use HISAT or TopHat to map the RNA reads onto the genome to predict gene structure? I rarely see transcriptome-based annotation performed using a Trinity assembly.

      In addition, I still consider that the first published genome of a species should include at least one analysis that illuminates the molecular mechanism behind a special characteristic of that species. Presenting only an assembly and some genes will largely reduce the impact of, and interest in, this fascinating fish species.

      Some minor mistakes should be corrected: The number of decimal places should be made uniform throughout the paper. Line 41, 538 Mbp should be 538.0 Mbp. Line 45, 27.66% should be 27.7%. Line 76, change “suggest” to “suggests”. Line 83 and line 94, for “see [9]” and “by [10]”, the author’s name should be indicated in the text, like “see XX’s study [9]”. Line 104, tissue should be tissues. Line 120 and line 131, change ‘562’ to ‘562.0’, and change ‘645’ to ‘645.0’. Line 156, explains should be explain.

      Recommendation

      Major Revision

    1. Now published in GigaScience doi: 10.1093/gigascience/giaa079

      This work has been peer reviewed in GigaScience, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Jing Zhao http://dx.doi.org/10.5524/REVIEW.102331 Reviewer 2: Emre Guney http://dx.doi.org/10.5524/REVIEW.102332

    1. Now published in GigaScience doi: 10.1093/gigascience/giab042

      This work has been peer reviewed in GigaScience, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Karen Ross http://dx.doi.org/10.5524/REVIEW.102747 Reviewer 2: Carlos P. Cantalapiedra http://dx.doi.org/10.5524/REVIEW.102749

    1. Now published in Gigabyte doi: 10.46471/gigabyte.2 Qiye Li (1, 2), Qunfei Guo (1, 3), Yang Zhou (1), Huishuang Tan (1, 4), Terry Bertozzi (5, 6), Yuanzhen Zhu (1, 7), Ji Li (2, 8), Stephen Donnellan (5), Guojie Zhang (2, 8, 9, 10). Affiliations: 1 BGI-Shenzhen, Shenzhen 518083, China; 2 State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; 3 College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; 4 Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 611731, China; 5 South Australian Museum, North Terrace, Adelaide 5000, Australia; 6 School of Biological Sciences, University of Adelaide, North Terrace, Adelaide 5005, Australia; 7 School of Basic Medicine, Qingdao University, Qingdao 266071, China; 8 China National Genebank, BGI-Shenzhen, Shenzhen 518120, China; 9 Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, 650223, Kunming, China; 10 Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark. For correspondence: guojie.zhang@bio.ku.dk

      This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Review 1. Walter Wolfsberger Is the language of sufficient quality? Yes.

      Is the data all available and does it match the descriptions in the paper? Yes.

      Is the data and metadata consistent with relevant minimum information or reporting standards?

      Comment: The accession number for GigaDB provided in the paper does not yield any results in the GigaDB search. Using the species name works though.

      Is the data acquisition clear, complete and methodologically sound?

      Comment: Although it is clear in the paper that a significant portion of data was discarded during the early QC step, there is no indication of the reason for it, or of the nature of the problem that was encountered. In total, the research group produced 396 Gb of raw sequence (211 Gb from short-insert and 185 Gb from long-insert libraries), of which only 180 Gb (130 Gb short-insert and a never-mentioned 55 Gb long-insert) were used later for the assembly. In a FastQC analysis of a single library I encountered extreme levels of sequence duplication, which might indicate that the libraries were not diverse or that there was a PCR artifact (such as overamplification) that might have led to this low-quality initial data. The parameters for the SOAPnuke tool used in early QC are not defined.

      Is there sufficient detail in the methods and data-processing steps to allow reproduction?

      Is there sufficient data validation and statistical analyses of data quality? Yes.

      Is the validation suitable for this type of data?

      Comments: The assembly followed a logical order, with appropriate tools used at every step.

      Is there sufficient information for others to reuse this dataset or integrate it with other data?

      Comment: Although the resulting assembly was of moderate quality (highly fragmented, but with a good BUSCO score), a randomly picked library showed a very high sequence duplication rate, which indicates that there might be problems for future data reuse. Addressing these issues, or at least acknowledging them, would benefit the whole report and the dataset.

      Additional Comments:

      I don't think physical coverage is widely used in genome assembly at present, as the nature of mate-pair reads inflates this statistic. I would put the resulting assembly statistics in a table, including all of the metrics (N50, number of contigs, number of scaffolds, average contig length, etc.) and adding the BUSCO score, as the current formatting is not readable.
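      As an illustration of the kind of summary table suggested here, the following minimal sketch computes a few standard assembly metrics from a list of contig or scaffold lengths. The function names `n50` and `assembly_stats` and the toy lengths are assumptions for illustration only; they do not come from the paper under review.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


def assembly_stats(lengths):
    """Collect basic metrics for one set of contigs or scaffolds."""
    return {
        "count": len(lengths),
        "total_length": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "longest": max(lengths),
        "n50": n50(lengths),
    }


# Toy lengths (arbitrary units), not real data.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
for metric, value in assembly_stats(toy_contigs).items():
    print(f"{metric}\t{value}")
```

      Running the same function on contig lengths and on scaffold lengths would give the two columns of such a table; a BUSCO score would have to be added from a separate BUSCO run.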

      Review 2. Nandita Mullapudi Is the language of sufficient quality? Yes.

      Is the data all available and does it match the descriptions in the paper? Yes.

      Is the data and metadata consistent with relevant minimum information or reporting standards?

      Comment: I am unaware of defined reporting standards for assembly reports, however, all sample preparation, data generation and analysis methods have been described in adequate amount of detail.

      Is the data acquisition clear, complete and methodologically sound? Yes.

      Is there sufficient detail in the methods and data-processing steps to allow reproduction?

      Comment: The following additional details would help to enable reproduction: (1) Parameters used for data pre-processing using SOAPnuke, as well as related adapter sequences etc. These would be necessary to reproduce the data clean-up step. (2) Memory, processor and time details of the computational resources used for assembly. (3) Was Platanus assembly attempted using different parameters, and how were the parameters reported in the paper arrived at? (4) For gene prediction, several vertebrate sequences were used; the details/source of these reference sequences are missing.

      Is there sufficient data validation and statistical analyses of data quality?

      Comments: 1) One approach to validating an assembly would be to use more than one assembly tool and compare the results. (This may or may not be within the scope of this study.) 2) With respect to the validation performed by mapping back paired end reads to the assembly, there is no discussion of the ~14% of paired end reads that did not map back in the expected orientation. Would tools like REAPR (https://www.sanger.ac.uk/science/tools/reapr) or SEQuel (https://bix.ucsd.edu/SEQuel/man.html) be appropriate to address this? (given the high level of heterozygosity in L. d. dumerilii as reported here).

      Is the validation suitable for this type of data? Yes.

      Is there sufficient information for others to reuse this dataset or integrate it with other data?

      Comments: It may also be helpful to make available the set of cleaned reads, to enable reproduction of the assembly pipeline.

    1. Now published in GigaScience doi: 10.1093/gigascience/giab045 Florian Heyl, Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Georges-Köhler-Allee 106, 79110 Germany. For correspondence: heylf@informatik.uni-freiburg.de backofen@informatik.uni-freiburg.de

      This work has been peer reviewed in GigaScience, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1. (Eric Van Nostrand) http://dx.doi.org/10.5524/REVIEW.102771 Reviewer 2. (Nejc Haberman) http://dx.doi.org/10.5524/REVIEW.102769 Reviewer 3. (William Lai) http://dx.doi.org/10.5524/REVIEW.102770

  2. Jun 2021
  3. gigabytejournal.com
    1. CODECHECK certificate of reproducible computation

      See more on how this works in GigaBlog http://gigasciencejournal.com/blog/codecheck-certificate/

  4. May 2021
    1. Here, we report the chromosome-level genome of the venomous Mediterranean cone snail, Lautoconus ventricosus (Caenogastropoda: Conidae).
    2. comprehensive catalogue of transcripts

      See the previous GigaScience paper looking at high-throughput identification of conotoxins using multi-transcriptome sequencing https://doi.org/10.1186/s13742-016-0122-9

    1. The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years.

      Read more on these developments in the Q&A with the authors http://gigasciencejournal.com/blog/play-it-again-samtools/

  5. Apr 2021
    1. This study describes the serendipitous discovery of Rickettsia amplicons in the Barcode of Life Data System (BOLD), a sequence database specifically designed for the curation of mitochondrial DNA barcodes.

      Find out more in this GigaBlog posting on the project http://gigasciencejournal.com/blog/rickettsia-bacteria-to-rule-them-all/

  6. Mar 2021
  7. Feb 2021
    1. Additional testing of pipeline portability is currently being conducted as a part of the Global Alliance for Genomics and Health (GA4GH) workflow portability challenge

      More on how this went, and an update on where the platform had developed to by Feb 2021, can be viewed in this video from CWLcon2021 https://youtu.be/vV4mmH5eN58

  8. Jan 2021
  9. Dec 2020
    1. The tutorials from the Galaxy Training Network along with the frequent training workshops hosted by the Galaxy community provide a means for users to learn, publish, and teach single-cell RNA-sequencing analysis.

      See the write-up by the Earlham Institute for more on how this training is going.

  10. Oct 2020
  11. Aug 2020
    1. Consortia Advancing Standards in Research Administration Information (CASRAI) contributorship taxonomy

      Which has now become NISO CRediT (Contributor Roles Taxonomy), see http://credit.niso.org/

  12. Jul 2020
    1. All supporting data and materials are available in the GigaScience database, GigaDB [48].

      Sequencing data is also in NCBI via BioProject: PRJNA576514 and proteomics data is in PRIDE: PXD018943

  13. Jun 2020
    1. The annual Parasite Awards

      See Parasite Award website here: https://researchparasite.com/

    2. SRA and Gene Expression Omnibus (GEO)

      See the NCBI short read archive (SRA) and Gene Expression Omnibus (GEO) databases here:

      https://www.ncbi.nlm.nih.gov/sra/ https://www.ncbi.nlm.nih.gov/geo/

      Other centralized data repositories are available.

    3. In 2018, eLife published a demonstration
    4. VMs

      We wrote about our first experiences publishing virtual machines back in 2014 http://gigasciencejournal.com/blog/publishing-our-first-virtual-box-of-delights-to-aid-the-fight-against-heart-disease/

    5. Hypothes.is

      GigaScience has hypothes.is integration, and you can read more about how we are using it to add value to papers in this GigaBlog posting http://gigasciencejournal.com/blog/hypothes-is-integration/

    1. teaching hands-on genome assembly courses

      Another example is the Bauhinia Genome project, which has used crowdfunded genomics data to educate the public and to teach genome assembly to MSc students at the Chinese University of Hong Kong http://bauhiniagenome.hk/2018/03/crowdfunded-genomes-and-the-plant-genome-big-bang/

    2. even small research groups

      There have even been community funded genome projects such as the "peoples parrot" and Azolla genome project, with strong education components such as this one (although using short reads to make an initial draft genome) http://gigasciencejournal.com/blog/community-genomes-from-the-peoples-parrot-to-crowdfernding/

  14. May 2020
  15. rvhost-alpha.rivervalleytechnologies.com
    1. Members of the Free State Society for the Blind putting the 3D models through a trial run at a recent visit to the Museum
    2. National Museum in Bloemfontein

      See museum website here https://nasmus.co.za/

    1. accumulate potent phytotoxins

      Inspiring the film "The Birds", which you can read more about in the Sanger Institute blog https://sangerinstitute.blog/2020/05/04/prising-open-the-scallop-genome/

    2. Katherine James Natural History Museum, Department of Life Sciences, Cromwell Road, London SW7 5BD, UK; Emma Betteridge Wellcome Sanger Institute, Cambridge CB10 1SA, UK

      This Q&A features some discussion of her contribution to this project

  16. Apr 2020
    1. Wellcome Sanger 25 Genomes Project

      This project's goal is to sequence 25 novel genomes representing UK biodiversity, as part of the Wellcome Sanger Institute's wider 25th Anniversary celebrations. See https://www.sanger.ac.uk/science/collaboration/25-genomes-25-years

    1. interspecies F1 hybrid of yak (Bos grunniens, NCBI:txid30521) and cattle (Bos taurus, NCBI:txid9913)

      In Tibet this type of yak-cow hybrid is known as a "dzo" (མཛོ)་ https://en.wikipedia.org/wiki/Dzo

    2. Timothy P L Smith US Meat Animal Research Center, US Department of Agriculture, State Spur 18D, Clay Center, NE 68933, USA Correspondence address. Timothy P. L. Smith, US Meat Animal Research Center, US Department of Agriculture, Clay Center, NE 68933, USA. E-mail: tim.smith2@usda.gov http://orcid.org/0000-0003-1611-6828

      See the Q&A with Benjamin Rosen and Timothy Smith in GigaBlog for more insight http://gigasciencejournal.com/blog/dna-day-2020-cattle-reference-genome/

    1. progression of respiratory diseases

      Including COVID-19, as it is estimated that 50% of patients with COVID-19 who have died had secondary bacterial infections. Watch the COSMIC project, which is looking at metagenomics of respiratory samples to identify the bacterial, fungal, and viral co-infections present in patients with COVID-19 https://www.covid-coinfections.org/t/cosmic-co-infections-and-secondary-microbial-infections-in-covid-19/17

    1. Supplemental File 1: Extended Chinese language (中文版) version on the editorial.

      See also a Chinese language adaptation of this statement in Bull. Ntnl. Nat. Sci Foundation China. http://www.cnki.net/kcms/doi/10.16262/j.cnki.1000-8217.2018.06.001.html

    2. A version of the editorial translated into Chinese is included as a Supplementary File

      See also a Chinese language adaptation of this statement in Bull. Ntnl. Nat. Sci Foundation China. http://www.cnki.net/kcms/doi/10.16262/j.cnki.1000-8217.2018.06.001.html

    3. Here, we help clarify this and also provide a clear statement of our expectations around how authors are assigned to manuscripts submitted to GigaScience.

      A more detailed version of this clarification and background is available via our blog: http://gigasciencejournal.com/blog/appropriate-authorship/

    4. Laurie Goodman

      ‡ Senior author

  17. Mar 2020
    1. which is the basis of our planned second release (PLINK 2.0).

      See the homepage for updates taking it towards PLINK 2.0 alpha https://www.cog-genomics.org/plink/2.0/

      We also have phased and annotated data for use in plink2.0 worked examples in GigaDB http://dx.doi.org/10.5524/100516

    1. 3.Li Z, Barker MS. Inferring putative ancient whole genome duplications in the 1000 Plants (1KP) initiative: Access to gene family phylogenies and age distributions. bioRxiv. 2019:735076. https://www.biorxiv.org/content/10.1101/735076v1.

      A peer reviewed and updated version of this has now been published in GigaScience https://doi.org/10.1093/gigascience/giaa004

    2. chlorophyte green algae

      Some of the authors have recorded a podcast discussing the implications for algae research from this data https://podcasts.apple.com/us/podcast/gane-ka-shu-wong-michael-melkonian-on-if-algae-can/id1420197433?i=1000458893924

    1. hypothes.is (use the hashtag/tag #chromosomenomenclature)

      Please add comments directly on the key parts of the commentary you would like to raise any issues with.

    1. Table S3. Representative applications of genome editing. A summary of the representative applications in different organisms.

      Using hypothes.is this information can also be updated via annotations here. e.g. adding mention of Twist biosciences, whose Oligo pools are utilized in many CRISPR applications including generation of CRISPR guide RNA (sgRNA) libraries. See https://www.twistbioscience.com/products/oligopools

    2. Table S1. Online tools for TALEN and CRISPR/Cas9. Collected online tools for TALEN and CRISPR/Cas9 are presented in this table. Updates can be accessed in GitHub [107]. Table S2. Commercial service for TALEN and CRISPR/Cas9. Collected commercial service for TALEN and CRISPR/Cas9 are presented in this table. Updates could can accessed in GitHub [107]. Table S3. Representative applications of genome editing. A summary of the representative applications in different organisms.

      Given that new methods, kits, and services continue to be rapidly developed and updated, we have set up an editable version on a GitHub wiki, and readers are encouraged to update it. See https://github.com/gigascience/paper-chen2014/wiki

    1. This must be achieved by sequencing and archiving huge numbers of microbial genomes, both from clinical cases and known environmental reservoirs, on a continual basis.

      Even without reference genomes, mining metagenomes for coronavirus sequences has become particularly topical in 2020. See the Pangolin 2019-nCoV-like coronavirus example https://doi.org/10.1101/2020.02.08.939660

    2. swine flu

      Jennifer Gardy discusses the groundbreaking H1N1 crowdsourcing efforts in her TEDx talk here (with lots of lessons for the coronavirus outbreak a decade later) https://www.youtube.com/watch?v=LmAugMSJ1-Y

    3. Escherichia coli O104: H4

      See more in GigaBlog about the novel "tweenome" method of datasharing for this project http://gigasciencejournal.com/blog/notes-from-an-e-coli-tweenome-lessons-learned-from-our-first-data-doi/

    1. MERS coronavirus

      Mining metagenomes for coronavirus sequences has become particularly topical in 2020 (see the Pangolin 2019-nCoV-like coronavirus example https://doi.org/10.1101/2020.02.08.939660)

    2. RNA viruses

      As this works with RNA viruses it has been made part of the "Free access to OUP resources on coronavirus and related topics" collection on the Oxford University Press website https://academic.oup.com/journals/pages/coronavirus

    1. direct RNA sequencing. Despite the scientific relevance of VACV, no LRS data have been generated for the viral transcriptome to date.

      This approach of using Oxford Nanopore direct-RNA sequencing for viruses has now been carried out on SARS-CoV-2, the coronavirus that causes COVID-19. See https://doi.org/10.1101/2020.03.05.976167

  18. Feb 2020
  19. Oct 2019
    1. African eggplant

      Also known as the scarlet eggplant or bitter tomato.

    2. “orphan crop”

      The African eggplant is a good example of the work of the Africa Orphan Crop consortium and many of the authors are consortium members. You can read more on the first genomes released in GigaBlog here: http://gigasciencejournal.com/blog/democratising-data-aocc/