On 2013 Nov 11, James Hadfield commented:
This paper describes the DESeq method for differential RNA-Seq, ChIP-Seq and other analyses. DESeq was developed to work with low replicate numbers, and indeed many people cannot generate high numbers of replicates. But I would challenge the community to consider that the costs of NGS have dropped very significantly since these methods were conceived, and that failing to increase replicate numbers is now inexcusable in many scenarios.
Both of the papers referred to in the comments so far reference multiple RNA-seq and/or other datasets that were used to test the methods from which their conclusions are drawn. Wolfgang Huber mentions the constraints of sample size in his comments, and the Anders/Huber paper above also has a section on working without replicates, in which they discuss the impact that within-group and between-group sample variability has on the results.
Some very real difficulties in appraising which approach (DESeq2 or SamSeq) is best include the limited amount of time the community has been testing the different approaches, that the approaches themselves are still very much in development, and that very different datasets are used in each study.
This last issue is made worse because the experimental methods sections of many NGS papers are not clear enough. It would help to have clear guidelines for reporting the number of samples used and their relationship (e.g. biological or technical replicates and, if technical, the stage at which replication was performed); the number and type of reads generated at a per-sample and per-group level would also be useful. Getting this information can be painful, as evidenced by digging through the DESeq2 and SamSeq references:
DESeq2
Wilczynski: very difficult to determine from the data provided or online.
Engstrom: mRNATag-seq, 5 samples (3 & 2 replicates per group), no indication of reads per sample.
Nagalakshmi: mRNA-seq, 4 samples (2 technical & 2 biological replicates per group), 7M reads per sample (possibly).
Kasowski: ChIP-seq, 10 biological samples, 2 groups, 660M reads, 33M reads per sample.
SamSeq
Hoen: mRNA-seq, 4 replicates each of around 2.5M reads.
Marioni: mRNA-seq, 2 groups (liver & kidney), 7 technical replicates (at the lane level), 85M reads per group, 12M reads per replicate.
Witten: miRNA RNA-seq, 29 biological samples per group (Tumour vs normal), average 0.75M reads per sample.
Perhaps the community could agree on a simple format, such as a table added to each publication as supplementary material?
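As a rough illustration of what such a table might contain, here is a minimal Python sketch that writes one row per experimental group as tab-separated text. The field names (`replicate_type`, `replication_stage`, and so on) are hypothetical and not an agreed standard. The example rows use only the approximate Marioni figures listed above, on the assumption that the 7 lane-level technical replicates are per group; 7 x 12M gives 84M, close to the roughly 85M reads per group quoted.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class GroupSummary:
    """One row of a hypothetical per-group reporting table."""
    study: str
    assay: str
    group: str
    replicate_type: str           # "biological" or "technical"
    replication_stage: str        # e.g. "lane"; empty if not reported
    n_replicates: int
    reads_per_replicate_m: float  # millions of reads, approximate

    @property
    def reads_per_group_m(self) -> float:
        # Derived column: total reads per group, in millions.
        return self.n_replicates * self.reads_per_replicate_m


# Approximate figures taken from the Marioni entry in the list above.
rows = [
    GroupSummary("Marioni", "mRNA-seq", "liver", "technical", "lane", 7, 12.0),
    GroupSummary("Marioni", "mRNA-seq", "kidney", "technical", "lane", 7, 12.0),
]


def to_tsv(records):
    """Render the records as a tab-separated table with a header row."""
    buf = io.StringIO()
    names = [f.name for f in fields(GroupSummary)] + ["reads_per_group_m"]
    writer = csv.DictWriter(buf, fieldnames=names, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for r in records:
        writer.writerow({**asdict(r), "reads_per_group_m": r.reads_per_group_m})
    return buf.getvalue()


if __name__ == "__main__":
    print(to_tsv(rows))
```

A per-sample version of the same table would simply carry one row per library instead of one per group.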
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.