10,000 Matching Annotations
    1. On 2019-08-22 15:16:23, user William James wrote:

      Very nice follow-up work by Cantor and Lenardo on T cell biology in physiological medium. NB, the HPLM used here is not identical to the 2017 original, as it has been supplemented with uridine (3 µM), α-ketoglutarate (5 µM), acetylcarnitine (5 µM), and malate (5 µM). This is no bad thing in principle, as these additions are physiologically and biochemically justified, but I wonder whether we should start to version-number media formulations, to avoid confusion. This could be, for example, HPLM v1.2.

    1. On 2019-08-02 17:36:17, user Kathleen wrote:

      WOW! Differential Expansion Microscopy-Machine Learning (DiExM). Nice work!! Anisotropic expansion of up to 8-fold linear and >500-fold volumetric. Important study utilizing expansion microscopy (ExM) for precise nanoscale imaging of cellular structures. An opinion on ExM was posted by Francis Collins: https://directorsblog.nih.g.... DiExM will greatly advance both nanoscale imaging and diagnostic pathology.

    2. On 2019-07-30 15:40:13, user Ranya wrote:

      WOW! Differential Expansion Microscopy-Machine Learning (DiExM). Nice work demonstrating anisotropic expansion of up to 8-fold linear. Important study utilizing expansion microscopy (ExM) for precise nanoscale imaging of cellular structures. DiExM will greatly advance diagnostic pathology. The scope of ExM is highlighted by NIH Director Francis Collins in his recent blog: https://directorsblog.nih.g...

    3. On 2019-07-26 05:28:49, user Jazlin wrote:

      To learn complex scientific applications such as this bioRxiv Cell Biology preprint illustrates, we need to lay the foundation in scientific and engineering practices at the school level. All students must have opportunities to excel in STEM higher learning and STEM careers. Thus, teaching teachers to help their students shift their scientific reasoning from the visible (macroscopic) to the invisible (submicroscopic) has been my passion in teaching and research.

    1. On 2019-08-22 09:06:00, user WJR wrote:

      Brian (or Donal Hickey),

      Thank you for your thoughtful response. And thank you for pointing out the "break" statement at line 246, which my eye had misinterpreted. [I had mentally processed the enclosing "if" statement as though it were a "switch" statement, thereby misinterpreting the "break" statement, which has a different behavior in these two contexts. That was my mistake.]

      That solves the problem. I agree your simulation indeed behaves as you describe and is not overly inefficient.

      I do hope your simulation can be developed into a convenient research tool (with various parameters that can easily be set by the user at run time).

    2. On 2019-08-21 18:59:50, user Donal Hickey wrote:

      Yes, the comment by WJR correctly surmises that the structure 'int' named .sex is a legacy variable. But no, while the code is mildly inefficient, it is not grossly inefficient, nor is it incorrect. Nor is there a great deal of over-writing being done. In fact the code is multipurpose and, for the uses in this paper, there is no need for the .sex flag. Nor is the quicksort subroutine ever called.

      Because the code is multipurpose, the remnant of the .sex flag can be confusing. The comment by WJR that lines 233 to 252 are slightly inefficient is correct, but the comment that they are very inefficient is wrong. In this section of the code, each individual gets to produce FECUNDITY offspring. In this case FECUNDITY is 2. The individual's two offspring can come from two matings with different individuals or two matings with a single other individual (but not from self-mating).

      The break statement at line 246 ensures that these matings occur exactly FECUNDITY times. As soon as a mating is accomplished, the break statement causes the program to stop and then continue with the next mating. The progeny from each mating of parent n[i] is stored in a new structure (m[.]), so no overwriting is done. It is common practice to create a second temporary structure/variable to momentarily store results.

      So, no, there is no overwriting done except when the temporary structure/variable is copied back to the original. Every individual will mate at least FECUNDITY times.
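      As a minimal sketch of this control flow (in Python rather than the simulation's C, with hypothetical names standing in for the n[.] and m[.] arrays), each parent mates exactly FECUNDITY times, and each mating breaks out of its search loop as soon as a valid mate is found:

```python
import random

FECUNDITY = 2            # offspring per parent, as in the paper
POP_SIZE = 100           # hypothetical small population

parents = list(range(POP_SIZE))   # stands in for the n[.] array
offspring = []                    # stands in for the new m[.] array

for i in parents:
    for _ in range(FECUNDITY):
        # draw random mates until one is valid (no self-mating)
        while True:
            k = random.randrange(POP_SIZE)
            if k != i:
                offspring.append((i, k))  # stored in a NEW slot,
                break                     # so nothing is overwritten
```

      Every parent ends up with exactly FECUNDITY matings, and no progeny record is ever overwritten, matching the behavior described above.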

      It is true that the three "if" statements that query the .sex variable are inefficient. It would be possible to rewrite the code so that it no longer has remnants of its multi-functionality, and this would slightly increase speed. But the program is already very fast, and there seems no reason at all to increase the speed of a program that only takes minutes to run, especially when doing so would reduce the functionality of the code.

      Also, I would guess that deleting the -g debug flag would increase speed a great deal more. But again, there is no need.

      Brian

      Uncomment line 61 so that idum has a fixed value and you will reproducibly get the results below.

      Recompile

      Run with breaks at these lines ...

      line 230: m[0]=n[0]
      line 231: k=88604
      line 244: m[0].maternal=n[88604].maternal
      line 246: break -> line 229

      line 230: m[1]=n[0]
      line 231: k=66537
      line 240: m[1].paternal=n[66537].maternal
      line 246: break -> line 229

      line 230: m[2]=n[1]
      line 231: k=55470
      line 240: m[2].paternal=n[55470].maternal
      line 246: break -> line 229

      line 230: m[3]=n[1]
      line 231: k=11152
      line 238: m[3].paternal=n[11152].paternal
      line 246: break -> line 229

      line 230: m[4]=n[2]

    3. On 2019-08-21 11:31:21, user WJR wrote:

      My goal here is that G.B. Golding's simulation ought to be fashioned into a widely usable research tool. To do that, it needs to be: (1) computationally efficient (in terms of computer time and memory), and (2) widely understandable. The simulation can be improved along both those lines.

      Toward that goal, I offer the following comments:

      1) As noted in my previous post, the 'reproduction' portion of the simulation is exceedingly inefficient (and awkward to understand). In effect, the simulation produces a specific number of progeny (= FECUNDITY times number_of_parents) by producing EACH individual progeny through a random mating among the parents. If that is the goal, then this code (as written) is both inefficient and awkward to understand. Also, compared to my direct wording above, the paper's description is needlessly cryptic: "The number of offspring produced by each individual was determined by sampling a Poisson distribution with a mean of 2. All parents were diploid and had the same average fertility." (Further note about this simulation: for matings, a parent can switch back and forth between male and female. The software merely picks two random individuals and 'mates' them. This is not necessarily fatal to the simulation's validity, but it ought to be clarified in the text.)

      2) The following items appear in the code, but are never used: RUNS, ITERATES, RECOMB_RATE, l, iter, poindev(), gammln(), and quicksort(). These can be eliminated, with no loss.

      3) The structure variable, '.sex', is not needed; is confusing; and increases the inefficiency of the simulation. The sexual and asexual versions of this simulation are already different, so nothing is gained by having this (inefficient) flag that indicates whether a specific individual is sexual versus asexual. They are ALL sexual or ALL asexual, and this is already accomplished by the two different versions of software.

      4) The parameter MAX_SIZE sets the size of various arrays large enough to accommodate the population size. It is currently set to 1 million (for one million individual progeny). However, this is nine to ten times larger than it needs to be, as the size of the progeny population is only 100,000. This needlessly wastes computer memory.

    4. On 2019-08-16 08:02:18, user WJR wrote:

      Bug Report:

      This (preprint) paper by Hickey and Golding relies on a software simulation. (Available at: https://github.com/gbgolding/evolutionSex)

      There is a bug in the currently posted version of that software. It occurs in the file "sexual7.c" (seen August 16, 2019). The bug strongly decreases the computational efficiency of the simulation, and slightly affects the results.

      In that file, there is a structure integer variable named '.sex', which is initialized to zero and tested several times by if-statements, but never changed. In other words, it does nothing. That is the first clue that something is awry.

      (Note: For some reason, when I try to post the offending code here, it gets displayed as an unreadable mess. I don't know why. I tried placing the appropriate code formatting brackets around it. So, instead I must here describe where the bug occurs.)

      Lines 233 through 252 comprise a while-loop, which creates each progeny by a mating of some parent_i with some other parent. However, the loop then proceeds to OVERWRITE that progeny's allele data by a mating with EVERY OTHER parent. The original parent_i's allele data gets obliterated within that progeny, because it gets over-written many times. Each and every progeny is computed by mating each parent with EVERY OTHER parent, but only the last matings count, as the previous matings get over-written. This is exceedingly inefficient computationally: if the population size is n, then the computer time increases with n-squared, rather than n.

      The end result is that each progeny is indeed the result of a random pairing of parents, but not the ones the simulation-writers intended. Moreover, the simulation aims that each parent be involved in AT LEAST a minimum number of reproductive events (given in the code by "FECUNDITY"), but that goal is not achieved. Due to randomness, some parents can mate numerous times, while others don't mate at all -- contrary to the stated design of the simulation. There is a disparity between what the code intends to do, and what it actually does, and this can affect the results.

      I'm guessing the variable .sex is a vestigial remnant of code (now largely absent) that had originally matched each parent with exactly one other parent (i.e., for an obligate monogamy model). I imagine .sex was originally a flag, used to indicate that a given individual has (or has not) been used yet as a parent. This approach would be one of the simplest ways of guaranteeing that each parent has the fecundity claimed in the paper.

    1. On 2019-07-09 23:06:32, user Rob W wrote:

      Fantastic work! This is a great step forward and will help others to better research the full lifecycle of Toxoplasma - inspiring stuff!

    1. On 2019-07-26 13:04:56, user Alex Alexandrov wrote:

      Great paper, hi from Alex Alexandrov in Moscow, Russia!

      Question (I might have missed the answer in the paper): what percentage of age-related cell death (division arrest) in WT and mutant strains can be accounted for by terminal missegregation events?

    1. On 2019-07-16 15:38:02, user Robert Gourdie wrote:

      The manuscript was officially accepted for publication in revised form in the Journal of the American Heart Association on July 16, 2019 under the revised title "Interaction of aCT1 Peptide with the Connexin43 Carboxyl Terminus Preserves Left Ventricular Function Following Ischemia-Reperfusion Injury"

    1. On 2019-08-20 09:36:30, user J wrote:

      In antibody docking, ClusPro masks residues falling outside the hypervariable regions (a.k.a. CDR loops), meaning that sampling is much more restricted compared to regular blind or ab initio docking. It might not be a fair comparison to your method, though.

    1. On 2019-08-20 03:32:22, user Nicolas Franco Takehisa Torres wrote:

      Hello, I'm working with mtb and I'm interested in using the primers that you have developed (mamK_79F-mamK_577R and mamK_86F-mamK_521R). Where can I find the supplementary material? Thanks.

    1. On 2019-08-11 00:57:45, user Charles Warden wrote:

      Since there has been a great discussion about this paper on Twitter, I thought I should provide a link, as well as a summary of my responses.

      https://twitter.com/carolin...

      Given that my responses were somewhat long (and I thought it might be useful to have extra control of formatting for images), I summarized my thoughts in the following blog post:

      http://cdwscience.blogspot....

      While I think the pictures are qualitatively the same, I did notice a bug that was causing me to overlook some sites. Editing the blog post is also relatively easy, and I keep a change log (under the assumption that I'm not going to get everything absolutely right the first time, and the informal context of the blog post is meant to convey some amount of room for future improvement).

      So, I apologize about the bug, but I hope blog posts (either before pre-prints, or to help gain attention for pre-print comments) can help with the process of improving papers before submission in a more formal peer review system. Plus, more importantly, I hope readers (and authors) find this specific post/ response to be interesting and useful.

    2. On 2019-07-22 12:09:04, user Debbie Kennett wrote:

      I would be interested to see if these findings can be replicated on Illumina chips. UK BioBank uses an Affymetrix chip yet all the consumer genetic testing companies use Illumina chips. (Living DNA recently converted to an Affymetrix chip but they've only tested a small number of people.) Over 30 million people have now taken consumer genetic tests on Illumina chips. I would have thought that this large-scale testing is likely to improve the cluster analysis. It might therefore be misleading to extrapolate from these results to consumer genetic tests.

      23andMe's submission to the Science and Technology Committee's enquiry into consumer genomics is interesting. They claim 99% concordance with Sanger sequencing.

      http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/science-and-technology-committee/commercial-genomics/written/101018.html

      23andMe have sold 192,000 kits in the UK. There is an opportunity for UK BioBank researchers to work with 23andMe to check the validity of 23andMe results. I suspect many UK Biobank participants have also tested at 23andMe.

      Interestingly, MyHeritage have recently introduced health reports for their DTC genetic test. They have stated that they will double-check all pathogenic findings with Sanger sequencing:

      https://blog.myheritage.com/2019/05/introducing-the-myheritage-dna-health-ancestry-test/

    3. On 2019-07-21 17:45:36, user Mike Boursnell wrote:

      Very interesting results. Can I ask, in Figure 1b, with the bad separation between homozygous and heterozygous variants... is that all due to poor clustering, or is it also a poor assay due maybe to choice of oligonucleotide?

    4. On 2019-07-09 23:57:59, user Charles Warden wrote:

      Thank you for posting this pre-print!

      Perhaps I need to sit down and try to take some time to read this (other) paper more carefully, but one thing that I thought seemed concerning was the use of imputation for rare variants (if I understood that correctly, with briefly skimming the paper): https://www.biorxiv.org/con...

      However, I think you are making a separate point about cluster generation (so, you directly measure the variant, but something that was called probably should have had a "no call" status), rather than issues with imputation in rare variants.

      I know that I was correctly identified as a cystic fibrosis carrier on the 23andMe SNP chip and incorrectly identified as not being a carrier with high-throughput sequencing from multiple companies (with the automated annotation --> I could look at the alignment .bam file and .vcf files to confirm that I was in fact a carrier with multiple technologies): https://github.com/cwarden4...

      So, for that reason, I don't want to down-play the overall use of SNP chips for carrier status diseases (or the possibility of needing re-analysis of raw data with high-throughput sequencing data), and I wonder if perhaps the sentence "extremely unreliable for genotyping very rare pathogenic variants" should be re-worded.

      Without the cluster generation files, use of .idat (or other format) raw data may be difficult even if access is given. However, could you separate probes that worked relatively better using other features (such as overall intensity and/or a more stringent call rate threshold)?

      Also, it looks like you are looking at Affymetrix and ThermoFisher arrays. What if you look at Illumina arrays (like those used by 23andMe)? I guess 1000 Genomes data isn't really emphasized for rare variants. However, if you can get meaningful information about individuals like myself (with both array genotypes and high-throughput sequencing raw data), perhaps other resources like the Personal Genome Project can help?

      https://my.pgp-hms.org/prof...

      I don't know about rare variant coverage, but there are also some Illumina SNP chips in GEO (and dbGaP)? Since people making their 23andMe genotypes freely available on the Personal Genome Project won't give you access to raw data for re-processing, perhaps this would help with the question of systematically identifying the probes that you may want to filter?

    1. On 2019-08-19 14:32:19, user Soreng wrote:

      Most of the Stipeae data and discussion were already covered by Romaschenko, K., N. Garcia-Jacas, P. M. Peterson, R. J. Soreng, R. Vilatersana & A. Susanna. 2014. Miocene–Pliocene speciation, introgression, and migration of Patis and Ptilagrostis (Poaceae: Stipeae). Molec. Phylogen. Evol. 70: 244–259. But this is not cited!

    1. On 2019-08-17 19:10:13, user Hurley Li wrote:

      Data leakage problem in your model!!!

      The design of your adjacency matrix and the way you split the train/test sets will cause a huge data leakage problem in your training, because your train/test sets are created independently for gene_adj and gene_adj.transpose(copy=True), and therefore the edges from the test set of gene_adj are actually included in the training set of gene_adj.transpose(copy=True).

      The same problem applies to the train/test split between the gene-disease matrix and the disease-gene matrix. The validation edges from the gene-disease matrix are actually used for training in the disease-gene matrix, and vice versa.

      Could you please clarify? Thanks!
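      To make the leak concrete, here is a toy sketch in plain numpy (the matrix and the split choices are made-up placeholders, not the model's actual data): an edge held out from the gene-disease matrix survives, transposed, in the disease-gene training set.

```python
import numpy as np

# hypothetical gene-disease adjacency; its transpose is used as the
# disease-gene relation, as with gene_adj / gene_adj.transpose(copy=True)
A = np.array([[1, 0, 1],
              [0, 1, 0]])
B = A.T.copy()

edges_A = set(zip(*np.nonzero(A)))   # {(0, 0), (0, 2), (1, 1)}
edges_B = set(zip(*np.nonzero(B)))   # {(0, 0), (1, 1), (2, 0)}

# independent splits: hold out (0, 2) from A, but split B without
# coordination, so B happens to hold out (1, 1) instead
test_A = {(0, 2)}
train_B = edges_B - {(1, 1)}

# the held-out gene-disease edge is still visible, transposed,
# in the disease-gene training set
leaked = {(g, d) for (g, d) in test_A if (d, g) in train_B}
# leaked == {(0, 2)}
```

      Coordinating the two splits (holding out an edge and its transpose together) would remove this leak.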

    1. On 2019-08-17 18:29:04, user Rudy Mikšánek wrote:

      Interesting work, and a great paper! I have two brief questions/comments:

      (1) If Figure 2 presents the life cycle of the nematodes (Lines 277-278), why are there arrows after E-phase if, in both male and female figs, any nematodes "left behind" at that point die?

      (2) You might consider using a line to connect nematode densities in Figures 3 and 4, as you're reporting time-series data (the bar graph emphasizes extremely discrete fig developmental phases). I also don't believe any statistical analyses are needed, but treating these data as continuous seems more true-to-life.

      Thank you for sharing on bioRxiv!

    1. On 2019-08-16 18:03:29, user Rudy Mikšánek wrote:

      Nice work! I like the historical background about the Sambhar Lake area. Might I suggest also including a Maxent SDM for the flamingo in 1996 and comparing that to 2019? Also, could you clarify *exactly* which variables were used for the Maxent model (did you use all the spectral indices mentioned? which bioclimatic variables from WorldClim were chosen?). Finally, some of your acronyms are not defined (e.g., GBIF, AWC, NHBF), but it may be best to just spell out acronyms every time (e.g., "surface algal bloom index" instead of "SABI") since they aren't used very much anyway.

    1. On 2019-08-15 20:00:46, user Andrey Shubin wrote:

      Hi, thanks for this preprint! It looks like the supplemental tables are not available now. Could you please upload them too?

    1. On 2019-08-15 15:06:42, user Rob King wrote:

      I can see the fastq files on NCBI. Do you have the BAMs submitted to NCBI for download, or somewhere else I could download them from? Or are they in the NCBI SRR file? I'm looking to reproduce what you have done and need the raw BAMs for that.

    1. On 2019-08-15 01:31:04, user Guijun Wan wrote:

      Besides environmental factors such as photoperiod and temperature, the geomagnetic field intensity may be a new cue for the regulation of insect migration.

    2. On 2019-08-15 00:41:53, user GW wrote:

      We have uploaded a new version with corrected typing errors, a replotted Figure 2, and corrected sampling times. The revision will be online soon.

    1. On 2019-08-15 00:39:00, user Joel Rothman wrote:

      This is an exciting demonstration of in vivo reprogramming of muscle into endoderm in zebrafish by direct lineage conversion and across germ layer types. It adds substantial support to the view that differentiated cells can be transdifferentiated into endoderm (e.g., as has been observed in worms).

    1. On 2019-08-13 19:33:30, user Doniv Hgnis wrote:

      Dear author, you mentioned that "Consistent with a previous report (Alexandrov et al. 2018), we find tobacco smoking (signature6)", but if you look at the COSMIC signature database, signature 4 belongs to tobacco smoking.

      Similarly, in the statement "which presents high exposure for signature 2 (UV light)", note that signature 2 is associated with APOBEC-mediated DNA changes, not UV.

      Please re-examine the manuscript.

    2. On 2019-08-01 00:47:03, user Doniv Hgnis wrote:

      Signature 7 belongs to UV radiation, not signature 8 as mentioned in the article. So, if I am right, then you will need to redo some of the analysis.

    1. On 2019-08-13 16:38:12, user Alex Crits-Christoph wrote:

      1. In the preprint's current state, reproducing the analysis shown is impossible due to the short length of the methods section and a near complete lack of description of the metagenome assembled genomes used and their sources. I highly suggest that the authors provide at minimum a list of genomes with accession numbers used in this study, and in better faith the alignments and trees produced as part of the analysis.

      2. It is clear from a casual glance that a majority of the KS tests in figures S3-S5 (which test to see if there is a difference in phylogenetic depth between MAG and non-MAG sets) would be significant, directly contradicting the authors' claims and confounding the analysis in a way that would explain the effect observed. Why have the authors gone to great lengths to report the exceedingly low p-values for the comparisons in the main text, but have omitted the p-value comparisons for these figures?
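      For reference, the p-value comparison being asked for takes only a few lines (a generic sketch assuming scipy; the depth values below are synthetic placeholders, not the paper's data):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# hypothetical phylogenetic depths for MAG-derived vs non-MAG taxa
mag_depths = rng.normal(0.30, 0.05, 200)
nonmag_depths = rng.normal(0.25, 0.05, 200)

# two-sample Kolmogorov-Smirnov test: do the two depth
# distributions differ?
stat, p = ks_2samp(mag_depths, nonmag_depths)
```

      Reporting `stat` and `p` alongside figures S3-S5 would let readers judge the MAG/non-MAG comparisons directly.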

      3. As has been pointed out by numerous community members, the authors make key conceptual errors concerning how metagenomic binning could cause the effect they observe. There are dozens of published near complete MAGs (especially from the CPR) which are assembled in one complete contig, and thus are never binned - binning errors could not explain phylogenetic anomalies in these cases. Additionally, the synteny of the ribosomal block means that even for incomplete MAGs, many of the genes analyzed in this study would be on the same contigs and thus not be affected by metagenomic binning.

      4. The authors neglect to address the previous publication of single cell amplified genomes for some of the taxa they analyze (e.g., CPR), and previous comparisons which have shown good congruence between metagenome assembled genomes and single cell genomes: https://microbiomejournal.b...

      While it is currently impossible to reproduce the preprint's analysis and deconstruct how much of the issue described in (2) is responsible for the effect observed, the authors' conclusions make conceptual errors that invalidate the strong form of their claims regardless. I hope that the authors will be able to provide the data they use so it will become possible for the community to properly evaluate the analysis in this work.

    2. On 2019-08-11 10:10:22, user Tanai Cardona Londoño wrote:

      I would like to know what drove the authors to do this analysis. Were you suspecting that CPR and Asgard clades were artificial? Why?

      In any case, I do not find these results surprising at all. I am not a metagenomics or bioinformatics specialist, but I do use some of the basic tools with some regularity and always find obvious errors that do not make sense even if horizontal gene transfer is invoked. For example, a chloroplast photosystem gene from an angiosperm in the genome of a well-known non-photosynthetic bacterium... In fact, even genomes from supposedly axenic cultures carry errors of contamination and assembly, as I have found out rather too often through personal experience. It only makes sense, in my humble opinion, that the problem would be exacerbated in metagenomes.

      However, I suppose that most of these types of assembly errors are usually explained away in metagenome projects as horizontal gene transfer events.

      When I saw analyses of the distribution of certain core bioenergetic proteins (e.g. respiratory complexes) in "new clades" discovered through metagenome projects, the amount of HGT usually claimed seemed a little too high compared to what was understood from phylogenies before the metagenome era. I did raise an eyebrow a couple of times, as more often than not, these reconstructions are not followed by a critical consideration of the possibilities of errors. Nevertheless, I did think that overall a MAG would be, to a substantial extent, made up of sequences from the organism it was claimed to be.

      I did not imagine that the problem could be so bad, though. I had the feeling that some of these new organisms from these new clades had been independently validated through single-cell genomics and by other means.

      So, what's going to happen with all of these new clades and phyla that have been created recently... how many of them are real? How many are artificial?

      I would appreciate if the authors could "deconstruct" one of these MAGs and show, at least at the ribosomal level, where the different subunits come from, and whether there is a true signal of novel clades within some of the sequences, or if it is all entirely artificial.

      Thanks.
      Tanai

    1. On 2019-08-12 18:27:07, user Manuel Razo wrote:

      As the first author of the paper, I have to inform the community that we found a small numerical error in our calculations. We are currently working on solving the issue. For the most part, the conclusions of the paper remain unaltered; it is only the numerical precision of our predictions that was modified by this. We will upload the corrected results ASAP. - Manuel Razo-Mejia (2019/08/12)

    1. On 2019-08-12 12:39:42, user Tim Fenton wrote:

      Interesting stuff, and nice to see an APOBEC3B antibody working well on Western blot to detect the endogenous protein, with KO cells acting as controls. Is chromothripsis ever seen in patients who are homozygous for the A3A_B allele? The A3 responsible could vary between cell lines and tissue types - something to look for in the cancer sequencing data.

    1. On 2019-08-10 22:42:47, user Stefan Mordalski wrote:

      Three major findings from the paper:

      1. Synthetic peptide with the sequence of 12 C-terminal residues of Gai (G-peptide) acts as an allosteric modulator of the mOR – stabilizes the active conformation of the receptor:

      https://uploads.disquscdn.c...

      2. With the G-peptide it is possible to distinguish biased ligands in a “simple” radioligand binding assay

      https://uploads.disquscdn.c...

      3. Unbiased MD simulations show the early stage of ternary complex formation (we believe to have observed the “low affinity complex” described in https://t.co/gJBpDhze0B), as predicted by http://t.co/sd44w5qp

    1. On 2019-08-09 22:58:37, user microbial_minded wrote:

      I would like to thank the authors for uploading this interesting manuscript. I am hoping the authors can provide some additional comments and clarification. In figure 1, Ralstonia is indicated present in the basal plate from collected placentae based on fluorescence signal from a Bacterial-specific probe (panels C and H) and a Ralstonia-specific probe (panels D and I). DAPI counterstaining is shown in panels E and J. Arrows denote the location of Ralstonia in panels C, D, H, and I.

      However, there are several punctate foci (bottom right of panels – two in C/D and one in H/I) that are not denoted as being Ralstonia in these same panels despite having signal in both channels. These same unlabeled points are also present in composite image panels G and L. There are other points that appear in the panels with the bacterial-specific probe and the Ralstonia-specific probe, but those points do not appear to overlap, unlike the three points mentioned above and the arrow-labeled points. Of further note, DAPI staining of total nucleic acid does not appear where the Ralstonia cells are denoted, even in cases where the Ralstonia cells do not overlap with the DAPI-stained host cells. This is especially pronounced in panel L, with the bottom point being ~5 µm away from the nearest stained nuclei. This is a curious observation given that DAPI has been used to stain Ralstonia in the past (https://doi.org/10.1128/AEM... and http://dx.doi.org/10.1016/j.colsurfb.2012.09.044), and the authors clearly demonstrate the ability of DAPI to stain Ralstonia in culture in supplemental figures S1 and S2.

      I have three questions related to these data:

      1a. Do the authors consider those unlabeled points in Fig. 1 to be Ralstonia?
      1b. If not, what differentiates the unlabeled points from the labeled points?

      2. Why is there a lack of DAPI staining where the Ralstonia cells are in Fig. 1, as DAPI should also stain bacterial DNA?

      Thank you in advance for your time and response.

    1. On 2019-08-09 18:56:43, user Casey Greene wrote:

      The COI statement in this manuscript is painfully flawed. The sole author of this manuscript appears to be the founder of a company that claims this as its first preprint.

    1. On 2019-08-09 16:52:43, user Charles Warden wrote:

      Thank you very much for posting this - I believe this is something that needs to be emphasized more (although you importantly emphasize that some regions have higher confidence).

      For somatic variants, I believe Figure 5 in this paper also shows lower concordance among the lower frequency variants: https://www.biorxiv.org/con...

      For germline variants, I also observed a similar (overall) phenomenon (in terms of the relationship to allele fraction). While there are multiple figures to compare in the original Warden et al. 2014 paper, I summarized the parts for this point in a blog post:

      http://cdwscience.blogspot....

      I guess it would be best if I could refer to something by a different author. However, with this comment, it should be easier for me to find this paper (to encourage future citations).

    1. On 2019-08-09 14:04:30, user Rob King wrote:

      I can't find a BAM of the raw PacBio data, which is needed to repeat the FALCON step. You can submit PacBio BAMs to ENA, and I assume to NCBI as well. No one seems to, as it's extra work, but is it available?

    1. On 2019-08-09 02:22:36, user Elan Moritz wrote:

      This is fascinating. I was always curious why well-educated / prosperous individuals don't seem to make the >110 lists. I was just looking at the Gerontology Research Group (GRG) site that lists what they believe are the oldest of the old by name and place [http://www.grg.org/SC/World...]; that group seemed legitimate, and the supercentenarians fit the economically challenged / remote area descriptions of this preprint. Now, there are those who argue that calorie restriction and some stress are beneficial ... but this is all very curious. Curious enough for me to dive more deeply into the longevity and aging science pool. I guess in 20-30 years we'll have much better documented data.

    1. On 2019-08-07 11:17:55, user Josip Skejo wrote:

      Nice one. Please do correct the name of the new Loki in the tree. In the text you call the genus Prometheoarchaeum, while on the tree it is Izanagiarchaeum.

    1. On 2019-08-08 13:50:03, user Jiarui Ding wrote:

      It's a pity that our contribution is totally ignored: Interpretable dimensionality reduction of single cell transcriptome data with deep generative models

    1. On 2019-08-08 10:19:46, user Lauri Aaltonen wrote:

      Great work, congrats! We have, though, previously published families with a Mendelian defect in DNA demethylation, so perhaps do not emphasize the priority so prominently: Impact of constitutional TET2 haploinsufficiency on molecular and clinical phenotype in humans. Kaasinen et al. Nature Communications 2019 Mar 19;10(1):1252. doi: 10.1038/s41467-019-09198-7. Best wishes & good luck with your submission! Lauri Aaltonen

    1. On 2019-08-08 09:06:22, user Rosalind Arden wrote:

      Keen to read this interesting study. It would make easier reading if the acronyms were cut out. They impose a cognitive load on everyone but the Authors (who are 'cursed with knowledge'!)

    1. On 2019-08-07 23:48:41, user Arinjay Banerjee, Ph.D. wrote:

      Nicely done. Are the results from human tumors flipped in the results section? As is, it says that higher levels of SIDT2 led to better prognosis. Shouldn’t it be the opposite, as discussed in the discussion? What am I missing? Nice study though.

    1. On 2019-08-07 23:22:57, user Laura Sanchez wrote:

      The manuscript by Marchione et al. describes a novel and exciting method to perform proteomics on archived FFPE tissue. The manuscript is thorough and did not over- or understate the value of the findings. The method appears to have broad implications for use with archival FFPE samples and, importantly, does not require great measures or lengthy extra steps in order to achieve the proteomic quality achieved with fresh-frozen tissues. It was noted that a broad extension could be to incorporate this workflow with FFPE tissue blocks that are being used for imaging mass spectrometry workflows. Coupling the spatial information with the IDs from the HYPERsol workflow would be incredibly powerful for clinical applications. We appreciated the use of color coding and acronyms to attempt to simplify the readability of the manuscript; however, we do have suggestions which may improve this further. The supplemental figures were well done, and the overall consensus was that some of them may be better suited as main figures, although we realize this may be a journal-specific limitation on the number of figures that could be included. A full list of major and minor critiques is listed below with hopes that this may help the authors improve readability and strengthen the findings.

      Major:

      The reference to “flash-frozen results” in the title does not make it abundantly clear that the FFPE results have quality comparable to flash-frozen tissue results. Rewording the title would help with clarity.

      Figure 1e is missing a figure legend.

      Assumed level of knowledge with how tissue samples are usually handled for proteomic experiments is very high, but we are unsure of who the target audience is. Additional references or explanation could help broaden the audience.

      In paragraph “in order to compare” on page 2, it does not comment on limitations of HYPERsol. It would be helpful to know what protein categories (if any) are missing in HYPERsol as represented in figure 2f, because a large list of protein IDs is difficult to dig through and it is not abundantly convincing that the missing proteins are simply noise.

      Acronyms are not consistently used throughout the paper. Mentions of XPM could also be widely replaced as “standard”, and DAS with HYPERsol. Figure 1b could also benefit from having some sort of legend for the conditions or having them appear directly in the table, similarly to how they are depicted in the text on page 1.

      We were very interested in the claim that TLE1 expression was only 3-fold higher than in MPNST. Supplementing the mass spectrometry experiments with immunohistochemistry would be a strong orthogonal validation that the MS method is indeed robust.

      The limitations of extending HYPERsol to historical samples should be further articulated. For instance, even though all of these historical tissues exist, researchers may run into issues finding the metadata due to a lack of historical medical records for archived tissue, thereby limiting their usefulness without the biological context. Perhaps this can also be framed as a push to digitize existing historical records?

      Minor:

      Being able to process a 17-year-old sample with HYPERsol is a highlight of the applications of HYPERsol. Mentioning the historical samples in the abstract would increase interest in the manuscript.

      Mixed feelings on use of informal terms such as “gold-mine” and “treasure trove” (pg 1): these could be replaced with “resource”. This terminology is not scientific.

      Imaging mass spectrometry specialists routinely use solvents to access FFPE tissues and to call them toxic chemicals may be an overstatement.

      Please clarify how much dry mass was used from the historical sample, and if that amount consumed the whole sample. A reference is made to 5 mg in Figure 1, but it is unclear whether or not that amount was also used in the historical sample. Also, were the historical tissues from tissue blocks or from stored tissue? This was not clear in the experimental details.

      Figure 1b uses “X”s in the table, even though X is a common abbreviation in the paper. Consider using check marks or another indicator instead.

      Figure 1b should include all conditions with acronyms that are referenced in the paper. For example, FAS and XAS are not listed here but are mentioned later in the manuscript; including them would greatly increase readability.

      The spectral libraries referred to in “Mass Spectrometry data analysis” seem to be produced from flash-frozen tissue; given the variety of possible sources for a spectral library, it would be beneficial to specify this in the figure legend in which they are referenced.

      In supplemental figures 1a and 1c, what is the non-soluble material? It is also not clear what the normalization process is for relative residual pellet masses.

      In supplemental Fig 2, consider pairing the X-axis so that tissue sources line up between the two graphs. Another option would be to overlap the graphs, where yield and percent solubilized data points have different shapes. We would suggest keeping figure 2b’s X axis in the original order and layering the data from figure 1a on top.

    1. On 2019-08-07 13:15:48, user Masa Tsuchiya wrote:

      This paper reveals that

      1. Whole genome expression is dynamically self-organized through the emergence of a critical point (CP): the co-existence of distinct expression response domains (critical states). This happens at both the cell-population and single-cell levels through the physical principle of Self-Organized Criticality (SOC).

      2. Coherent-Stochastic Behavior (CSB): Coherent behavior emerges in stochastic expression in both critical states and whole-expression:

      i) In whole expression, the dynamics of the CM of stochastic expression converges to that of the whole expression (Genome-Attractor).

      ii) In the critical states, the dynamics of the CM of stochastic expression converges to that of the corresponding critical state (Critical-State Attractors).

      3. The Genome-Attractor guides global coherent expression, whereas critical-state attractors guide local coherent expression emerging in critical states through heteroclinic critical transition.

      4. Characteristics of the CP are given by:

      i) A fixed point during a specific biological regulation such as reprogramming and cell differentiation,

      ii) The center of mass (CM) of whole expression according to temporal expression variance, which is the order parameter of the self-organization of the whole genome expression,

      iii) ON-OFF state: a specific transition of the higher-order structure of genomic DNA corresponding to the CP, which suggests that the CP competes between the active (swelled or coil: ON) and inactive (compact or globule: OFF) states,

      iv) The Genome-Attractor: a change in the CP such as an ON-OFF switch induces a global impact on genome expression - the origin of the genome-wide coherent expression waves.

      5. A potential Universal Mechanism of Self-Organization can be interpreted in terms of the Genome-Engine: An autonomous critical-control genomic system is developed by a highly coherent behavior of low-variance genes (sub-critical state) generating a dominant cyclic expression flux with high-variance genes (super-critical state) through the cell nuclear environment: the sub-critical state acts as the generator (source) of SOC control of the whole expression, whereas the super-critical state acts as the sink.

      6. Cell-fate change occurs:

      i) When: the timing of the erasure of the genome-attractor of the initial state,

      ii) How: coherent perturbation of the genome engine through the activation of the CP.

    2. On 2019-08-01 14:20:43, user Masa Tsuchiya wrote:

      The main points of the revision are that

      1. Genome attractor: the change in the CP affects the entire genomic system. Coherent behavior (i.e., CM dynamics) emerges in stochastic expression (coherent-stochastic behavior) at both the critical-state and whole-expression levels. In this coherent behavior, the dynamics of the center of mass (CM) of stochastic expression converges to that of the whole expression (Figure 2), which reveals that the CP (Figure 1) acts as the genome attractor. Thus, a change in the CP such as an ON-OFF switch induces a global impact on genome expression - the origin of the genome-wide coherent expression waves [Tsuchiya, M., et al, 2007].

      2. Furthermore, regarding critical states, the convergence of stochastic expression (see Fig. 10 in [Tsuchiya, M., et al., 2016] for cell populations and Fig. 3B in [Tsuchiya, M., et al., 2017] for single cells) reveals that flux dynamics (effective force) between the CMs of critical states describe interactions between critical-state attractors, which guide coherent expression behaviors emerging through criticality (i.e., heteroclinic critical transition).

      Therefore,

      3. Whole genome expression is dynamically self-organized through the emergence of a critical point (CP).

    1. On 2019-08-06 20:30:57, user Renata Chapot wrote:

      In this article, the authors show some data suggesting that non-cognitive metrics, such as motivation, resilience and hard work, may be important to define a good candidate in a student selection process. In the closing sentence of the abstract, the authors state that their findings will “greatly facilitate the undergraduate selection process in any academic environment”.

      Nevertheless, although the topic is interesting, the conclusions are far from being generalizable to any academic environment. Although this is not stated in the abstract, the study was based on a sample of 10 subjects at a single department (Biochemistry and Biomedical Sciences) at McMaster University. Thus, there is no evidence that it necessarily applies to other academic environments – or even inferential statistics to show how representative they are of this particular department.

      Concerning the results, we also noticed that some graphs present what appears to be unfeasible percentage data. In Figure S3, for example, the quantities 91% and 9% do not seem to be obtainable in a study with 10 subjects. Similarly, on Figure 4, many of the pie graphs have percentages that do not add up to 100%.
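      To make the arithmetic behind this point explicit (a minimal sketch, not tied to the paper's actual data): with whole counts out of 10 subjects, only multiples of 10% are obtainable, so a 91%/9% split cannot arise.

      ```python
      # Percentages obtainable from whole counts of n = 10 subjects
      n = 10
      obtainable = {round(100 * k / n) for k in range(n + 1)}
      assert obtainable == set(range(0, 101, 10))   # {0, 10, ..., 100}
      assert 91 not in obtainable and 9 not in obtainable
      ```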

      Moreover, the protocol for the semi-structured interviews is not available in the paper. This makes it difficult to understand the method performed, and even though the authors state that it is available upon request, it would be much more useful to provide it as supplementary material. Details of the qualitative analysis are also lacking, and the words that were used in the coding process are not provided.

      Thus, based on these data alone, the conclusions made by the authors in the abstract and discussion should be toned down in order to reflect the results. To generalize the findings to other academic environments - or even to the particular department under study - both a larger sample size and a formal statistical analysis of the potential sampling error of the quantitative findings are warranted.

      These comments come from discussing this work at the Neurobiology and Reproducibility Journal Club (Dr. Olavo Amaral’s lab) at the Federal University of Rio de Janeiro.

    1. On 2019-08-06 16:48:43, user Suraj Kannan wrote:

      Thanks for the nice study. Do you have comparable images/data from Figure 1 for a control, for example on standard laminin or ECM? I have seen other studies of dedifferentiation, but it is difficult to appreciate Figure 1 (particularly panels a, c, d-g) without a control.

    1. On 2019-08-05 08:38:30, user Leonardo Araujo wrote:

      Hi, congratulations for the nice study! I went through the supplementary files and it looks like the gene NPC2 performed better than BATF2 in cohorts composed of adults. I think you should include the NPC2 in the downstream analysis and also compare its correlation with BATF2 (maybe a combination of these 2 genes might enhance the detection of incipient TB).

      cheers

    1. On 2019-08-05 05:34:57, user Richard Street wrote:

      The Goldilocks Zone as described appears to be what Vygotsky termed the Zone of Proximal Development. You have just narrowed the field for educators to look for when personalizing curricula for their students. Thank you and good job.

    1. On 2019-07-15 08:10:40, user Amina Echchiki wrote:

      Hi, I have a few questions :)

      (1) Do you suggest using StringTie2 on a bam file generated by aligning the best reads (e.g. full-length/2D/2-pass, depending on the dataset), or is it better to use as many as we have access to?

      (2) I suppose a big part here is choosing the aligner wisely; do you have any particular recommendations for long-read RNA-seq? Last time we checked I think Minimap2 was the best, but maybe you have more up-to-date suggestions?

      (3) How does the software deal with isoforms when more than one per gene is available? Aren't the "super-reads" and "error-correction" steps interfering with the isoform calling/assignment to the gene of origin?

      (4) I also suppose the coverage in the alignment file is an important parameter... what would you recommend in order to be able to use this software?
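      For context, the kind of long-read pipeline these questions concern might be sketched as follows (a hedged sketch: `ref.fa` and `reads.fq` are placeholder filenames; `-ax splice` is minimap2's spliced long-read preset and `-L` enables StringTie2's long-read mode):

      ```shell
      # Spliced alignment of long RNA-seq reads, sorted for downstream use
      minimap2 -ax splice ref.fa reads.fq | samtools sort -o aln.sorted.bam
      samtools index aln.sorted.bam

      # StringTie2 assembly in long-read mode; -G optionally supplies a guide annotation
      stringtie -L -o assembled.gtf aln.sorted.bam
      ```

      Whether to feed it all reads or only the highest-quality subset is exactly question (1); the sketch above is agnostic to that choice.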

      we can continue via e-mail if you provide a corresponding address.

      thanks for your time,
      Amina

    1. On 2019-08-03 12:04:39, user disqus_1eGAIrBYWV wrote:

      Fascinating! As aging is one of the factors in expression of ADPKD (the biggest kidney disease leading to kidney failure in the US) will you look at this in relation to delaying onset of the disease while comparing outcomes and differences between the PKD1 (early-age onset) and PKD2 (later-life onset) types?

      (This seems like a perfect opportunity to address age-related gene differences with expression of two ADPKD mutations that occur early or later in life but both result in creation of cysts in the kidneys.)

    1. On 2019-08-03 11:25:59, user Mick Watson wrote:

      Nice paper!

      Some papers of ours you might be interested in:

      We'd also recommend subsampling of the data which can help assembly

      Cheers<br /> Mick

    1. On 2019-08-03 00:24:50, user Heteromeles wrote:

      Anthropocene refugia is not a novel term or concept, as it was proposed in 2015 in the book Hot Earth Dreams by Dr. Frank Landis, and others are pursuing the same concept with plants in California. The analysis is quite welcome.

    1. On 2019-08-02 13:33:37, user Michael Jeltsch wrote:

      Last time I checked - a few years back - the extracellular domain of human VEGFR-2 consisted of 7 IgG-like domains (and not 8). This part of the Results section is not your results but a summary of previous research, correct? When I started to read it, it was not clear to me that you were talking about VEGFR-2 in general and not about the results of your work (that becomes clear only later, when you switch to the real results with the sentence "In case of HyVEGFR-2..."). It is difficult to evaluate your analysis of the domain structure since you do not show the complete alignment of the whole EC domain of hydra VEGFR-2 with the other VEGFR-2 sequences, but only a partial alignment. I think you need to show the complete alignment to demonstrate which domains are equivalent and what the intervening sequences are.

    1. On 2019-08-02 09:23:59, user Susanne Kramer wrote:

      Nice work. It fits to the fact that ribosomal protein mRNAs are excluded from stress granules (Fritz et al 2015, NAR).<br /> Susanne Kramer

    1. On 2019-08-02 09:20:57, user Brian Northan wrote:

      I am curious which blind deconvolution algorithm the authors used. The term "Lucy Deconvolution" is very vague, and in practice there are many different implementations of Richardson-Lucy, with different starting states, different acceleration strategies, and different constraints.
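      For intuition, the multiplicative update that Richardson-Lucy variants share can be sketched in a few lines of numpy (a minimal 1D illustration, not the authors' implementation; blind variants alternate this same update between the image and PSF estimates):

      ```python
      import numpy as np

      def richardson_lucy(y, h, n_iter=200, eps=1e-12):
          """Basic Richardson-Lucy deconvolution of observation y with known PSF h."""
          x = np.full_like(y, y.mean() + eps)   # flat positive initial guess
          h_flip = h[::-1]                      # adjoint (correlation) kernel
          for _ in range(n_iter):
              model = np.convolve(x, h, mode="same")
              ratio = y / (model + eps)         # data divided by current model
              x = x * np.convolve(ratio, h_flip, mode="same")
          return x

      # Toy check: blur two peaks with a Gaussian PSF, then deconvolve.
      true = np.zeros(64)
      true[20], true[40] = 1.0, 0.6
      t = np.arange(-3, 4)
      h = np.exp(-t**2 / 2.0)
      h /= h.sum()                              # PSF normalized to sum 1
      y = np.convolve(true, h, mode="same")
      est = richardson_lucy(y, h)
      assert np.mean((est - true) ** 2) < np.mean((y - true) ** 2)
      ```

      Even this sketch shows why the choice of starting state and constraints matters: the update is multiplicative, so the initial guess and any imposed non-negativity shape what the iteration can converge to.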

      If the blind deconvolution is not optimal, and the deep learning system is trained on the deconvolved images, then in turn the deep learning system will not be optimal. Further, if the blind deconvolution algorithm causes artifacts, the deep learning system can learn the artifacts. The authors should show the initial guess of their PSF, the iterative outputs of the algorithm and the final PSF, to make the convergence properties of the blind deconvolution algorithm clear.

      The authors need to provide specific details and references on which blind Richardson-Lucy implementation they used. There are many approaches, and the convergence properties of each approach are different.

      In practical use the blind Richardson-Lucy is usually constrained, and different acceleration factors are used for the PSF and the image (see https://www.osapublishing.o...).

      What acceleration factors did you use for the PSF and the image? What constraints were used?

      (See this ResearchGate topic for further discussion.)

    1. On 2019-08-01 20:56:15, user Diego Jimenez wrote:

      Excellent approach to constructing a minimal effective consortium... Apparently, stochastic events play an important role in the consortium's structural composition!

    1. On 2019-08-01 17:24:56, user argonaut wrote:

      In the supplement, the ancient Orkney samples VK201 (I2a) and VK203 (R1b-L21), which autosomally are +85% UK, are radiocarbon dated to about 500, but in the excel files and in the paper they are taken as Viking Age samples, like VK204 (R1b-U106) or VK205 (R1a), which have +50% Viking ancestry.

    2. On 2019-08-01 13:06:12, user Arne Jorgensen wrote:

      Is the influx from southern Scandinavia and Germany caused by the plague, which exterminated most of the population in the 400s and 500s, and by the Fimbulwinter, caused by volcanic activity?

    3. On 2019-07-25 08:46:32, user Andrew wrote:

      Over 200 pages of supplementary information without a table of contents makes a lot of work for your readers. Why are the Orkney sites, Oxford and Weymouth listed as UK, not England or Scotland, but Llyn as Wales, not UK? The Isle of Man is not in the UK.

    4. On 2019-07-20 14:35:06, user Georg wrote:

      The sample VK 542 from Ukraine, Chernigov is completely out of place. It is obvious that it has nothing to do with Vikings or, to be precise, with the members of the Rurik dynasty. Namely, even if we neglect the man's Y-DNA haplogroup, it should be noticed that he has absolutely no Scandinavian (or even Baltic) component in his autosomal DNA.

    5. On 2019-07-19 23:46:03, user 2blueherring3 wrote:

      Could we get some more information about the VK 474 individual? Is he definitely a E1b1b1b2a1a4 (L-791) or could he be from one of the sub branches like L791*, 2947 or 4971?

    1. On 2019-08-01 15:07:03, user stephens999 wrote:

      This interesting and impressive paper presents extensions, implementation and application of a recently-developed statistical methodology (the knockoff filter) to large GWAS (UK Biobank). The methods provide guaranteed control of False Discovery Rates when testing pre-specified contiguous groups of SNPs (or other variants). Importantly, the null hypothesis being tested here is not the commonly-used null that the group of SNPs is *marginally* unassociated with the trait; instead the null is that the group is *conditionally* unassociated with the trait given all other observed SNPs. This conditional test is in many ways more informative than conventional marginal tests because it ensures that a significant group cannot be explained by linkage disequilibrium (LD) with other measured SNPs outside the group. Thus the conditional test comes closer to identifying groups of potentially-causal SNPs than do conventional marginal tests.

      The paper is very well presented, and the results and comparisons with other methods seem generally appropriate and interesting. My main request is that the paper should better highlight the limitations of the method -- specifically, at high resolution ("fine-mapping") the need to confine tests to pre-specified contiguous groups of SNPs seems a clear disadvantage compared with existing fine-mapping methods. This is not to take away from the other important contributions of this work.

      Major Comment

      As mentioned above, the main limitation of the current implementation (and perhaps the whole framework?) is the requirement that groups of tested markers be both contiguous and pre-specified. At coarser resolutions, where the main goal is to identify genomic regions (conditionally) associated with the trait, these requirements are not a major limitation. However, at fine-scale resolutions, where one is trying to get down to the likely causal markers, these requirements become more bothersome. For example, suppose we have 4 SNPs, in order, A-B-C-D, and A and D are in very strong LD with each other (say LD of 1 for concreteness), but not in strong LD with B or C, and A is the causal SNP. Then the contiguity requirement of KnockoffZoom will not allow it to refine the association beyond the entire group (A-D), even though in principle one could narrow it down further to SNPs A and D. Existing fine-mapping methods do not have this limitation and could report (A,D) as the set of potential causal markers. Further, even if the contiguity requirement were relaxed (e.g. to allow prespecified non-contiguous groups), the need to prespecify groups to be tested may still limit the resolution to which associations can be refined.

      For this reason I think it is premature to claim "...KnockoffZoom unifies locus discovery and fine-mapping into a coherent statistical framework" (p15). Specifically, I think its abilities to solve the fine-mapping problem are not yet adequate to make this claim, and that studies interested in fine mapping will continue to want to use existing Bayesian fine-mapping methods like SuSiE (quite possibly as a complement to KnockoffZoom) to refine associations as far as possible. In any case, the limited resolution that comes with testing contiguous pre-specified marker groups should be better highlighted in the text.

      Besides better highlighting this limitation in the text, the comparisons with fine-mapping methods should be extended to quantify the effect. Currently the comparisons show the "width" of the region identified by each method (Figure 4, right panel). However, fine-mapping methods do not strictly identify a region but a set of SNPs, so the figure should also compare the number of SNPs identified by each method. It would also be informative to show the minimum pairwise LD between the markers identified -- does KnockoffZoom sometimes report markers not in high LD with one another due to the contiguity constraint? (Incidentally, the y axis on this figure is too large to see the interesting region, which for fine-mapping is <0.1 Mb. Getting to a region of 0.5 Mb is not really fine mapping in my opinion.)

      It would also be interesting to get the authors' perspective on how easy or difficult it might be for the contiguity requirement to be relaxed in the future. (Also the pre-specification requirement, although this seems more fundamental.)

      Other main comments

      • Some aspects of Table 1 are surprising to me. E.g. the number of bmi findings going from 24 -> 0 -> 15 as resolution increases. Shouldn't power increase as larger groups are tested? (I realize there are fewer tests as groups get bigger... so this is not a simple issue.) The hypothyroidism results are perhaps even weirder. Can you provide any intuitive explanation for why this might occur? Is it simply chance, since the knockoff procedure can produce different results if run multiple times?

      • The introduction criticizes the two-step approach as "not fully satisfactory because it requires switching models and assumptions in the middle of the analysis, obfuscating the interpretation of the findings and possibly invalidating type-I error guarantees." However, from Table 1 (see above comment), performing separate analyses at different resolutions appears to have similar problems regarding interpretation. The method that avoids "floating" discoveries at high resolution (Supplement S1B) seems to address this, but at a cost in power. What is that cost in power for the analyses here? How does Table 1 look if you apply that method? (with or without the 1.93 factor mentioned in the supplement).

      • As I understand it, the output at each resolution depends on a single generation of the knockoff variables, and so the method will report different significant results each time it is run? Is this correct? If so, how different/similar are the results if you run things a second time with another knockoff realization? (It could suffice to do one trait twice to illustrate this.)

      • The notation (X,Xtilde) suggests that the knockoffs are always included after the real variables in the input file to the lasso/bigsnpr. In principle the location of the knockoffs in the input file should not matter when a convex method like the lasso is being applied (with the exception of variables with LD=1, which is already dealt with here as a special case). However, if one were to replace the lasso with non-convex methods, the non-random order of the markers into the method could lead to failure to control FDR (e.g. if the method has a bias towards choosing columns earlier in the list of covariates). Further, even for convex methods, there is some concern that numerical issues could arise to create this bias. As a safety check I suggest running the method with randomly ordered columns, or if that is too much of a pain, simply reversing (Xtilde,X) to check it makes no difference.
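      The suggested safety check could be sketched as follows (a generic scikit-learn lasso standing in for the paper's actual bigsnpr pipeline; the data and variable names here are purely illustrative):

      ```python
      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(0)
      n, p = 200, 20
      X = rng.standard_normal((n, p))       # stand-in for the (X, Xtilde) matrix
      beta = np.zeros(p)
      beta[:3] = [2.0, -1.5, 1.0]
      y = X @ beta + 0.1 * rng.standard_normal(n)

      perm = rng.permutation(p)             # randomly reorder the columns
      fit = Lasso(alpha=0.05, tol=1e-12, max_iter=200_000)
      coef_orig = fit.fit(X, y).coef_.copy()
      coef_perm = fit.fit(X[:, perm], y).coef_

      # For a convex solver the column order should not matter (up to tolerance):
      assert np.allclose(coef_orig[perm], coef_perm, atol=1e-6)
      ```

      If the assertion failed under reordering, that would flag exactly the kind of order-dependent numerical bias the comment is worried about.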

      • I found the references to the Li-Stephens model vs the fastPHASE model in the Supplement confusing. The description of Li-Stephens as "This HMM describes the distribution of genotypes as a patchwork of latent ancestral motifs" is incorrect - this describes the fastPHASE model. The Li-Stephens model describes each haplotype as a patchwork of other observed haplotypes, not latent motifs. As I understand the text, all the models here are essentially fastPHASE models, not Li-Stephens models. Please clarify.

      • The results in the supplement that reduce forward-backward calculations to O(K) and O(K^2) look similar to results that are already well established (e.g. Fearnhead and Donnelly, 2001, Estimating recombination rates from population genetic data, Genetics). Is there anything new here?

      • Please provide more details about the comparisons with other methods, including versions of software and the settings used. Ideally the code used to run the comparisons with other methods should be made available - even without documentation this can be invaluable for others to see what was done.

      Other comments/questions:

      • Getting the method working on problems of UK Biobank scale is impressive, even though limited to "only" 591k SNPs. Would applying it to ~50 million SNPs be feasible, and require about 100 times the computation? For coarse resolution it might not matter much to include the extra SNPs, but for fine-mapping it ultimately seems important to include as many SNPs as possible.

      • The paper discards tests where the knockoffs are very highly correlated with the original variables (which makes sense as they have no power). For intuition I would be interested to see the distribution of the correlation of knockoffs with the original variables (say at the finest resolution).

      • What is the MAF distribution of the variants analyzed here? Does the method work equally well for common vs rare variants? (I ask because the LD models may tend to work best for common variants.)

      • It would perhaps be helpful to cite (and contrast with) previous work that attempts to control error rates of conditional tests of groups of variables (e.g. work on hierarchical testing by Yekutieli, Meinshausen, Bühlmann etc).

      Minor:

      p6: "by likelihood of the trait" -> "in distribution of the trait"

      p11: "its intrinsic limitations discussed above" - I do not see where they were discussed.

      p12: "As the resolution increases, we report fewer findings" - not always!

      Table 1: I suggest giving resolution in terms of kb instead of Mb. Is 0.000 down to single-SNP resolution?

      p16: "possible *to* construct"

      refs: markov -> Markov ; uk -> UK

    1. On 2019-08-01 14:13:53, user André Müller wrote:

      Is there any reason you didn't try MetaCache or Ganon? I understand that benchmarking is very time consuming, but both tools seem to perform better on a lot of tasks than Kraken2 or any of the other tools that you tested according to this independent benchmark (LEMMI).

      Also,you state that Kraken2 was the only tool able to build a database for the human microbiome benchmark. But both MetaCache's and Ganon's memory footprint is around the same as Kraken2's, sometimes even lower.

      There is MiniKraken2, which also uses a very small DB and is still pretty good. MetaCache can use a higher subsampling factor (option "-s", which can save a lot of memory) without sacrificing much precision. That way you can build a database that is less than 8GB from the complete RefSeq bacterial, archaeal and viral genomes and which still performs pretty well in terms of precision and recall. MetaCache also allows you to build the database in several chunks, query them independently and later merge the results. From what I can see in your figures, it should still be faster than kASA if you split the database into 8GB chunks and query them in sequence.

      I think you should include both Ganon and MetaCache in your analysis. It would also be nice to see your tool on the LEMMI benchmarking site, since they also use secret input data.

      Disclaimer: I'm the developer of MetaCache

    1. On 2019-08-01 00:13:01, user Charles Warden wrote:

      I'm not so sure this is correct (although there can be a whole general debate about the utility of PRS scores).

      For example, my Color lcWGS FASTQ file was much smaller than my Nebula lcWGS file:

      https://github.com/cwarden4...

      and it is very clear that the Nebula lcWGS analysis is unacceptable for specific sites:

      https://github.com/cwarden4...

      I apologize that the GitHub content is a bit messy. I hope to make a blog post summary of my main messages sometime this month.

      I did think my Color results were interesting, and I am mostly of European ancestry. However, I would lean towards removing ancestry and lcWGS analysis from the Color report.

    1. On 2019-07-31 18:47:16, user GuyguyKabundi Tshima wrote:

      The intersection between HIV, malaria and food security.

      Malaria control needs an integrated development plan. Malaria and food security are health and development issues that have appeared together in global frameworks since at least 1978. Our paper started by reviewing the reconstitution of body mass index in HIV-positive patients living in Kinshasa, an endemic malaria area, and will then continue with functional foods of therapeutic relevance, examining the functional connections as to how these affect people living with HIV and programmes in malaria-endemic areas.

    1. On 2019-07-31 16:05:14, user Donald R. Forsdyke wrote:

      POSITIVE SELECTION OF THE IMMUNE REPERTOIRE

      This interesting new contribution to bioRxiv (1) may inspire physicists to address immunological problems – a splendid goal! However, it does not accurately describe positive selection of the immune repertoire.

      Positive selection means precisely that – positive selection – i.e. cells are actively selected for some characteristic that is likely to contribute positively to immune function. This contrasts with negative selection, which actively removes cells because they have a characteristic that does not contribute positively to, and may actively impair, immune function.

      This active selection for what is becoming recognized as “near-self” reactivity has a long history that is recently outlined (2). There are three fundamental thymic processes: death (or inactivation) by neglect, death (or inactivation) by negative selection, and positive selection. The following statement (1) wrongly indicates that positive selection should be equated with death by neglect:

      "If the receptor fails to bind any self-peptide, even weakly, it will probably fail to bind any protein and the cell carrying these receptors is discarded {a process called positive selection which removes 80% of immature cells}.”

      1. Altan-Bonnet G, Mora T, Walczak AM. (2019) Quantitative immunology for physicists. Biorxiv Here.
      2. Forsdyke D. R. (2019) Two signal half century: from negative selection of self-reactivity to positive selection of near-self reactivity. Scand. J. Immunol. 89: e12746 Here.

    1. On 2019-07-31 15:46:30, user Donald R. Forsdyke wrote:

      STRUCTURE SELECTION AT THE DNA LEVEL

      It is good to see evidence that synonymous variants can significantly constrain the development of a population (1). Yes, this can be due to changed mRNA structure. However, mRNAs can acquire their structures by default, due to selective forces acting at the DNA level (2-4). Even demonstrations that mRNA structure is affected by a synonymous base change, and that the synthesis of a corresponding protein is affected, may not necessarily mean that over evolutionary time this was the major selective agency.

      Since the authors do not explore beyond mRNA (i.e. beyond exon) sequences, results from non-exon sequences, both intragenic (introns) and extragenic (3-4), should be of much interest. They note (1) that: “Studies first published in 1999 indicated that stable mRNA secondary structures are often selected for in key genomic regions across all kingdoms of life (Seffens and Digby 1999; Katz and Burge 2003; Chamary and Hurst 2005; Gu et al. 2010).” However, the latter authors did not assess “population constraint” as in the present elegant study (1). Furthermore, they appear to have overlooked earlier work indicating the possibility that synonymous variants might have been selected at the DNA level (3-4).

      1. Gaither JBS et al. (2019) Global analysis of human mRNA folding disruptions in synonymous variants demonstrates significant population constraint. Biorxiv Here.
      2. White HB, Laux BE, Dennis D (1972) Messenger RNA structure: compatibility of hairpin loops with protein sequence. Science 175:1264-1266.
      3. Forsdyke DR (1995) A stem-loop "kissing" model for the initiation of recombination and the origin of introns. Mol Biol Evol. 12:949-958.
      4. Forsdyke DR (2016) Evolutionary Bioinformatics. Springer, New York.

    1. On 2019-07-31 10:02:46, user Francisco Vargas wrote:

      Maximum Mean Discrepancy (MMD) [Fortet and Mourier, 1953, Gretton 2007] is a well-studied and known method in the statistics and machine learning community. In those works, the acronym MMD stands for maximum mean discrepancy, not mean measure divergence, which is a fairly different metric. This current version of the manuscript is using mean measure divergence as the definition of MMD whilst showing Maximum Mean Discrepancy in the implementation.

      The equation you present for MMD is clearly maximum mean discrepancy; if you read the two sources you cite for it, you will see the correct definition of the acronym. Furthermore, calling it a divergence is loose and arguably erroneous. Unlike KL, MMD is actually a metric, and the MMD( . || . ) divergence notation is not traditional for MMD; please check Gretton 2007 for a less misleading notation (MMD[F, p, q], where F is the function class that induces the kernel and is thus an argument).

      It is worrying to see a biomedical paper using ML algorithms as black boxes to the point where basic acronyms are wrong, whilst the correct terminology is right there in the relevant work cited in this paper. Just skimming these papers would be sufficient to find the right definition of the acronym.

      [1] Fortet, R. and Mourier, E., 1953. Convergence de la répartition empirique vers la répartition théorique. In Annales scientifiques de l'École Normale Supérieure.
      [2] Gretton, A., Borgwardt, K.M., Rasch, M., Schölkopf, B. and Smola, A.J., 2007, July. A kernel approach to comparing distributions. In Proceedings of the National Conference on Artificial Intelligence.
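For readers less familiar with the distinction being made here, a minimal NumPy sketch of the biased squared maximum mean discrepancy estimator in the sense of Gretton et al. 2007; the RBF kernel and its bandwidth are illustrative choices, not the manuscript's implementation:

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of a and b
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2_biased(x, y, sigma=1.0):
    # Biased estimate of squared MMD[F, p, q]: the RKHS distance between
    # the empirical mean embeddings of the two samples (a metric, not a
    # KL-style divergence).
    kxx = rbf_kernel(x, x, sigma).mean()
    kyy = rbf_kernel(y, y, sigma).mean()
    kxy = rbf_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy
```

The quantity is zero when the two samples coincide and grows as the distributions separate, which is exactly the metric behaviour the comment describes.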

    1. On 2019-07-31 09:57:14, user Rob Beynon #FBPE wrote:

      Interesting paper, and you might find this of relevance: here we looked at sperm proteomes for a range of ungulates and rodents, using identifiability as a way of establishing evolution rates.

      J Proteomics. 2016 Mar 1;135:38-50. doi: 10.1016/j.jprot.2015.12.027. Epub 2016 Jan 6.
      Cross-species proteomics in analysis of mammalian sperm proteins.
      Bayram HL, Claydon AJ, Brownridge PJ, Hurst JL, Mileham A, Stockley P, Beynon RJ, Hammond DE.

      https://www.liverpool.ac.uk...

      Abstract

      Many proteomics studies are conducted in model organisms for which fully annotated, detailed, high quality proteomes are available. By contrast, many studies in ecology and evolution are conducted in species which lack high quality proteome data, limiting the perceived value of a proteomic approach for protein discovery and quantification. This is particularly true of rapidly evolving proteins in the reproductive system, such as those that have an immune function or are under sexual selection, and can compromise the potential for cross-species proteomics to yield confident identification. In this investigation we analysed the sperm proteome, from a range of ungulates and rodents, and explored the potential of routine proteomic workflows to yield characterisation and quantification in non-model organisms. We report that database searching is robust to cross-species matching for a mammalian core sperm proteome, comprising 623 proteins that were common to most of the 19 species studied here, suggesting that these proteins are likely to be present and identifiable across many mammalian sperm. Further, label-free quantification reveals a consistent pattern of expression level. Functional analysis of this core proteome suggests consistency with previous studies limited to model organisms and has value as a quantitative reference for analysis of species-specific protein characterisation.

      SIGNIFICANCE:

      From analysis of the sperm proteome for diverse species (rodents and ungulates) using LC-MS/MS workflows and standard data processing, we show that it is feasible to obtain cross-species matches for a large number of proteins that can be filtered stringently to yield a highly expressed mammalian sperm core proteome, for which label-free quantitative data are also used to inform protein function and abundance.

    1. On 2019-07-30 22:49:30, user Charles Warden wrote:

      Thank you for putting together this paper.

      I was a little concerned when I saw "We estimate that a sample sequenced to the depth of 70 million total reads will typically have sufficient data for accurate gene expression analysis." for a couple of reasons:

      1) For most gene expression projects, I think 10 million aligned reads is OK and 20-30 million total reads is often pretty safe. The exonic percentage varies by library protocol, and I'm not sure about the unique-read conversion rate (or whether that conversion also varies between library protocols and sample types).

      2) I think the specifics have to be figured out for specific protocols (and raw data can be used for research purposes in different applications, or to check the validity of processed data).

      For 1), I think that was justified from both my own experience (with 50 bp single-end reads), as well as Liu et al. 2014 / Wang et al. 2011 / Tarazona et al. 2011. I noticed those papers while responding to this discussion.

      For 2), I don't exactly have a paper to show this, but I would say differential expression between groups requires testing / optimization per project. So, you couldn't really define criteria that will work in all possible gene expression projects. While they are kind of messy, I have some notes from a Twitter discussion this past weekend.

      However, I think part of the discrepancy for 2) is different interpretations of "differential expression," "over-/under-expression," and "outlier expression". I am mostly thinking of 10-20 million total polyA reads for differential expression and genes with clear expression / over-expression. If you are talking about a pattern that would be more likely to be a technical artifact, I can see how extra effort would be needed for gene expression analysis. For example, if you could have 2-3 biological replicates from slightly different sections of a sample (each with 10-20 million reads), that starts getting close to a total of 70 million total reads for that sample.

      Your Figure 1A and Figure 4C (and possibly Figure 3C) make me think there is more agreement than I originally expected from the abstract (since that emphasizes something with a threshold of 10-20 million MEND reads). However, I would say 90% sensitivity may be more reasonable (instead of 95%), for whatever metric is captured by that test. In general, I think 80% accuracy for a genomic signature is pretty good, and I think you need to be careful about over-fitting. That was part of the Twitter discussion that I linked above, but it is also described in my genomics for "hypothesis-generation" blog post.
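As a back-of-envelope illustration of the depth arithmetic in point 1) above, one might convert a usable-read target into total sequenced reads like this; all rates here are hypothetical and protocol-dependent:

```python
def total_reads_needed(target_usable, align_rate, exonic_frac, unique_frac):
    # Total sequenced reads required to reach a target number of usable
    # (uniquely aligned, exonic) reads, given per-protocol loss rates.
    return target_usable / (align_rate * exonic_frac * unique_frac)

# Example with made-up rates: a 10M usable-read target with 90% alignment,
# 70% exonic reads and 80% unique reads needs roughly 20M total reads.
reads = total_reads_needed(10e6, align_rate=0.9, exonic_frac=0.7, unique_frac=0.8)
```

The point of the sketch is that the 20-30M vs 70M disagreement can come down entirely to which loss rates each side assumes.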

    1. On 2019-07-30 20:00:41, user Harvard2TheBigHouse wrote:

      Is there any chance the location of the Huntington’s repeat codon could help reinforce your theory?

      The location of the repeat that eventually accumulates enough to lead to Huntington’s occurs in different places of the genome for Europeans, Asians, and Africans (where it seems to be rare):
      https://www.ncbi.nlm.nih.go...

      And since these repeats result in increased neural complexity (and eventually disease when the repeat number hits 36, in the European version) and accumulate over time - is there a way to show how the three different forms of Huntington’s fit into your OOEA trees?

      https://www.nature.com/arti...
      “The number of CAG repeats in HTT varies among species of vertebrate, and their expansion is greatest in humans. Huntingtin is essential for the development of the nervous system before birth; indeed, the researchers contend that CAG repeats might have contributed to the development of the complexity of the vertebrate brain.”

      And this paper shows that it turns out Neanderthals didn’t get Huntington’s at all, as well as having loads of other cognitive-behavioral differences:

      https://www.biorxiv.org/con...
      “Here, we present a revised list of 36 genes that carry missense substitutions which are fixed across 1,000s of human individuals and for which all archaic hominin individuals sequenced so far carry the ancestral state. In total, 647 protein-altering changes in 571 genes reached a frequency of at least 90% in the present-day human population. We attempt to interpret this list, as well as some regulatory changes, since it seems very likely that some of these genes would have contributed to the human condition.”

    2. On 2019-07-30 14:39:18, user Harvard2TheBigHouse wrote:

      And my apologies, one additional suggestion inside the carrots:

      These realities are explained by major genetic diversity (MGD), which posits that mutation levels reach saturation points that reflect selective environmental pressures.‪

      <<This normalization and the subsequent stable saturation points provide an alternative explanation for global sub-populations appearing to share a common ancestor when their autosomal DNA appears to flow from diverse to more restricted: this illusion is created by the fact that fast-mutating autosomal alleles in disparate populations react in parallel to comparable selective pressures present in different environments, resulting in shared parallel mutations that create a mask of shared ancestry.>>

    3. On 2019-07-30 13:18:25, user Harvard2TheBigHouse wrote:

      Dr. Shi,

      I'm not sure how much it will help, but here's some suggested alternate phrasing for your opening or anywhere in your paper. (You seem to be starting from the broadest theoretical perspective, however many of your readers may be more comfortable starting with more discrete data points and having you build up to that broad perspective.)

      From its inception, Out of Africa only ever fit loosely around the available data: its statistical modeling was only a best fit and has never dovetailed with all available genetic and archaeological evidence, it has never been able to explain extant out-of-place mtDNA in several global locations, and it is constantly being revised and tweaked in an attempt to explain how shabbily its premises fit around the novel genetic and archaeological data which continually weaken its theoretical framework.

      At the core of its flimsiness is a flawed assumption regarding basal mutation rates by the molecular clock and neutral theory: an inability to recognize that fast and slow alleles must have different clocks, due to the fact that only [slow/fast] alleles adapt to environmental challenges, while slow alleles - found primarily in mt and Y DNA - remain stable since they hold an organism's most fundamental and defining alleles. These realities are explained by major genetic diversity (MGD), which posits that mutation levels reach saturation points that reflect selective environmental pressures.

      In contrast to the neutral theory, MGD provides a theoretical pathway for the fact that it is only in fast-mutating autosomal DNA that sub-Saharan Africans have the most diversity, a pattern that does not hold when the ratio of autosomal diversity to mtDNA or Y-chromosome diversity is factored in - an important measure since it is in our mt and Y SNPs that we are differentiated the most from apes. Additionally, for both mtDNA and Y-chromosomes - where our genome varies the most from apes - the architecture of both phylogenetic trees points to the fact that Asia is the only viable cradle for humanity, since that is where maximum diversity - as counted by founding branches of their phylogenetic trees - is found.

    1. On 2019-07-30 16:22:07, user Giorgio Scita wrote:

      A revised version of this work is now published online in Nature Materials:

      Nat Mater. 2019 Jul 22. doi: 10.1038/s41563-019-0425-1. [Epub ahead of print]

    1. On 2019-07-29 17:52:34, user Kermit Ritland wrote:

      It is nice getting advance notice. I am writing a paper on estimating Qst with just markers; it actually needs relatives in the dataset, while here they are excluded.

    1. On 2019-07-29 17:41:17, user Gideon Mordecai wrote:

      Very interesting paper, but maybe not the first report of this kind. It reminded me of Li et al. 2014 (https://www.sciencedirect.c...), who detected honey bee viruses (including iflaviruses) in the pathogenic fungus Ascosphaera apis. Also, I believe there are other picorna-like viruses which are known to infect fungi. Thanks for sharing the pre-print.

    1. On 2019-07-28 20:49:23, user Melania Nowicka wrote:

      The manuscript has been accepted for publication as a conference article for the CMSB 2019 in Trieste. The peer-reviewed version will be available soon in Lecture Notes in Bioinformatics (Springer).

    1. On 2019-07-27 06:44:29, user Sidra Vi wrote:

      An extremely interesting new methodological approach! This contributes very much to the rapidly growing field of scRNAseq data analysis.

      There is an issue to point out: the Seurat workflow applied in the Hematopoietic/run_Seurat.R file of the GitHub repo of the BUSseq implementation is not comparable to the actual Seurat merging workflow. The same applies for the pancreas and simulated datasets.

      Comparisons to existing methods should be done using each method's designed implementation, such as the one used for MNN correction (fastMNN). The referenced Seurat vignette applies functions that weren't considered in the analysis described in this manuscript.

      The content of figures 4, 5 and 6 should be further corrected to reflect each algorithm's actual implementation.

    1. On 2019-07-26 09:35:13, user Carlos Lopez-Vazquez wrote:

      Interesting work... just wondering how far was the SRT applied in each plant from the minimum SRT required for nitrification? And how do they relate to and influence the maximum growth rates of nitrifiers and their decay rates (e.g. due to predation and other endogenous processes)?
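For context, the minimum (washout) SRT the comment alludes to is commonly estimated from the nitrifiers' maximum growth and decay rates; a minimal sketch, with illustrative parameter values rather than any plant's actual kinetics:

```python
def minimum_srt(mu_max, b):
    # Washout SRT in days: below 1 / (mu_max - b), nitrifiers are wasted
    # faster than they can grow.  mu_max and b are in 1/d.
    return 1.0 / (mu_max - b)

def design_srt(mu_max, b, safety_factor=2.5):
    # Practical designs apply a safety factor to the washout SRT.
    return safety_factor * minimum_srt(mu_max, b)
```

This makes the comment's second question concrete: a higher decay rate b (e.g. from predation) raises the minimum SRT directly.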

    1. On 2019-07-25 17:29:27, user Rui Shi wrote:

      Published version:
      Filter paper-based spin column method for cost-efficient DNA or RNA purification
      Rui Shi, Ramsey S. Lewis, Dilip R. Panthee
      PLoS One. 2018 Dec 7;13(12):e0203011. doi: 10.1371/journal.pone.0203011. eCollection 2018.

    1. On 2019-07-25 04:38:12, user Egg wrote:

      Hello, is there actually an Adygei sample in your PCA? The teal dots all look like they belong to the Bergamo-Tuscany sample, and not where a north Caucasian population would plot. Counting them, there are also only 21: the 8 + 13 Italians.

    1. On 2019-07-24 15:23:44, user Miguel Valvano wrote:

      Very interesting manuscript. I would like to suggest that the clusters called "cps" be separated between O antigen genes (located between GalF and gnd) and cps [proper] genes (located opposite from GalF). I am not sure for Klebsiella, but I know this is valid for Escherichia, Salmonella, Shigella, Enterobacter and many others. You cannot properly call cps (as defined in this paper) group 1 and group 4 capsules.

    1. On 2019-07-24 14:17:44, user Erik botlyr wrote:

      Why did you use constant flow? Constant pressure would be physiologically more similar to normal.
      Synthesizing a CDNF without the corresponding amino acids for KDEL receptor binding would be one way of proving the participation of that receptor.
      It's good research with some open gaps.
      Best regards

    2. On 2019-07-14 23:04:04, user Hyo an wu wrote:

      According to the data of your research, CDNF increased the calcium transient. Did this effect cause any disturbance in the ECG waves, intervals and amplitude?

    3. On 2019-07-10 14:22:21, user John Hevitern wrote:

      Did you investigate the UPR downstream after treatment with CDNF? What kind of relationship does the UPR have with CDNF treatment? Do you think that activating the UPR, regardless of whether it induces calcium depletion stress, would lead to synthesis and secretion of CDNF? Could CDNF have any genomic effects?
      Thapsigargin inhibits calcium uptake by the SR by inhibiting SERCA. Would AKT activation by CDNF have some effect on SERCA and compete with SERCA inhibition by thapsigargin? Or do you think that a CDNF/KDEL complex would also work, or both?

    1. On 2019-07-24 13:59:46, user Michael Downey wrote:

      Cool paper - the question of how vacuolar proteases could make their way into the cytoplasm (and eventually into the nucleus) is one that continues to come up in the literature. It has also been suggested previously for clipping of H3 by Prb1 (PMID 24587380), where the potential roles of nuclear-vacuolar junctions were also discussed. I wonder if the reporter described in this bioRxiv paper could be adapted for some sort of screen to find regulators of all of this. I also wonder: does the Pep4 at play here ever reside in the vacuole?

    1. On 2019-07-24 11:22:51, user Aldert Zomer wrote:

      Interesting paper. I think Figure 1A should also display the presence or absence of the HPI gene cassette as a ring on the phylogenetic tree because phylogroup B2 is far more lethal than the other phylogroups.

    1. On 2019-07-24 09:25:27, user Daniel Corcos wrote:

      Since the submission of this article, I have incorporated the age-matched controls of the Age trial, from table 7 of Gunsoy et al. A significant excess of cancers occurs 6 years after the first screening round, and not before. I have also been able to identify a change in the age of breast cancer occurrence late after screening in all the countries I studied. While anonymous experts have rejected the manuscript on spurious arguments, notable mammography experts are not willing to comment here.

    1. On 2019-07-24 08:45:15, user Eran wrote:

      Very interesting and welcome study. Only by using the bottom-up approach will we be able to make a change at the grassroots. The politics and high-level discussions are important but will not lead to behavioral change by themselves. FAO is about to conduct a similar survey in several former USSR countries.

    1. On 2019-07-24 00:43:56, user Alex Crits-Christoph wrote:

      Thank you for sharing this work! I have a couple of minor comments:

      1. The reads used in the soil meadow dataset are usually 200-250 bp with a larger insert, whereas the other datasets are likely 2x150 bp. This may confound the cross-environment comparison.

      2. You report that ~3,000 MAGs were removed from GTDB to generate the no_MAG version of GTDB. However, the GTDB statistics (https://gtdb.ecogenomic.org...) for version 89 show that about ~10,000 species (40% of all species) were represented only by a MAG - so shouldn't at least 10,000 genomes be removed for the no_MAG version? Were there really only 3,000 MAGs in version 86? The minor improvement observed from including MAGs doesn't seem to make sense for a database where 60% of phyla and 40% of species are only represented by a MAG.

      3. It is worth checking to see if any of the databases (particularly GTDB, with its emphasis on MAGs) specifically include genomes assembled from the set of samples being analyzed. For example, did any of the Tara Oceans or Angelo meadow metagenome MAGs make it into GTDB?

      4. When we look at the genomes assembled in the soil meadow dataset (I would suggest citing the more recent https://www.nature.com/arti... instead, as this is the most thorough publication on this dataset), the vast majority of them are novel at the species level, and a large fraction are novel at higher taxonomic ranks. While these only represent ~15-30% of the community and the most abundant organisms in the dataset, it is worrying that such a large fraction of reads are reported by Centrifuge as having species-level resolution. I suspect that Centrifuge's specificity may be poor when the sample is dominated by species not in the reference database.

      5. In Figure 1D-F, are Genus and Species mislabeled? How is it possible to have a higher percentage of reads classified at the species level than at the genus level? Perhaps I misunderstand this figure.

      6. It would be really fruitful to have the assembled ground truth comparison, i.e. the assembled MAGs from that environment. For example, taking the Tara Oceans MAGs or the Diamond et al. 2019 MAGs, stringently mapping the environmental reads to these genomes, and then classifying this subset of mapped reads. The MAGs could in turn be compared to the reference databases at the whole genome level (e.g. a FastANI comparison), so the 'true' % of species that should actually be properly classified by Centrifuge is known. This sort of comparison would be extremely helpful for the community and would get at whether metagenomic read classifiers should be trusted in environments which are highly underrepresented in reference databases.

    1. On 2019-07-24 00:34:53, user Mathieu Lajoie wrote:

      Apparently that package was never completed. There are no instructions, the GitHub repo has been inactive for the last 2 years, and a potential user's request was never answered...

    1. On 2019-07-22 08:50:23, user Alexander Bruce wrote:

      Dear Amy,

      Thanks for a compelling and well conducted study - just how the Tead4/Yap complex switches from repressor and activator in a pluripotency/ differentiation related gene context is fascinating.

      ATB, Alex Bruce (Czech Republic)

    1. On 2019-07-22 08:29:31, user Enrique Blanco Garcia wrote:

      Very nice piece of work, congrats! Indeed, not only in yeast, but in other species such as the fruit fly, there is a significantly growing body of literature arguing against the current view that histones and gene expression are tightly linked in all the transcriptional scenarios.

      References:

      • Hödl, M. & Basler, K. Transcription in the absence of histone H3.2 and H3K4 methylation. Curr. Biol. 22, 2253–2257 (2012).
      • Chen, K. et al. A global change in RNA polymerase II pausing during the Drosophila midblastula transition. eLife 2, e00861 (2013).
      • Zhang, H., Gao, L., Anandhakumar, J. & Gross, D.S. Uncoupling transcription from covalent histone modification. PLoS Genet. 10, e1004202 (2014).
      • Perez-Lluch, S., Blanco, E. et al. Absence of canonical marks of active chromatin in developmentally regulated genes. Nature Genetics 47, 1158–1167 (2015).

    1. On 2019-07-20 12:42:58, user Marcus wrote:

      Actually regarding my last comment, you can ignore the concern for chromatic aberration since I now note you used the same 488 nm excitation for both GFP and PI.

    2. On 2019-07-20 11:57:36, user Marcus wrote:

      It's probably too late now but an important control for PIN1 polarity extraction would be PI plus a membrane localized GFP. One likely source of bias is chromatic aberration. If the focal point for the two colours is offset along the z axis and the anticlinal wall is at an angle, then the two colours will be offset in the X-Y plane. This would produce a bias of apically localized PIN1, which in my opinion could help explain the difference between your extracted polarities and what we have deduced by eye.

    1. On 2019-07-19 12:12:25, user gbsod wrote:

      I think there may be a minor error (2nd para of discussion). The Rey et al., 2015 paper is not related to 'evidence indicating that fish are capable of identifying and reacting to the behaviour of conspecifics'.

    1. On 2019-07-19 00:23:12, user Guillermo Parada wrote:

      In the discussion you said "There is no evidence for RNA editing to modify splice sites yet"; however, at least in vertebrates there is evidence of GT-AA introns that are activated by ADAR and transformed to GT-AI (read by the spliceosome as GT-AG). In 2014 we found 7 putative splice sites that can be activated by A-to-I editing, and one of them, located at ADARB1, was previously found by other researchers (see Table 2; https://academic.oup.com/na...). It might not be a very frequent event, as non-canonical introns are very rare, but it's very interesting how different RNA processing events interplay during transcription.
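A toy illustration of the mechanism this comment describes (sequences are made up): A-to-I editing of the intron's final adenosine turns a non-canonical GT-AA boundary into GT-AI, which the spliceosome reads as the canonical GT-AG:

```python
def splice_site_after_editing(intron):
    # For a GT..AA intron, A-to-I editing of the last base yields GT..AI;
    # inosine base-pairs like G, so the 3' end is read as the canonical AG.
    if intron.startswith("GT") and intron.endswith("AA"):
        edited = intron[:-1] + "I"       # A-to-I at the final position
        return edited[:-1] + "G"         # inosine interpreted as G
    return intron                        # other introns are left unchanged
```

The sketch only encodes the boundary logic, not the ADAR sequence context that determines which adenosines are actually edited.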

    1. On 2019-07-18 20:58:17, user Morten S. Dueholm wrote:

      Dear Robin,

      We appreciate very much that you have taken the time to review our manuscript. We are happy that you can see the potential of the approach and we appreciate very much your comments and suggestions, some of which we will surely implement in our manuscript before submission. We have provided responses to the points raised in your review below in italics. If you have further comments or questions, you are very welcome to get back to us.

      Best regards,
      Morten and Per

      AutoTax BioRxiv Review, Jul 03, 2019, Robin Rohwer

      This manuscript describes a new method, AutoTax, to create databases out of full-length 16S rRNA gene sequences that can be used for taxonomy assignment of 16S rRNA gene amplicon data. This is a valuable contribution, because availability of full-length 16S rRNA gene sequences is increasing, and ecosystem-specific databases improve classification of amplicon data. Combining high-throughput generation of full-length 16S rRNA gene references with high-throughput database creation would be a major advancement. This manuscript also describes the application of AutoTax databases to the design of FISH primers, but I will focus my review on taxonomy assignment because that is my area of expertise.

      I believe this tool will be a valuable resource, which is why I have taken the time to review it carefully. I hope these comments will be addressed in the published version of this paper. I have three main concerns with this manuscript.

      First, it needs more detailed descriptions of the underlying methods that AutoTax uses for alignment and identity calculations. The manuscript refers readers to a github repo and supplemental documents for details, but this manuscript is introducing a new method so the algorithms/tools it employs should be clear in the main text.

      We have elaborated on the methods in the main manuscript; however, we keep the detailed information on settings etc. in the supplementary material, as we believe that this information is only relevant for a small subset of the target audience and would distract from the practical aspects.

      Second, the taxon levels in the AutoTax database are assigned based on identity thresholds. The authors acknowledge briefly that this means their AutoTax databases lack phylogenetic information, but this is a major limitation so they should justify in more detail why they chose this method and why the resulting taxon names are still meaningful. They also cite a 2018 paper by Edgar (https://peerj.com/articles/...) several times to support other claims, but do not mention that the main finding of this paper is that percent identity is a poor predictor of taxon level.

      Many of our sequences have close relatives in the SILVA database and obtain their taxonomy directly from this reference database. These taxa are therefore supported by phylogenetic information. The de novo taxa are constructed based on sequence identity alone. This is a simple solution, which, unlike phylogenetics, can be reproduced. Phylogenetics is especially problematic with large datasets, which will become the standard in the future, as heuristic approaches are required for processing the data.

      The conclusion that "percent identity is a poor predictor of taxon levels" in the 2018 paper by Edgar relates to V4 amplicons, which are known to carry sparse phylogenetic information. That is why, in our study, we only perform clustering of full-length 16S rRNA sequences, for which taxon thresholds are statistically supported.
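      To make the threshold logic concrete, here is a minimal, illustrative sketch (not the actual AutoTax code) of rank assignment from full-length percent identity, using the thresholds reported by Yarza et al. 2014:

```python
# Illustrative sketch only -- not the AutoTax implementation.
# Taxon thresholds for full-length 16S rRNA gene sequences
# (Yarza et al. 2014, Nat Rev Microbiol 12:635-645).
YARZA_THRESHOLDS = [
    ("species", 98.7),
    ("genus", 94.5),
    ("family", 86.5),
    ("order", 82.0),
    ("class", 78.5),
    ("phylum", 75.0),
]

def lowest_shared_rank(percent_identity):
    """Lowest taxonomic rank supported by a given percent identity
    between two full-length 16S rRNA gene sequences."""
    for rank, threshold in YARZA_THRESHOLDS:
        if percent_identity >= threshold:
            return rank
    return None  # below the phylum threshold: no rank supported

print(lowest_shared_rank(99.1))  # species
print(lowest_shared_rank(95.0))  # genus
```

      In AutoTax, this lookup is applied only when no taxonomy can be inherited from SILVA, so de novo names fill gaps rather than override curated ranks.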

      Third, the importance of AutoTax would be more clear if the manuscript discussed how it fits in with similar work on improving taxonomy classifications. Many previous studies (examples below) have also found improvements in taxonomy assignment when databases are improved, yet here this is presented as a novel finding. Many other methods also exist for creating custom databases, yet they are also not discussed. I believe AutoTax is a novel contribution, but its value will only be clear and meaningful in the context of previous work.

      We do not consider that “better databases provide better classifications” is a novelty. This is already clear from our previous versions of the MiDAS database (and other databases), which is a manually curated version of the SILVA database. However, we will include some of the references in our manuscript as support for the need for improved taxonomy.

      Specific Comments:

      line 68- Great explanation of why taxonomy is needed for cross-study comparisons, there is a common confusion that ASV's have solved the problem.

      line 102- Since you are introducing an alternative tool to build databases, you should include more detail on the existing ways that databases are created and the comparative improvements and drawbacks when using AutoTax. (Perhaps in discussion instead of right here in introduction.) For example, how does AutoTax compare to the results from SINA (https://academic.oup.com/bi... and RAxML (https://academic.oup.com/bi... or FastTree (https://journals.plos.org/p..., or to manual curation using Arb (https://academic.oup.com/na...

      The references above do not link to tools which can be used to build databases; instead they represent tools that can be used to align/classify sequences (SINA) or infer the phylogenetic relationship of sequences (RAxML, FastTree, arb). We have previously used SINA for sequence alignment and calculation of percent identity, however the algorithms in SINA are not suited for this purpose (see discussion with the author: https://github.com/epruesse...), which is why we decided to use usearch.

      line 109- Instead of (or in addition to) citing the original field guide (your ref #22), please cite TaxAss as the reference for the FreshTrain. The original Newton citation is appropriate when referencing the phylogeny, vocabulary, or ecology of the included Freshwater bacteria, but the TaxAss citation is appropriate when referencing the current updated version of the database that can be used for taxonomy assignment. (https://msphere.asm.org/con...

      We will add this reference along with the one for the original FreshTrain database.

      line 123, line 384- As you re-make your custom AutoTax taxonomy with a new version of SILVA, how does AutoTax prevent changes to the ecosystem-specific names in the first AutoTax version? It is clear in line 384 that you can add more custom sequences without messing up the existing taxonomy, but what if some of your sequences are added to SILVA itself? Couldn't that then shift all of your ESV centroids? And if you avoid that shift by leaving the duplicated sequence in the custom set, wouldn't AutoTax then ignore any added phylogenetic information that came from the additional SILVA curation, since preference is given to ESV centroids in the case of conflicts?

      Firstly, we do not include any of the sequences from SILVA in our ecosystem-specific database. When our own full-length sequences are added to SILVA, this will therefore not have any influence on our ESV numbering and identification of ESV centroids. However, when the sequences are added we will get better support for the taxonomy assignment by SILVA, which will improve classification. So with time, as ours and others' full-length, high-quality sequences are added to SILVA, this will help to improve the taxonomic classification provided by SILVA and thus also improve our ecosystem-specific database.

      line 129- I found mixing the abbreviations ASV and ESV very confusing. For example, ASV's could also be considered "exact" sequence variants, since they are unclustered, and to add to the confusion you DO cluster the ESVs later in the method. Choosing different terms would help clarify when you are talking about full-length vs. amplicon sequences.

      We understand that it may not be clear from the abbreviations that ESVs refer to full-length 16S rRNA sequences, whereas ASVs refer to shorter amplicons. To avoid confusion we will use the term full-length ESVs in the manuscript.

      line 200- This "in brief" description of methods is inadequate, especially for a paper that is introducing the method for the first time. What algorithm does the script use to identify an ESV's closest relative- RDP classifier, SINTAX, BLAST? What algorithm does the script use to obtain taxonomy? What algorithm is used to calculate sequence identity? These choices have major impacts on your results, and for a paper that is introducing a method the workings of the method should not be hidden in supplemental materials or within code.

      We use an exhaustive usearch search (-maxrejects 0) for identification of the closest relatives in the SILVA database and for calculation of the percent identity. We do not use any classifiers. We will make this clearer in the next version of the manuscript.

      line 332- I would believe the results of a kmer-based method like the standard RDP/Wang classifier more than sequence identity. You are using the full SILVA database in your classification, so overclassification should not be a major problem. If you are worried overclassifications might be masking the gains of your method, you can double check by looking at how many classifications change when you use the new database. The main point of your reference #10 is that sequence identity is a poor predictor of taxonomic rank, and Edgar even states in the abstract that "95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal."

      Sequence identity is actually a fairly good predictor of taxonomic ranks when full-length 16S rRNA sequences are analyzed (Yarza et al. 2014, Kim et al. 2014). The statement that "95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal." relates to V4 amplicons and not full-length sequences. Another key point is that the NCBI taxonomy was used as the ground truth, and we know that this database contains many errors according to both conserved-marker-gene phylogenetics and ANI (Parks et al. 2018, Ciufo et al. 2018).

      Yarza P, Yilmaz P, Pruesse E, Glöckner FO, Ludwig W, Schleifer K-H, et al. (2014). Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol 12: 635–645.

      Kim M, Oh H-SS, Park S-CC, Chun J. (2014). Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol 64: 346–351.

      Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, et al. (2018). A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36: 996–1004.

      Ciufo S, Kannan S, Sharma S, Badretdin A, Clark K, Turner S, et al. (2018). Using average nucleotide identity to improve taxonomic assignments in prokaryotic genomes at the NCBI. Int J Syst Evol Microbiol 68: 2386–2392.

      line 367- What method specifically is used to "map" the ESV's?

      We used usearch11 -usearch_global -maxrejects 0 -id 0. This information is provided in the materials and methods.

      line 378- How do you know these classifications are correct when there is nothing to compare them to and no way to test for accuracy? Perhaps you could change it to "This approach will distinguish species-level classifications."

      Excellent point. This part does not include any taxonomy. We will rewrite or remove this sentence.

      line 387- When you defer to a centroid name over Silva, what happens to the Silva name? Is it lost in favor of the placeholder name? Could this result in a situation where known organisms are missed because they get classified as a new placeholder name instead?

      If the centroid of e.g. a species falls outside the criteria for a SILVA genus, all sequences within that species will obtain a de novo name, even though some sequences may fall within the threshold. However, there will most likely also be species whose centroids fall within that genus, in which case it will remain. AutoTax creates a log of all these changes, which means that you can search for missing taxa if relevant.

      line 395- This "striking" finding is not novel. It is well supported in the literature that custom databases will dramatically improve classification and you should include that context when you describe your result. See for example:
      https://msphere.asm.org/con..., Fig 3
      http://journals.plos.org/pl..., Fig 5
      https://doi.org/10.1186/147..., Table 1
      http://dx.plos.org/10.1371/..., Table 4
      https://peerj.com/articles/494, Fig 3
      https://academic.oup.com/da..., Table 1
      http://www.sciencedirect.co..., Fig 6
      http://www.biomedcentral.co..., Fig 4
      https://peerj.com/articles/..., Fig 1

      Thanks for providing these references, they support why the method is relevant and will be included in the article.

      line 410- The improvement over MiDAS is one of your most compelling findings. You should elaborate on it more! For example, a little background on how MiDAS was created for your non-wastewater audience, and then you can emphasize how the volume of sequences is really important. This is the most compelling evidence for scientists to adopt your method when they already have small custom databases, and it is also the most appropriate test of improvement since MiDAS is the current standard for activated sludge community classifications.

      We will elaborate on this in the manuscript. The MiDAS database is actually a curated version of SILVA, which includes all SILVA sequences, so it is not a "small" custom database.

      line 417- This analysis is certainly useful to the wastewater research community, but it tests the resolution of primer regions, not the validity of the database. To test how your database performs, you should use known amplicon sequences that do not already have an exact match in the database. For example, by creating amplicons from unincluded ESV's.

      We disagree with this. We show in the paper that we can make a database, which includes near-perfect references for almost all bacteria in an ecosystem. Therefore, it makes sense to test the resolution of the primer regions when there is an exact match in the database.

      line 490- How can you state that the AutoTax databases are near-complete when you haven't performed any completeness estimates? They are certainly "more complete"...

      We actually estimate the completeness of the database in Figure 1. The results are based on mapping amplicon data to the ESV database and calculating how many of the amplicons have high-identity references in the database. The analysis is of course biased by the amplicon primers, but it is the best we can do.

      line 491- These public databases are not "much larger" than your database because your AutoTax database combines your new sequences with Silva, and this combination is therefore slightly bigger than Silva. This statement could be misleading to a less careful reader, because some custom databases are used alone, without being combined with Silva/Greengenes.

      We do not include any sequences from SILVA into our database. SILVA is only used for classification. The statement is therefore correct. We will make this clearer in the manuscript.

      line 496- How can you claim sub-species level classifications? You have explicitly stated that AutoTax uses a 7-level taxonomy.

      We are able to resolve multiple ASVs for each species. Therefore, we claim that the microbial community can be analyzed at the sub-species level.

      line 506- I like seeing a time estimate, but it is meaningless without some broad description of the computational platform used. "A few hours," is great, but was that on a standard laptop or a high throughput computing center?

      This is important and relevant information and we will add it to the manuscript.

      line 513- "Although the sequence similarity clustering does not necessarily reflect the true evolutionary history of the microbes or the phenotypic characteristics..." This is the biggest weakness of your method, and a major concern. It deserves more in-depth discussion. For example, you show improvement over a smaller custom database (MiDAS), but you define improvement based on how many ASVs were named. Is it really an improvement to end up with more names if those names are less meaningful? How valuable is a placeholder name when it lacks phylogenetic context? Also, you need to discuss the limits of sequence identity for defining taxonomic rank. You cite a paper by Edgar (ref # 10) multiple times, yet do not discuss its main finding that sequence identity is a poor predictor of taxonomic rank.

      Many of the points here have been discussed above. An important point is that AutoTax uses the most recent phylogenetic information when possible (the SILVA taxonomy). The placeholder names serve as robust reference points until true taxonomies are made. They have the same diversity as true taxa and are therefore good substitutes until the taxonomy has been curated by phylogenetic experts or by genome-based methods such as those used in the Genome Taxonomy Database (GTDB), which are also being used to curate the NCBI taxonomy. The de novo names will be replaced by true taxonomic names as the databases are curated. We can easily keep track of these changes when AutoTax-based databases are updated.

      line 530- "...will provide a common language for scientific communities..." How will AutoTax accommodate existing and future manually curated taxonomies that include phylogenetic information? How do you prevent dueling frameworks, the "more complete" vs. the "more correct." Can AutoTax be incorporated into existing manual curation efforts, or is it purely a separate approach?

      See comment above.

    2. On 2019-07-12 15:45:14, user Robin Rohwer wrote:

      This manuscript describes a new method, AutoTax, to create databases out of full-length 16S rRNA gene sequences that can be used for taxonomy assignment of 16S rRNA gene amplicon data. This is a valuable contribution, because availability of full-length 16S rRNA gene sequences is increasing, and ecosystem-specific databases improve classification of amplicon data. Combining high-throughput generation of full-length 16S rRNA gene references with high-throughput database creation would be a major advancement. This manuscript also describes the application of AutoTax databases to the design of FISH primers, but I will focus my review on taxonomy assignment because that is my area of expertise.

      I believe this tool will be a valuable resource, which is why I have taken the time to review it carefully. I hope these comments will be addressed in the published version of this paper. I have three main concerns with this manuscript.

      First, it needs more detailed descriptions of the underlying methods that AutoTax uses for alignment and identity calculations. The manuscript refers readers to a github repo and supplemental documents for details, but this manuscript is introducing a new method so the algorithms/tools it employs should be clear in the main text.

      Second, the taxon levels in the AutoTax database are assigned based on identity thresholds. The authors acknowledge briefly that this means their AutoTax databases lack phylogenetic information, but this is a major limitation so they should justify in more detail why they chose this method and why the resulting taxon names are still meaningful. They also cite a 2018 paper by Edgar (https://peerj.com/articles/...) several times to support other claims, but do not mention that the main finding of this paper is that percent identity is a poor predictor of taxon level.

      Third, the importance of AutoTax would be more clear if the manuscript discussed how it fits in with similar work on improving taxonomy classifications. Many previous studies (examples below) have also found improvements in taxonomy assignment when databases are improved, yet here this is presented as a novel finding. Many other methods also exist for creating custom databases, yet they are also not discussed. I believe AutoTax is a novel contribution, but its value will only be clear and meaningful in the context of previous work.

      Specific Comments:

      line 68- Great explanation of why taxonomy is needed for cross-study comparisons, there is a common confusion that ASV's have solved the problem.

      line 102- Since you are introducing an alternative tool to build databases, you should include more detail on the existing ways that databases are created and the comparative improvements and drawbacks when using AutoTax. (Perhaps in discussion instead of right here in introduction.) For example, how does AutoTax compare to the results from SINA (https://academic.oup.com/bi... and RAxML (https://academic.oup.com/bi... or FastTree (https://journals.plos.org/p..., or to manual curation using Arb (https://academic.oup.com/na...

      line 109- Instead of (or in addition to) citing the original field guide (your ref #22), please cite TaxAss as the reference for the FreshTrain. The original Newton citation is appropriate when referencing the phylogeny, vocabulary, or ecology of the included Freshwater bacteria, but the TaxAss citation is appropriate when referencing the current updated version of the database that can be used for taxonomy assignment. (https://msphere.asm.org/con...

      line 123, line 384- As you re-make your custom AutoTax taxonomy with a new version of SILVA, how does AutoTax prevent changes to the ecosystem-specific names in the first AutoTax version? It is clear in line 384 that you can add more custom sequences without messing up the existing taxonomy, but what if some of your sequences are added to SILVA itself? Couldn't that then shift all of your ESV centroids? And if you avoid that shift by leaving the duplicated sequence in the custom set, wouldn't AutoTax then ignore any added phylogenetic information that came from the additional SILVA curation, since preference is given to ESV centroids in the case of conflicts?

      line 129- I found mixing the abbreviations ASV and ESV very confusing. For example, ASV's could also be considered "exact" sequence variants, since they are unclustered, and to add to the confusion you DO cluster the ESVs later in the method. Choosing different terms would help clarify when you are talking about full-length vs. amplicon sequences.

      line 200- This "in brief" description of methods is inadequate, especially for a paper that is introducing the method for the first time. What algorithm does the script use to identify an ESV's closest relative- RDP classifier, SINTAX, BLAST? What algorithm does the script use to obtain taxonomy? What algorithm is used to calculate sequence identity? These choices have major impacts on your results, and for a paper that is introducing a method the workings of the method should not be hidden in supplemental materials or within code.

      line 332- I would believe the results of a kmer-based method like the standard RDP/Wang classifier more than sequence identity. You are using the full SILVA database in your classification, so overclassification should not be a major problem. If you are worried overclassifications might be masking the gains of your method, you can double check by looking at how many classifications change when you use the new database. The main point of your reference #10 is that sequence identity is a poor predictor of taxonomic rank, and Edgar even states in the abstract that "95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal."

      line 367- What method specifically is used to "map" the ESV's?

      line 378- How do you know these classifications are correct when there is nothing to compare them to and no way to test for accuracy? Perhaps you could change it to "This approach will distinguish species-level classifications."

      line 387- When you defer to a centroid name over Silva, what happens to the Silva name? Is it lost in favor of the placeholder name? Could this result in a situation where known organisms are missed because they get classified as a new placeholder name instead?

      line 395- This "striking" finding is not novel. It is well supported in the literature that custom databases will dramatically improve classification and you should include that context when you describe your result. See for example:
      https://msphere.asm.org/con..., Fig 3
      http://journals.plos.org/pl..., Fig 5
      https://doi.org/10.1186/147..., Table 1
      http://dx.plos.org/10.1371/..., Table 4
      https://peerj.com/articles/494, Fig 3
      https://academic.oup.com/da..., Table 1
      http://www.sciencedirect.co..., Fig 6
      http://www.biomedcentral.co..., Fig 4
      https://peerj.com/articles/..., Fig 1

      line 410- The improvement over MiDAS is one of your most compelling findings. You should elaborate on it more! For example, a little background on how MiDAS was created for your non-wastewater audience, and then you can emphasize how the volume of sequences is really important. This is the most compelling evidence for scientists to adopt your method when they already have small custom databases, and it is also the most appropriate test of improvement since MiDAS is the current standard for activated sludge community classifications.

      line 417- This analysis is certainly useful to the wastewater research community, but it tests the resolution of primer regions, not the validity of the database. To test how your database performs, you should use known amplicon sequences that do not already have an exact match in the database. For example, by creating amplicons from unincluded ESV's.

      line 490- How can you state that the AutoTax databases are near-complete when you haven't performed any completeness estimates? They are certainly "more complete"...

      line 491- These public databases are not "much larger" than your database because your AutoTax database combines your new sequences with Silva, and this combination is therefore slightly bigger than Silva. This statement could be misleading to a less careful reader, because some custom databases are used alone, without being combined with Silva/Greengenes.

      line 496- How can you claim sub-species level classifications? You have explicitly stated that AutoTax uses a 7-level taxonomy.

      line 506- I like seeing a time estimate, but it is meaningless without some broad description of the computational platform used. "A few hours," is great, but was that on a standard laptop or a high throughput computing center?

      line 513- "Although the sequence similarity clustering does not necessarily reflect the true evolutionary history of the microbes or the phenotypic characteristics..." This is the biggest weakness of your method, and a major concern. It deserves more in-depth discussion. For example, you show improvement over a smaller custom database (MiDAS), but you define improvement based on how many ASVs were named. Is it really an improvement to end up with more names if those names are less meaningful? How valuable is a placeholder name when it lacks phylogenetic context? Also, you need to discuss the limits of sequence identity for defining taxonomic rank. You cite a paper by Edgar (ref # 10) multiple times, yet do not discuss its main finding that sequence identity is a poor predictor of taxonomic rank.

      line 530- "...will provide a common language for scientific communities..." How will AutoTax accommodate existing and future manually curated taxonomies that include phylogenetic information? How do you prevent dueling frameworks, the "more complete" vs. the "more correct." Can AutoTax be incorporated into existing manual curation efforts, or is it purely a separate approach?

    1. On 2019-07-18 08:42:11, user Anestis Tsakiridis wrote:

      I really enjoyed your preprint and I'd like to note the interesting parallels with our recent paper (https://elifesciences.org/a...) suggesting that posterior neural crest cells are also specified quite early, prior to definitive neurectoderm specification; citing it might help you strengthen your conclusion. I am looking forward to the extension of your work to trunk neural crest and later stages, and also the examination of how regionalised cell fate correlates with developmental potency, e.g. through heterotopic/heterochronic grafting experiments.

    1. On 2019-07-18 05:30:52, user Olivier Gandrillon wrote:

      Dear authors

      This is definitely a nice and important piece of work. I nevertheless find it distressing that you seem to ignore very closely related work using the very same idea of coupling two-state models of gene expression and inferring connection parameter values, published in the following two papers:

      Herbach, U., Bonnaffoux, A., Espinasse, T., and Gandrillon, O. (2017). Inferring gene regulatory networks from single-cell data: a mechanistic approach. BMC Systems Biology 11:105 .

      Bonnaffoux, A., Herbach, U., Richard, A., Guillemin, A., Gonin-Giraud, S., Gros, P.A. and Gandrillon, O. (2019). WASABI: a dynamic iterative framework for gene regulatory network inference. BMC Bioinformatics 20:220.

      I definitely would think a comparison of the two modeling approaches might be informative to the readers.

      Best

      Olivier

    1. On 2019-07-18 04:39:30, user Jon wrote:

      Congratulations, Haynes, on your new tool for genotype-free demultiplexing!

      Did you use some early version of scSplit to compare with demuxlet, vireo and souporcell?

      We would recommend using v0.7.5, which we used in our latest version of our preprint ( https://www.biorxiv.org/con... ), to get results highly concordant with demuxlet and hashtag-enabled demultiplexing (Stoeckius et al., Genome Biology, 2018), on the datasets in their papers. And you might want to filter out repetitive regions for the allele count matrices as mentioned in our paper and Github page.

      Jun Xu

    1. On 2019-07-17 08:01:59, user kbseah wrote:

      Cool paper! One small thing: I think the colors for "linked" and "unlinked" in Figure 4 have been swapped by mistake, if I understand the figure correctly.

    1. On 2019-07-17 08:01:13, user Kazu Masa wrote:

      The work of this paper is quite similar to this paper (DOI 10.1007/s00285-016-1031-3). I felt as if I was reading a copy. Is the calculation of fluctuation in Fig. 3 really numerically calculated?

    1. On 2019-07-16 23:18:50, user Moe wrote:

      Hi there, very nice protocol, I will definitely give this a try using some different tissue.

      Just to confirm, when you grind the mouse lung tissue (Matrix D Tubes) prior to TFA lysis, what grinding instrument do you use to homogenize the tissue? Do you add any liquid into the grinding tube, or just the tissue by itself? Is the tissue at room temp or snap frozen in liquid nitrogen? Is it kept cold during grinding? Regards

    1. On 2019-07-16 10:02:39, user sylvain garciaz wrote:

      Great work from Sebastian Müller and Raphaël Rodriguez, linking iron uptake by the glycoprotein CD44 and epigenetic plasticity. This paper paves the way for new comprehensive therapeutic interventions involving iron regulation in cancers.

    1. On 2019-07-16 07:50:01, user Rosalind Arden wrote:

      This sentence needs a tweak; my suggestions in [ ] and caps: From a developmental point of view, a longitudinal study of cognitive ability would benefit great[LY] by the incorporation of polygenic scores to determine whether age-related changes in the A and C variance components of intelligence (e.g., the decrease in C variance) [ARE] dependent on age-related changes in rAC (Haworth et al. 2010, Tucker-Drob & Bates, 2016).

    2. On 2019-07-15 22:34:01, user Charles Warden wrote:

      Thank you very much for posting this pre-print.

      First, a minor point:

      In the PDF, there is a typo in the last sentence ("polygnic" --> "polygenic"). This doesn't appear to be an issue with the HTML version above.

      Second, am I correctly understanding that all of the results are simulated?

      I apologize if I am asking a simple question, but how much would this relate to the accuracy of the heritability estimates? For example, this paper raised concerns about over-estimations in heritability estimates:

      https://journals.plos.org/p...

      For example, Figure 1 in that paper shows a difference for height and BMI that I might expect for "Sib-Regression" and "RDR", but the "twin" method seems more similar than I would expect (meaning that the BMI heritability estimate seemed higher than I was expecting).

      For me, maybe this isn't crucial. For example, I had not previously heard of the Polderman et al. 2015 paper, so I thought Figure 3 in that paper and the MaTCH web interface were a very interesting set of empirical observations, which I would not know about had I not read this pre-print.

    1. On 2019-07-15 23:10:28, user Charles Warden wrote:

      Thank you for posting this pre-print. I think it represents something that I should learn more about.

      In Figure 6, I noticed the population distributions were not centered around 0, and they looked like a (mostly) normal distribution. I also see that you mention "Rare genetic variants with large effect sizes (e.g. rs183373024 and rs1447295) contribute to the wide tails of each PRS distribution in Fig. 6."

      Have I correctly understood that these are mostly healthy controls in the new cohorts (similar to the 1000 Genomes samples)? If you had separated those with prostate cancer, would you have more of a bimodal distribution, and/or enrichment for samples with PRS > 3?

      In other words, it almost looks like the European population PRS values should be median centered (and/or re-calibrated, etc.). Can you take the intersection of informative markers reproduced between populations, and then see what the PRS distributions look like?
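      The median-centering/re-calibration suggested above can be sketched as follows (a minimal illustration on synthetic data; `cohort_A`, `cohort_B`, and all numbers here are hypothetical, not taken from the paper):

      ```python
      import numpy as np

      # Hypothetical PRS values for two cohorts whose distributions are
      # not centered around 0 (illustrative random data only).
      rng = np.random.default_rng(42)
      prs = {
          "cohort_A": rng.normal(loc=0.8, scale=1.0, size=1000),
          "cohort_B": rng.normal(loc=-0.3, scale=1.1, size=1000),
      }

      # Median-center each cohort so scores are comparable on a common
      # scale; dividing by the cohort SD additionally re-calibrates the
      # spread, so "PRS > 3" means the same thing in every cohort.
      centered = {k: (v - np.median(v)) / v.std() for k, v in prs.items()}

      for name, vals in centered.items():
          print(name, round(float(np.median(vals)), 3))  # medians now ~0
      ```

      Whether centering or a full per-population re-calibration is the right choice presumably depends on how the informative markers overlap between populations, as asked above.
      
      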

      I'm not sure how big a deal this is, but are there certain probes (without imputation) that can be used between arrays that have PRS distributions with a mean closer to 0? Also, there are 1000 Genomes African (AFR) super-population samples. Perhaps showing the PRS distribution for those samples would be nice as a Supplemental Figure (or perhaps the AFR super-population can be added to Figure 1, and the separate 1000 Genomes African populations can be a separate figure)?

    1. On 2019-07-15 19:06:59, user Cathy Stein wrote:

      I would recommend the authors read recent papers from the Ugandan household contact study, where resistance to M. tuberculosis infection is documented epidemiologically. PMID: 30165605 is another documentation of highly exposed individuals not acquiring Mtb infection. In addition, PMID: 29304247 shows that many TST conversion events happen 3-6 months after ascertainment of the TB index case, so declaring subjects "non-converters" after only 3 months of observation runs the risk of misclassification bias. Lastly, the Clin Infect Dis study was the basis of a recent immunologic study (PMID: 31110348) that would be an interesting comparison to the work presented here by Weiner et al.

    1. On 2019-07-15 15:26:12, user Claire Lonsdale wrote:

      Very interesting work. I'd suggest that you try an alternative method for inactivating your non-viable RNA control culture, using something other than hypochlorite, which is well known for destroying nucleic acids.

    1. On 2019-07-15 15:14:39, user Artur Wlodarczyk wrote:

      Correction: concentration of Na2CO3 in the 5xBG medium should be 0.472 mM. Both 5xBG and 5xBGM were sterile-filtered, not autoclaved.

    1. On 2019-07-15 09:41:29, user Regina Wong wrote:

      Loved reading this beautiful paper, and thanks for providing the gene signature associated with collagenase treatment! I was just curious: why a 2-hour incubation for the collagenase/hyaluronidase enzyme mix but 30 minutes for the Bacillus licheniformis protease? I wonder whether the differential incubation time could affect the RNA half-life?

    1. On 2019-07-15 09:35:15, user Jon Tobias wrote:

      There was barely any signal for SOST in our recent sclerostin GWAS https://www.ncbi.nlm.nih.go..., so this MR is not really looking at the effect of lifelong sclerostin levels; presumably some other mechanism is responsible. Perhaps the SOST SNPs represent a signal for BMD (which is known to be genetically correlated with T2DM).

    1. On 2019-07-15 09:28:55, user May-Britt Öhman wrote:

      I wonder about this "Saami DNA" - what is the reference point for that distinction? The Saami have lived in the region for a very long time, at least 2,000 years but probably longer, and have of course mixed with others coming in. This is not discussed in the article, so please help me understand, or refer me to an article where it is discussed further. Thanks in advance, best, May-Britt Öhman, Uppsala University

    1. On 2019-07-15 07:24:17, user Xiaojie Qiu wrote:

      It has been brought to our attention that we didn't state the package versions used in Figure 1. We used scvelo version 0.1.18 and velocyto 0.17.13 in our Gillespie simulation benchmark. We also note that dynamo benefits from the real-time information in this benchmark, while the other two tools don't rely on this information. For those who are more math-inclined, please note that we have now also released the full derivation of the matrix form of the moment-generating functions for parameter estimation in the full_derivation.pdf file in the dynamo-notebook GitHub repo https://github.com/aristote... .

    1. On 2019-07-13 23:55:31, user Truman Lab wrote:

      This is a wonderful study of huge benefit to both the yeast community and those who study signal transduction. The incorporation of the 3D structure information is highly novel and allows analysis of sites that may impact protein-protein interactions! This work deserves recognition in a top-tier journal.

      Kinetics of phosphorylation events are important. Have the authors resolved the phosphorylation events of these processes over time? It would be fascinating (though obviously challenging and expensive) to see when phosphorylations appear and resolve after DNA damage. The timing may also speak to the hierarchy of signaling pathways involved.

    1. On 2019-07-12 15:27:14, user DD wrote:

      I have a question regarding Figure 4E. How can you score yellow-only dots when you transfect cells with CTNS-eGFP and LC3-RFP-eGFP together? When the CTNS-eGFP protein localizes to the lysosome, it will colocalize with the LC3 reporter, which gives off only a red signal (as the eGFP in the LC3 reporter is quenched). This colocalization would itself produce yellow dots, and thus should increase the "yellow only" dot count. I do not understand how you see the complete opposite of this, as you report a 1.2-fold decrease in yellow-only dots in cells transfected with a CTNS-eGFP plasmid.

      The same reasoning applies to Figure 4F. I would suspect there are no red-only dots anymore in the CTNS-eGFP-transfected cells. Thus I would say that where you do see such dots, these are cells that were not transfected with the CTNS-eGFP plasmid and therefore cannot be used to show a rescue of this phenotype by providing back the missing protein.

    1. On 2019-07-12 15:00:04, user Andy_Read wrote:

      I think the alternative splicing data and the hypotheses around it are really interesting. I do think the authors should put the work in the context of the findings in Bailey et al., Genome Biology, 2018, and Marchal and Zhang et al., Nature Plants, 2018 -- the first (I think?) example of a functional ID-NLR from wheat.

    1. On 2019-07-12 14:24:49, user Casey Greene wrote:

      Could you clarify how the RNA-seq for bulk was done? Is there poly-A selection, rRNA depletion, or something else?

      Thanks!

      Casey

    1. On 2019-07-12 11:51:13, user Susheel Busi wrote:

      Very nice work!

      I'm trying to follow your protocol for high-molecular-weight DNA and noticed that you used 'vacuum grease' with SDS during the extractions. Could you please provide the catalog number for this item from the Dow Corning vendor? Could you also explain why you chose this instead of the phase-lock gel? Is the downstream purity of the sample better with the grease than with the phase-lock?

      Thank you!

    1. On 2019-07-12 11:24:51, user Craig Anderson wrote:

      I really enjoyed the paper and have a couple of questions/suggestions for additional end points.

      Firstly, I find the aflatoxin results super interesting. Though the majority of donors in GTEx are "white", can you associate the relative proportion of the aflatoxin mutational signature with specific cohorts in your analyses? The consumption of mycotoxins has typically been associated with developing nations, particularly in Asia and Africa. If the aflatoxin signatures are prevalent among other populations, that would be an important finding.

      I'm also interested in your mutation rates for EEMMs. Have you thought about whether or not these differ from expectations, given what is in the literature for mutations arising from replication? For example, say there is one mutation per genome per replication and all of your tissues were established after only 5 divisions (32 cells, say, for 32 tissues): one would expect a single tissue-specific mutation and 2 mutations that will each be found in half of your tissues. This might allow you to estimate error rates that highlight the number of false positives in single-tissue genotyping, or the number of sites you're probably missing. I appreciate that not only is it a massive over-simplification, but it could be affected by variation in fidelity during the early stages of development or by differences in mutation rate between coding and non-coding parts of the genome.
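      The back-of-the-envelope model above can be simulated directly (a toy sketch under the stated assumptions only: one new mutation per genome per replication and a strict binary division tree; the function name is my own):

      ```python
      from collections import Counter

      def simulate_tissues(n_divisions=5):
          """Toy model: one founding cell divides n_divisions times, each
          daughter genome gaining exactly one new private mutation per
          replication. Returns the fraction of final cells ('tissues')
          that carry each mutation."""
          next_id = 0
          cells = [frozenset()]  # founding genome carries no mutations
          for _ in range(n_divisions):
              daughters = []
              for genome in cells:
                  for _ in range(2):  # two daughters per division
                      next_id += 1
                      daughters.append(genome | {next_id})
              cells = daughters
          n = len(cells)  # 2**n_divisions final cells
          carriage = Counter(m for genome in cells for m in genome)
          return {m: count / n for m, count in carriage.items()}

      shares = simulate_tissues(5)  # 32 'tissues'
      # Sharing fractions range from 1/2 (first-division mutations)
      # down to 1/32 (private, last-division mutations).
      print(sorted(set(shares.values())))
      ```

      Comparing the observed sharing spectrum against this geometric expectation is one way to flag single-tissue false positives, as suggested above.
      
      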

      Anyway, thanks for making this available- I look forward to seeing it published.

    1. On 2019-07-11 19:09:16, user Joel W. Hockensmith wrote:

      Thank you for citing Muthuswami et al. (21) in your latest posting. https://www.biorxiv.org/con...

      We enjoyed reading your manuscript but would ask you to consider revising the following sentence prior to final publication:

      “A class of phospho-aminoglycosides (phospho-kanamycin) inhibits the yeast SWI2/SNF2 complex but also have limited utility in mammalian cells, are relatively non-specific ATPase inhibitors, and would likely be highly toxic in this context21.”

      Here’s the reasoning for our request:

      1. Our publication that you cite notes that the phosphoaminoglycosides (aka ADAADi) are quite specific for SWI/SNF ATPases and far from “non-specific” since there is no inhibition of other ATPases despite the plethora of these cellular enzymes. More precisely, there are few (if any) other inhibitors with specificity that permits differentiation between DNA-dependent ATPases.

      2. With respect to “toxicity”, we encourage you to note that these inhibitory compounds are presumably synthesized in every eukaryotic cell that has been modified through molecular biology to become neomycin-, hygromycin- or geneticin (G418)-resistant (Dutta P., Tanti G.K., Sharma S., Goswami S.K., Komath S.S., Mayo M.W., Hockensmith J.W., Muthuswami R. “Global Epigenetic Changes Induced by SWI2/SNF2 Inhibitors Characterize Neomycin-Resistant Mammalian Cells,” PloS one. 7(11): e49822 (2012). PMID: 23209606 | PMCID: PMC3509132). Thus, there are a huge number of NeoR cell lines that have been exposed to the inhibitors and survived without reported consequences.

      3. With respect to “utility”, we would ask that you consider either:

      a. Wu, Q., Sharma, S., Cui, H., LeBlanc, S. E., Zhang, H., Muthuswami, R., Nickerson, J. A., and Imbalzano, A. N. (2016) Targeting the chromatin remodeling enzyme BRG1 increases the efficacy of chemotherapy drugs in breast cancer cells. Oncotarget 7, 27158-27175

      OR

      b. Muthuswami, R., Bailey, L., Rakesh, R., Imbalzano, A.N., Nickerson, J.A., and Hockensmith, J.W. “BRG1 is a prognostic indicator and a potential therapeutic target for prostate cancer.” J. Cell. Physiol. 2019 Jan 22. [http://dx.doi.org/10.1002/j...] PMID: 30667054.

      Fig. 4 in the former article is an example of “therapeutic enhancement” coupling an inhibitor of SWI/SNF activity with a chemotherapeutic agent, in some ways similar to your own approach. The latter article demonstrates the ability to use ADAADi to clear solid human tumors from an athymic mouse model without any observable toxicity to the mice.

      We hope that you will reassess your use of the terms “non-specific”, “highly toxic” and “limited utility” and perhaps mitigate the harshness conveyed in your sentence. It’s hard to imagine that you would want to deter other scientists from the pursuit of an understanding of ADAADi even before optimization of their use.

      I’d be happy to discuss this further if you would like further clarification or disagree with our request.

    1. On 2019-07-11 17:42:25, user Vagner Benedito wrote:

      Very nice work, folks!

      I'd like to offer a few remarks to improve the manuscript:

      It would be good to describe exactly which leaves were used for extraction and how much plant material per unit of solvent was used in the leaf dipping (and to confirm that the exuding trichomes were preserved after the dipping, so that you captured RNA from these structures in the analysis; I'd fear that most trichomes would break off the leaves during the ethanolic dipping).

      An additional supplemental table with locus ID correlations between Sopen and Solyc orthologs would be very helpful.

      In your model (Fig. 8), I'd add an element for the import of metabolites from the subapical cell, since it is very possible that the apical glandular cells are not fully responsible for generating substrates for acylsugar biosynthesis!

      I really like the way this paper used a simple comparative RNA-Seq approach to identify novel candidate genes involved in acylsugar metabolism - great job!

    1. On 2019-07-11 15:56:09, user Filippo Cernilogar wrote:

      Very interesting work addressing the determinants of TF binding specificity. Preexisting chromatin matters. We came to a similar conclusion looking at the pioneer transcription factor Foxa2: its endoderm binding sites are primed by low levels of active chromatin modifications in embryonic stem cells (Pre-marked chromatin and transcription factor co-binding shape the pioneering activity of Foxa2. bioRxiv doi: https://doi.org/10.1101/607721 and Nucleic Acids Research, in press).

    1. On 2019-07-11 14:35:13, user Tarik Haydar wrote:

      Looks like a really interesting study! I have a question about the breeding for the captured cells: were the Thy1-GFP mice crossed with the C3H component of the 003647 F1 hybrids (to generate a comparable F1 hybrid breeder) before mating to Ts65Dn?

    1. On 2019-07-10 21:12:46, user Tatiana Arias wrote:

      We would like to know your opinion of the paper so that we can improve it before submitting for publication. We are hoping to submit to GBE; do you think this is a good fit for our paper, or would you recommend another journal? THANKS!

    1. On 2019-07-10 19:47:35, user k_kunzelmann@yahoo.com.au wrote:

      Nice work; however, we already found in 2015 (Ousingsawat et al. 2015, Nature Communications) that increasing Ca2+/increasing stimulation leads to increased non-selectivity of the channel (see Supplementary Fig. 2). It would be fair to cite this. Best regards, Prof. K. Kunzelmann

    1. On 2019-07-09 23:29:45, user Michael Deem wrote:

      I recently became aware that my conflict of interest statement was not included. The Conflict of Interest Statement of this manuscript should be updated to include: "Michael W. Deem was a scientific advisor to Direct Genomics and held a < 1% beneficial equity interest in the company."

    1. On 2019-07-09 21:32:49, user Jordan Rowley wrote:

      Great data! I was trying to find the crosslinking conditions for the Micro-C data. Is it the same as the in situ Hi-C prep, or are you adding DSG, EGS, or both with FA as in the Micro-C XL protocol?