78 Matching Annotations
  1. Last 7 days
    1. The first important observation is that state-of-the-art approaches, except CPM, fail to produce an embedding for the complete dataset (containing 100,000 cells), due to their reliance on pairwise distances for the computation of embeddings, which scales quadratically in the number of cells

      This doesn't feel quite fair, as UMAP and t-SNE were designed to handle datasets of this size and have been widely used to generate embeddings for single-cell datasets of this size and larger. Also, I believe at least UMAP is sub-quadratic in the number of samples, as it uses an approximate kNN algorithm that is n log n.
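      As a back-of-the-envelope illustration (a rough cost model of my own, not a benchmark of either implementation), the gap between an all-pairs distance computation and an n log n approximate kNN build at this scale is large:

```python
import math

n = 100_000  # cells in the complete dataset

# Cost of materializing all pairwise distances (what quadratic methods need)
pairwise = n * (n - 1) // 2

# Rough cost model for an approximate kNN construction (e.g. NN-descent), ~n log n
approx_knn = int(n * math.log2(n))

print(f"{pairwise:,} vs {approx_knn:,} (~{pairwise / approx_knn:.0f}x more work)")
```

      On this crude model the all-pairs approach does roughly three thousand times more distance computations at n = 100,000, which is consistent with quadratic methods failing where UMAP does not.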

    2. On synthetic trees with up to 5 generations and 34,000 individuals, CPM cuts distortion by > 99%

      It would be helpful to clarify what this claim is based on, as I can't see anything in Figure 2 that indicates a 99% change in any of the metrics between CPM and PM.

  2. May 2025
    1. Overall, these results show that the RamanMAE embeddings capture biologically relevant information and have utility for downstream classification tasks in challenging biological datasets

      I agree that this analysis sounds promising, but I think it would be more convincing with a comparison to more traditional methods (e.g. random forest after PCA or NMF)

    2. e shaped 400-dimensional Raman spectra in the fingerprint wavenumber region

      Since several different datasets were used here, what preprocessing was necessary to obtain spectra with this consistent range and resolution? Was it necessary to upsample or downsample the spectra from some datasets?

    3. As expected, the UMAP projection shows that the P231, CTC, and LM cells are progressively clustered

      I would be very cautious about drawing this (or any) conclusions on the basis of a UMAP, as UMAP embeddings can be strongly dependent on hyperparameter choices.

    4. Since the circulating tumor cells will have a population of cells that will become lung metastatic cells in the future, a significant degree of overlap between CTC and LM cells is reasonably expected

      I'm not sure I follow this: just because some CTCs will eventually become metastatic doesn't necessarily mean that they will look similar or "overlap" in the present.

    5. We split the entire high SNR dataset into training, validation, and test dataset

      I think more details about how the data were split into train and test sets would be important to include here. From reading the supmat of [20], it sounds like the 172k spectra were obtained from 11 hyperspectral images, implying that many of the individual spectra are likely to be highly correlated, so I think care would be needed when splitting into train/test sets to avoid data leakage (e.g. perhaps by splitting at the level of the original hyperspectral images)
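      A leakage-safe split along these lines might look like the following sketch (pure Python; the function name, the 80/20 fraction, and the toy image labels are my own assumptions, not taken from the paper):

```python
import random

def split_by_image(spectrum_ids, image_ids, test_frac=0.2, seed=0):
    """Split spectra into train/test at the level of the source
    hyperspectral images, so that correlated spectra from the same
    image never end up on both sides of the split."""
    images = sorted(set(image_ids))
    random.Random(seed).shuffle(images)
    n_test = max(1, int(len(images) * test_frac))
    test_images = set(images[:n_test])
    train, test = [], []
    for sid, img in zip(spectrum_ids, image_ids):
        (test if img in test_images else train).append(sid)
    return train, test

# Toy example: 6 spectra drawn from 3 images
train, test = split_by_image([0, 1, 2, 3, 4, 5],
                             ["imgA", "imgA", "imgB", "imgB", "imgC", "imgC"])
```

      The key property is that every image's spectra land entirely in one split, which is what a per-spectrum random split would not guarantee.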

    6. For each spectrum, we used a patch size of 16 in 1D spectra, which translated to a patch size of 4 in 2D image, yielding 16 patches for each spectrum

      If the spectra are 400-dimensional and the patch size is 16, there would be 25 patches per spectrum, not 16.
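      Working through the arithmetic from the numbers quoted in the paper (400-dimensional spectra, 1D patch size 16, 2D patch size 4):

```python
spectrum_len = 400   # dimensionality of each spectrum
patch_1d = 16        # stated 1D patch size

n_patches_1d = spectrum_len // patch_1d
print(n_patches_1d)  # 25

# The same count follows from the 2D view: 400 values reshaped to 20 x 20,
# tiled with 4 x 4 patches
side = int(spectrum_len ** 0.5)   # 20
patch_2d = 4
n_patches_2d = (side // patch_2d) ** 2
print(n_patches_2d)  # 25
```

      Both the 1D and 2D views give 25 patches, so the stated count of 16 appears inconsistent with the stated dimensions.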

  3. Apr 2025
    1. Figure axes are Cartesian coordinates of 2-dimensional vectors representing each nucleotide and then summed up to get the 2D line as by Yau et al. [21].

      I think it would be helpful to briefly describe how these 2D lines are constructed, as they may be an unfamiliar way of visualizing sequence data for many biologists.

    2. As model size increases, output accuracy seems to improve, with the biggest model showing a generated sequence without repeats. This behavior arises from the abundance of training samples for Enterovirus C (4141) compared with Gallivirus A (15).

      Could it instead be that larger models are simply less prone to getting stuck in repeats?

    3. In contrast, in panel B, the two largest models learn the Enterovirus C function very well, suggesting memorisation rather than generalisation,

      From the figure, it doesn't look like the models are simply recapitulating the true sequence. I think a more direct comparison like a sequence alignment between the true and generated sequence would be more informative here.

    4. Said differently, although mutations at any replication point might be Brownian and viral-cell interactions chaotic, viral genomes adapted to a host (i.e. a particular niche) approximate a temporary Lyapunov stable point to minimize system-wide energy expenditure

      This feels pretty vague and hand-wave-y. The term "Lyapunov stable point" is mentioned without establishing or discussing the mathematical conditions or assumptions that would justify using it. Also, phrases like "system-wide energy expenditure" are not defined; what is meant by "energy", and why would viruses adapt in order to minimize it?

    5. Two, its constraints are functional not sequential, backward not forward looking, developmental not constructive

      This is so succinct that it may be hard for readers to understand what is being referred to here. If these constraints are important, I think it would be worth elaborating on them a little bit.

    6. From this foundation, we can define Ω to be a summed characteristic across 𝑁 entities

      I appreciate that this presentation is intentionally abstract, but I think it would be helpful to explain more explicitly what omega represents here. In other words, what is the nature of the "characteristics" z_i? I think readers may assume that these characteristics refer to phenotypes, but later on, it becomes clear that these "characteristics" actually refer to something like "genomic state" (if not simply a genotype)

  4. Dec 2024
    1. We suggest inadequate data science education as a key factor behind these issues

      In the absence of any evidence favoring one factor or another, I think it would be fair to include other possible factors here (especially as, imo, a lack of data science education is not the most plausible one)

    2. minimize data misinterpretation

      This is a bit of a nitpick, but I'm not sure it's fair to equate "visualization errors" with "data misinterpretation", on the part of either authors or readers: authors can make visualization errors for reasons other than misinterpreting their data, and those errors won't necessarily lead readers to misinterpret the data (even if they do make it more likely)

    3. The prevalence of misused bar graphs is potentially caused by a systematic lack of data science training in science, technology, engineering, and mathematics (STEM) education, research training, and the academic

      fwiw, another possible explanation that strikes me as more plausible is that there are very strong incentives for authors to publish in prestigious journals, which in turn incentivizes them to present their results in the most compelling possible light.

    4. consistent with our hypothesis that having more authors increases the probability of having coauthors who contribute visualization mistakes

      Perhaps a simpler explanation is that papers with more authors tend to be longer and to have more figures, increasing the chances that at least one figure contains a mistake.
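      That alternative is easy to formalize: if each figure independently contains a mistake with some probability, then under this (admittedly simplistic) model the chance of at least one erroneous figure grows with the figure count alone, without any per-author effect. The 5% per-figure rate below is purely illustrative, not estimated from the paper's data:

```python
def p_any_error(n_figures, p_per_figure):
    """P(at least one erroneous figure), assuming figures err independently.
    Both parameters are illustrative placeholders."""
    return 1 - (1 - p_per_figure) ** n_figures

# Doubling the figure count from 5 to 10 raises the chance of at least
# one mistake from ~23% to ~40% under a 5% per-figure error rate
print(p_any_error(5, 0.05), p_any_error(10, 0.05))
```

      So a correlation between author count and figure mistakes could arise simply because longer papers have both more authors and more figures.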

  5. Nov 2024
    1. The ground truth in this dataset is known a priori because there are 6 chromosomal crossovers in almost every cel

      I don't think it's quite accurate to say that the ground truth is known here unless the meaning of "in almost every cell" can be quantified. I agree that the fact that SpotMAX finds six foci in 97% of nuclei is qualitatively consistent with the biology of COSA-1 foci, but there's no way to quantify this performance without first quantifying the true prevalence of nuclei with more or less than six foci (and I think we do know it can't be 100%).

      It's also important to specify the meiotic stage (that is, the position of the nuclei within the gonad) at which foci are counted, since (if I recall correctly) COSA-1 foci appear gradually, so I think in mid-pachytene there will be many more nuclei with fewer than six foci than at late pachytene.

    2. revealed an unexpected mechanism that selectively targets nuclei with errors in crossover regulation for apoptosis independently of the DNA damage checkpoint

      Is it known that the presence of RAD-51 foci alone is a reliable proxy for the activation of the DNA damage checkpoint in late meiotic prophase? (I'm not sure, but I wouldn't be surprised if it is not)

    3. Crucially, we provide an intuitive and user-friendly GUI integrated into the Cell-ACDC software9

      I think it'd be helpful to briefly explain what Cell-ACDC is and why it is important that SpotMAX is integrated into it, as some readers may not be familiar with it.

    4. The representative images show two instances where the human made a counting mistake

      If I recall correctly, COSA-1 foci aren't always diffraction-limited, so I'm not sure that it's really possible to determine the true number of foci in the two examples shown, at least from the 2D images shown in the figure (though I agree that the example on the left does look a lot like two barely-separated distinct foci)

  6. Oct 2024
    1. In addition, fluorescence images showed that the body length of nematodes after exposure to L511A was significantly shorter than that of the control group,

      Does "significantly shorter" mean that the difference was statistically significant? If not statistically significant, I'm not sure it's worth mentioning. If it is, could this data be added to Figure 5?

    2. This suggests that these two flavoring substances and tobacco component may produce neuroexcitatory effects at low doses in nematodes

      Is an increase in bend frequency known to indicate neuroexcitation? It seems like it could also result from metabolic changes, perturbation of feeding behavior, or direct effects on muscle cells.

    3. For X6145A and Fla-1, the bend frequency of worms also showed a trend of slightly increasing in low-dose group and decreasing in high-dose group

      If the trend wasn't statistically significant, I'm not sure this is worth commenting on.

    4. its bend frequency per minute was basically consistent with that of the control grou

      Is it possible to make a more precise statement than "basically consistent"? This is a little confusing, especially since there was a statistically significant difference between the control and the higher dose.

    5. ig. 3C&F showed that the bend frequency of worms decreased significantly (p < 0.001) with the increase in dose after 24 h treatment

      For the 1mM MCP condition, it looks like most worms were still alive after 24 hours, but the bend frequency of all worms was zero. Does this make sense?

    6. Fig. 3. Toxicity evaluation of MCP and BDE-47 on the platform.

      It looks like this figure is missing the number of worms analyzed for each condition. Also, the survival curves in Panel A are in discrete steps of the same size, suggesting a smaller number of worms than there are dots in Panels B and C.

  7. Jun 2024
    1. As phenotypic changes during keratinocyte differentiation span across both space and time, this application perfectly showcases the power of ESPRESSO spatiotemporal omics in identifying not only the presence of distinct phenotypes, but also providing insights about their spatiotemporal evolution.

      Again, I think a baseline here would make this claim more convincing. In other words, what aspects of the differentiation dynamics described here could only be captured by ESPRESSO?

    2. As shown in Figures 1c and 1d, GMM clustering easily identified the cell type-specific phenotypes and allowed the quantification of properties of interest in their organelle networ

      It would be helpful to compare this result to some baseline obtained from an established method like cell painting. In other words, can existing techniques also readily distinguish these cell types?

    3. organelle properties are normalized, selected and reduced in dimensionality by PacMAP, generating low-dimensional embeddings that encode the high-dimensional organelle properties of each cell. A Gaussian Mixture Model (GMM) clustering algorithm is then applied

      It sounds like the clustering was done after the embedding step; that is, using the low-dimensional embeddings from PacMAP, rather than the original feature matrix. If so, I'm worried that this will result in inaccurate clusters, as PacMAP (like all such methods) does not perfectly preserve the relationships between the original high-dimensional feature vectors.

  8. May 2024
    1. This suggests that genes not annotated by eggNOG-mapper are probably proteins that either catalyze some protein, RNA, or DNA chemical modification, or bind to other molecules, form macromolecular complexes, and are involved in the regulation of essential processes for animals

      This is a bit confusing; it's so vague and general that it sounds like it could describe almost any protein.

    2. We therefore considered this evidence as supportive for not filtering

      I'm not sure that two examples can constitute evidence for or against filtering. Is it possible to use a ground-truth dataset to make this kind of filtering/no-filtering decision with more confidence?

    3. We show that protein language model-based annotations outperformed deep learning-based ones

      This is a bit confusing, because protein language models are a kind of deep learning model. It would help to clarify what "deep-learning-based models" refers to in this context.

  9. Mar 2024
    1. The search starts with a cytochrome from corn (Zea Mays), and within the first 50 hits, we find similar structures originating from various animals (fish, eagle, mouse, cat, horse, etc.)

      The phrase "within the first 50 hits" feels tantalizing. What else appeared among the top hits? Were there hits that were surprising or potentially false positives? And were there proteins that should have appeared among the top hits, but didn't?

    2. Therefore, high occurrence of unstructured regions in the input structure can bias the search. This phenomenon is more prevalent in coiled-coil structures but can be also observed in some small structures

      Again, it would be great to quantify this and/or to discuss some examples of proteins for which this is a real problem.

    3. We tested AlphaFind on a diverse set of proteins varying in size, complexity, and quality. AlphaFind provided biologically relevant results even for small, large and lower quality structures. When AlphaFind did not offer structures with high TM-scores, the results remained biologically relevant.

      I think these claims would be more convincing if they could be quantified and if the performance of AlphaFind could be compared to other existing tools, if possible.

    4. he latter two methods in conjunction with (10) establish the basis of the indexing solution presented in here

      What is the relationship between this approach and approaches to indexing or similarity-based lookup used by common vector databases?

  10. Feb 2024
    1. Every chromosome group is combined into a single sequence, with chromosome order randomly determined.

      It's surprising to me that chromosomes are randomly ordered; this feels a bit like the equivalent of randomly shuffling the clauses of a sentence. It would be helpful to explain this choice or discuss reasons why it might or might not be a concern.

    2. However, beyond that, the effect levels off (Supplementary Fig. 6). This is expected due to the curse of dimensionality in high-dimensional spaces and the variability in the level of ontological refinement in different branches of the ontology

      This feels awfully hand-wavy. I can understand that a leveling off is expected at some distance, but why at 5 hops in particular?

    3. or all three species we observed very high agreement between independent annotations of the novel species’ data and the nearest cell type centroids in the IMA

      It would be helpful to mention here what these three species were and how distantly related they are to the eight species on which UCE was trained.

    4. We train a simple logistic classifier on the UCE embeddings of the Immune Cell Atlas [38], and then apply the classifier to B cell embeddings from Tabula Sapiens v2. This classifier accurately classifies the Tabula Sapiens v2 cells as memory and naive B cell

      This result feels hard to interpret without a comparison to other approaches or models. In other words, are embeddings from UCE uniquely able to capture the information required for this classification task?

    5. UCE embeddings distinctly separate cell types more effectively than other methods tested in zero-shot

      This feels a bit subjective; I think this claim would be more convincing if it were grounded in a quantitative measure of clustering accuracy.

    6. We compared several methods and found that UCE substantially outperforms the next best method Geneformer by 9.0% on overall score, 10.6% on biological conservation score, and 7.4% on batch correction scor

      If possible, it would be helpful to contextualize these relative increases in performance, particularly given that none of the models listed in Supp Table 1 appear to significantly outperform using the log-normalized raw data (the "overall score" is 0.74 for UCE and 0.72 for "log-normalized expression"). Without more context, it's hard to know what this means, whether it should be surprising, whether it reflects limitations of the metrics or of the models, etc.

      Also, I think it would be more transparent to mention here that there are two metrics for which UCE does not outperform other models (the ARI score and the "ASW (batch) score").

    7. Genes belonging to the same chromosome are grouped together by placing them in between special tokens and are then sorted by genomic location

      It would be helpful to understand the context and motivation for this design decision. In other words, what aspects of UCE's performance depend (or are suspected to depend) on including information about genomic position?

    8. This allows UCE to meaningfully represent any gene, from any species, regardless of whether the species had appeared in the training data

      It would be good to clarify here if "training data" refers to the data used to train the protein language model or UCE itself.

    1. When calculating doubling times based on mitotic events in the remaining cells that were not undergoing apoptosis (Figure 6D), the doubling times are similar to those for unexposed cells

      Again, it's great to see something like this quantified so carefully!

    2. Higher intensities of excitation light exposure led to significant cell death that was apparent by manual inspection of images, and by the reduced relative cell numbers as shown by the green lines

      It seems surprising that there is such a big difference from 1x to 1.4x. Is this by design? (was the 1x intensity chosen from prior experience or experiments to be as high as possible without inhibiting cell division?)

    3. s shown in Figure 6A, exposure of cells to the minimal intensity of fluorescence excitation light (56 mJ/cm2 referred to as 1x)

      It's super helpful that an absolute measure of intensity is provided here, but it would be great to also include the wavelength (or range of wavelengths) of the excitation light.

    4. Individual cells in the center of the colony tend to move less than cells near the

      Is it possible to correct this for the fact that, as the colony itself expands, cells near the edge necessarily must move more than cells in the center (which will not move at all, if the colony as a whole is stationary)?

    5. Average mitotic rates do not appear to depend on distance from the colony edge (Figure 5D) and do not correlate with the increased cell motion

      It's great to see a subtle detail like this quantified so carefully! Is this consistent with prior work (if there is any)?

    6. The manual data was paired to the 3D U-Net inferenced results using a linear sum assignment routine with the cost function being proportional to the distance between mitosis events in space with an empirically determined spatial cutoff of 15 pixels and a time cutoff of 6 frames.

      This is a bit hard to understand. How is distance in time measured? (i.e. the difference between the time of mitosis onset in the manual annotations and the segmentation results)
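      To make the question concrete, here is a toy version of the pairing step as I understand it (brute force rather than an actual linear sum assignment solver, and with event times treated as scalar frame indices; the function and its example data are my own assumptions, not the authors' code):

```python
from itertools import permutations

def pair_events(manual, auto, max_dxy=15.0, max_dt=6):
    """Pair manual (x, y, t) mitosis annotations with detected events,
    minimizing total spatial distance subject to the stated cutoffs.
    Combinations beyond either cutoff get infinite cost."""
    def cost(a, b):
        dxy = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
        dt = abs(a[2] - b[2])
        return dxy if dxy <= max_dxy and dt <= max_dt else float("inf")

    best, best_cost = None, float("inf")
    for perm in permutations(range(len(auto)), len(manual)):
        c = sum(cost(m, auto[j]) for m, j in zip(manual, perm))
        if c < best_cost:
            best, best_cost = list(perm), c
    return best  # index into `auto` for each manual event

# Two manual events matched against two detections
matches = pair_events([(0, 0, 0), (10, 10, 0)], [(11, 10, 1), (1, 0, 0)])
```

      Even in this sketch the ambiguity the comment raises is visible: the temporal difference acts only as a hard cutoff here, whereas it could instead be folded into the cost, and the text doesn't say which the authors did.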

  11. Jan 2024
    1. The binary masks are created by inferencing with 3 instances of the same model and thresholded by 2 (as explained in more detail in Supplemental Figure 1B and 1C.

      This is a little confusing, especially the "thresholded by 2" part, and I didn't find the caption in Supp Fig 1 to be that much clearer. It would help to explain the origin of the variability in the predictions (in other words, what is an "instance of the same model"?)

    2. We trained a 2D U-Net to segment single-cell nuclei from phase contrast images starting with a pre-trained U-Net (14) as our initial network

      It would be great to mention what kind of images the pre-trained model was trained on. Do you have a sense for how important it is that pre-training be done on similar images? (and what kinds of similarity are most important: cell type, imaging modality, magnification, etc.)

  12. Dec 2023
    1. We trained a ViT-small model with patch size = 8, number of global crops = 2, number of local crops = 8 on 4 nodes x 8 NVIDIA-V100 GPUs per node (32 GPUs) for 100 epochs

      would it be possible (and meaningful) to mention how many GPU hours this required? Also, some more details would be helpful for non-ML experts; e.g., why the choice of 100 epochs, was a stopping criterion used, which epoch was used for the final analysis/results, etc.

    2. (both ~1-1.5 million cell tile images)

      Does the 1-1.5m figure mean single-cell images? or FOVs? It would also be super helpful to comment on how this dataset size was chosen. Was it the minimum amount of data required for this level of performance? More generally, did you do any experiments varying the quantity or diversity of the training data?

    3. The superior performance of CP-DINO 1640 is unlikely a result of trivial memorization, as the 1640-genes druggable genome library and 300-genes MoA library share similar numbers of overlapping genes with the 124 PoC library (30 and 26 genes respectively).

      I think to make this claim more convincing, it would be important to show how many genes in the 1640 library are very similar to (rather than merely identical to) genes in the 124 PoC library ("very similar" is obviously subjective but I'm thinking of homologs/paralogs or genes that are components of the same complex or pathway)

    4. nti-phospho-S6 (pS6) antibody with AlexaFluor 750-conjugated secondary antibody was used in the 6th channel as an established biomarker

      It would be helpful to mention here what cellular structures or features the pS6 antibody labels, and also (for the non-biologists among us) what mTORC1 is

    5. Nevertheless, CP-DINO 300 trained on bioimaging data yielded a more informative embedding that has higher median prediction accuracy than the other two models (Fig. S4a-b), and correctly classified more perturbations with better accuracy (Fig. 4c). CP-DINO 300 also recovered more known biological relationships from StringDB as measured by cosine similarity of the aggregate gene KO embeddings (Methods) than the other two models (Fig. 4d)

      It's awesome to see such an explicit and direct comparison of classic feature engineering with modern unsupervised ML models!

      If possible it would be great to quantify how much better the DINO-based approach is; Figures 4a-d are a bit hard to understand at first and obscure the relative differences; Fig 4d in particular doesn't give the impression that DINO is that much better than the CellStats approach (even though the 0.12 of DINO vs the 0.09 of CellStats is actually a 33% relative improvement!). Also, some measure of statistical significance would be helpful; in particular, how likely is it that the 0.09 vs 0.12 in Fig 4d is reproducible?

    6. phenotypic clustering of genes by their annotated mechanism of action,

      It feels like there's a typo here somewhere, since genes don't really have a "mechanism of action" and the screen here does not involve compounds but rather gene KOs. Is the idea to use the phenotype of the KOs to cluster genes by the MoA of the compounds that target them? In any case, the reference to MoAs here is doubly confusing because the clustering shown in Fig 4E appears to capture cellular localization (and also pathway membership?), but I couldn't see any discussion of the clustering relative to the MoAs of the compounds used to select the 300 genes