52 Matching Annotations
  1. Last 7 days
    1. unmixing. In this context, Raman measurements x_i ∈ R^b are treated as mixtures x_i = F(M, α_i) of a set of n unknown endmember components M ∈ R^{n×b} based on their relative abundances α_i ∈ R^n in a given measurement x_i. Blind unmixing aims to decompose a set of Raman measurements X ∈ R^{m×b} (i.e. a given scan) into endmember components M and their relative abundances A ∈ R^{m×n}. To achieve this, we train an autoencoder model A consisting of an encoder module E and a decoder module D. The encoder was responsible for mapping input spectra x into latent space representations z = E(x), and the decoder was responsible for mapping these latent representations into reconstructions of the original input x̂ = D(z) = D(E(x)) = A(x). The model was trained in an unsupervised manner by minimising the reconstruction error between the input x and the output x̂. During this process, the model was guided to learn the endmember components M and their relative abundances A through the introduction of related physical constraints. Below, we provide additional details about the developed architecture and training procedure. For more information about hyperspectral unmixing, the reader is pointed to previous works by Keshava and Mustard [55] and Li et al. [56]. For more information about autoencoders, the reader is pointed to the work of Goodfellow et al. [81]. The encoder E comprised two separate blocks applied sequentially. The first part was a multi-branch convolutional block comprising four parallel convolutional layers with kernel sizes of 5, 10, 15 and 20, designed to capture patterns at multiple spectral scales. Each convolutional layer contained 32 filters with ReLU activation, He initialisation [82] and 'same' padding. Batch normalisation [83] and dropout with a rate of 0.2 [84] were applied to each convolutional layer to improve training stability and generalisation.
The outputs of the four convolutional layers were merged channel-wise through a fully connected layer to yield an output of dimension matching that of the input spectrum. The rationale behind this was to transform intensity values into representations that capture local spectral features (e.g. peak shape, width, local neighbourhood) and thus promote better generalisability. The second part of the encoder was a fully connected dimensionality reduction block, applied to learn patterns between the learnt spectral features. This block comprised a series of fully connected layers of sizes 256, 128, 64 and 32 with He initialisation and ReLU activation. Batch normalisation and dropout (rate of 0.5) were also applied at each fully connected layer. The block was followed by a final fully connected layer (Xavier uniform initialisation [85]) that reduced the final 32 features to a latent space of size n. The number n was treated as a hyperparameter that encodes the number of endmembers to extract, with latent representations treated as abundance fractions. To improve interpretation, non-negativity was enforced in the latent space using a 'softly-rectified' hyperbolic tangent function f(x) = (1/γ) log(1 + e^(γ·tanh(x))), with γ = 10, as we previously reported [53].

      It seems totally reasonable to use convolution across spectral space because any one place in the spectrum should contain information about proximal spaces. Similarly, though, it seems reasonable to expect that there should be shared information across proximal pixels in the imaging space. For example, pixels on the boundary between the nucleus and the area outside of the nucleus seem likely to have different spectra than those within the nucleus. As a result, combining convolutions across both physical and spectral space (a truly hyperspectral convolutional autoencoder) seems both reasonable and likely to leverage the spatial information to further refine the endmember definitions. Did you try something like this?
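
      The 'softly-rectified' hyperbolic tangent quoted above is simple to sketch; a minimal Python version (function name mine, with f(x) = (1/γ) log(1 + e^(γ·tanh(x))) and γ = 10 taken from the excerpt):

```python
import math

def soft_rectified_tanh(x, gamma=10.0):
    """Softly-rectified tanh from the excerpt:
    f(x) = (1/gamma) * log(1 + exp(gamma * tanh(x))).
    Strictly positive, and close to max(0, tanh(x)) for large gamma,
    which keeps latent abundances non-negative."""
    return math.log1p(math.exp(gamma * math.tanh(x))) / gamma

print(soft_rectified_tanh(-5.0))  # tiny positive value (soft rectification)
print(soft_rectified_tanh(2.0))   # close to tanh(2) ~ 0.964
```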

  2. Dec 2025
    1. to-linear fiber paired with fiber collimator) and the high-resolution confocal (105 μm fiber with lens tube) mode of the standard ORM setup obtained using an MCS-1TR-XY electron microscopy calibration grid and collecting line profiles across the chromium-silicon features. Intensities are the measured Raman intensity, averaged over 100 measurements of a powdered aspirin tablet as an exemplary scattering sample, and the silicon region of the calibration target at 1595 cm-1 (C=O stretch) and 520 cm-1 (c-Si) respectively. Acquisitions were performed with a 785 nm laser at 45 mW with a 500 ms integration time using either a 10x Olympus Plan N or 40x Olympus UPlanSApo objective (NA 0.25 and 0.95 respectively)

      Do you have resolution specifications for a higher NA objective? For folks who might want higher spatial resolution than 2.5um it would be great to know how far the system could be pushed. Related, have you tried a 'true' confocal light path with focusing optics and a pinhole? Again, it would be great to know how far the system could be pushed.

    2. To evaluate the relationship between taxonomic and phenotypic alpha diversity metrics, we performed both linear and log-transformed linear regression analyses between ASV- and OPU-derived richness, Shannon, and evenness values. Ordinary least squares (OLS) regression models were fitted for each treatment using the statsmodels Python package (v0.14.1) (105). In the log-transformed models, both the independent and dependent variables were transformed using the natural logarithm of one plus the value (log1p) to accommodate zero values and improve numerical stability using Python Numpy (v2.2.4) (106). For each model, the coefficient of determination (R²) and corresponding P value were extracted to assess the strength and significance of the relationship.

      It would be nice to have a more comprehensive analysis of the relationship between OPU and ASV, since there may be many drivers of correlation between them, prevalence of species being one; you might also have differing environmental factors driving correlation in OPU that deviates from ASV. If you could examine the correlation between OPU sets or features and environmental factors (such as organic/non-organic growth, or plant type) after controlling for ASV, it might more directly identify aspects of biology that are driven to be similar by growth conditions rather than by differences in species presence.
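
      For concreteness, the log1p-transformed OLS fit with R² described in the excerpt can be sketched without statsmodels (a minimal pure-Python version; the function name and example data are illustrative, not the authors' values):

```python
import math

def log1p_ols_r2(x, y):
    """Fit y' = a + b*x' by ordinary least squares after log1p-transforming
    both variables (as in the excerpt, to accommodate zeros), and return
    the slope, intercept, and coefficient of determination R^2."""
    xs = [math.log1p(v) for v in x]
    ys = [math.log1p(v) for v in y]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((v - mx) ** 2 for v in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    ss_tot = sum((b - my) ** 2 for b in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Identical (hypothetical) ASV- and OPU-derived richness values give R^2 = 1;
# zeros are handled by the log1p transform
print(log1p_ols_r2([0, 1, 3, 7, 15], [0, 1, 3, 7, 15]))
```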

  3. Sep 2025
    1. From these images, we obtain a pixel-wise dynamic frequency response given by the absolute value of the Fourier transform of the temporal phase signal, |F{φ(t)}|, for each spatial pixel in the image.

      Just to clarify: are these the absolute pixels in the entire imaging field? or the pixels of individual segmented cells? if segmented cells, was there any registration of the images? Are the cells moving or do you have evidence that some of these changes in the phasor analysis don't result from jitter in the positions of the cells?
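
      For what it's worth, the pixel-wise frequency response described in the excerpt is a one-liner in NumPy; a sketch on a synthetic phase stack (array shapes and the 2 Hz driving signal are my assumptions; the 600 frames at 8 Hz follow the paper's description):

```python
import numpy as np

# Hypothetical stack of phase images: (time, rows, cols); 600 frames at 8 Hz,
# every pixel oscillating at 2 Hz for illustration
t = np.arange(600) / 8.0
phase = np.sin(2 * np.pi * 2.0 * t)[:, None, None] * np.ones((1, 4, 4))

# Pixel-wise dynamic frequency response: |FFT| along the time axis
spectrum = np.abs(np.fft.rfft(phase, axis=0))
freqs = np.fft.rfftfreq(phase.shape[0], d=1 / 8.0)

# Each pixel's spectrum peaks at the driving frequency (2 Hz here)
peak = freqs[spectrum[:, 0, 0].argmax()]
print(peak)  # 2.0
```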

    2. Here, we take a stack of 600 sequential qOBM images taken at 8 Hz (this frame rate was selected to capture metabolic activity within the cell)

      Is the expectation here that metabolic changes will cause changes in the refractive index on time scales fast enough to require 8 Hz sampling? Is that a realistic time frame for metabolic changes? Or are these other structural changes in response to activation?

  4. Jun 2025
    1. significant negative MMI for the feature set seems to be a reasonable idea to build an independent feature set.

      One concern about only looking at negative MMI is that you will miss other types of interactions. For example, high positive MMI (or interaction information) is associated with 'common cause' effects and represent another important class of interactions. Did you consider including both cases or including signed representations?

    2. I(X;Y;Z) = H(X) + H(Y) + H(Z) − H(X,Y) − H(X,Z) − H(Y,Z) + H(X,Y,Z) (2). Eq. 3 can be rewritten in terms of MI as I(X;Y;Z) = I(X;Y) − I(X;Y|Z)

      This generalization of MI to more than two variables is also known as interaction information and captures the unique information gained by knowing all three variables beyond only knowing subsets of the variables. There are other extensions of MI to more than 2 variables. In particular total correlation allows quantification of the shared information between all three variables. As a result, total correlation has been used for feature selection. Did you consider using total correlation in this case?
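
      To make the distinction concrete, here is a small pure-Python sketch computing both the interaction information of Eq. (2) and the total correlation for a toy distribution (the XOR example, variable names, and helper functions are mine):

```python
import math
from collections import Counter
from itertools import product

def entropy(joint, axes):
    """Shannon entropy (bits) of the marginal over the given axes of a
    joint pmf given as {(x, y, z): probability}."""
    marg = Counter()
    for outcome, p in joint.items():
        marg[tuple(outcome[a] for a in axes)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

# Toy joint distribution over three binary variables: Z = X XOR Y with X, Y
# independent fair coins -- a classic purely synergistic interaction.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

H = lambda *axes: entropy(joint, axes)
interaction = (H(0) + H(1) + H(2)
               - H(0, 1) - H(0, 2) - H(1, 2)
               + H(0, 1, 2))                    # I(X;Y;Z), as in Eq. (2)
total_corr = H(0) + H(1) + H(2) - H(0, 1, 2)    # total correlation

print(interaction, total_corr)  # -1.0 1.0
```

For XOR the interaction information is negative (synergy) while the total correlation is positive, which is exactly why the two quantities select different kinds of feature interactions.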

  5. May 2025
    1. All models are extremely good at predicting average effect sizes as measured by Pearson correlation (attention: r = 0.82, linear: r = 0.83, linear+pairwise: r = 0.87)

      These are well correlated with true effect size, but can these estimates be used to accurately identify causal loci?

    2. type as [inline expression], which is the difference between the predicted phenotype with the l-th locus being 1 versus -1 in that genetic background at all other loci

      This is essentially a linear prediction of effect size. Given interaction effects may manifest as non-linear effects, would mutual information be a better estimate of the 'true' effect size?

  6. Apr 2025
    1. Improved identification of discriminatory features. The CellPhePy package includes an improved method for feature selection, enabling more reliable identification of features that effectively discriminate between different cell populations.

      You use the elbow method for identifying useful features (linearly transformed or not) for classification. But have you thought about adding in non-linear transformations (e.g., the information bottleneck method) that might result in better classification by preserving useful information from categories of data that might be eliminated by the elbow or other feature selection procedures? Information bottleneck is particularly attractive for this as it retains all information about the data present in the input parameters.

  7. Feb 2025
    1. We flowed cells through a square quartz capillary (outer width and height 600 µm, channel width and height 240 µm)

      Given the high light intensity and the duration of the exposure of each cell, do you have a sense of whether the spectrum generated when the cell first enters the light beam differs from the spectrum at the end? You might be able to find this by removing the temporal integration and averaging the spectra of different cells across the integration window.

    2. As an application demonstration, we analyzed the lipid content of the human HepG2 hepatocyte cell line following treatment with 0.2 mM free fatty acid (FFA). Cells were cultured under two conditions. These consisted of a control sample, and a sample with FFA in the growth media using bovine serum albumin (BSA) as a carrier. More than 1800 events for each condition were analyzed by our flow cytometer, with the average spectrum from each condition shown in Figure 4A. An increased lipid signal at 2862 cm-1 can be observed in the FFA sample. To distinguish this fatty acid accumulation at the single event level, we applied principal component analysis-linear discriminant analysis (PCA-LDA). PCA was first applied and the first principal components (PCs) contributing 95% were selected. Then we trained a linear discriminant analysis (LDA) model on the data sets reconstructed by the selected PCs. 5/6 of the data was used as a randomly selected training set and model accuracy was validated against the remaining 1/6 of data, yielding a classification accuracy of 89%. We show the plot of LDA scores in Figure 4B. We next used our flow cytometer to distinguish granulated and degranulated murine mast cells. Mast cells are granulocytes which play a role in the inflammatory immune system by releasing histamines from intracellular storage granules in response to pathogens, parasites, or allergens. Averaged spectra from all events for granulated and degranulated samples are shown in Figure 4C. PCA-LDA classification distinguished the samples with an accuracy of 81%. We show the LDA scores in Figure 4D

      Given the high photon flux in the exposure cuvette, do you know how much light damage is occurring? Have you tried to culture the cells after cytometry? Are they viable? If not, do you know whether this impacts the spectra generated by these cells?
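
      As a side note, the "PCs contributing 95%" selection step quoted above is easy to prototype; a NumPy sketch on synthetic data (function and variable names are mine, not the authors'):

```python
import numpy as np

def pcs_for_variance(X, threshold=0.95):
    """Return the scores of the fewest principal components whose
    cumulative explained variance reaches `threshold` (the 95% rule
    described in the excerpt), plus the number of components kept."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(var), threshold)) + 1
    return Xc @ Vt[:k].T, k

rng = np.random.default_rng(1)
# 200 synthetic "spectra" of 50 channels: one dominant direction + small noise
X = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 50)) \
    + 0.01 * rng.normal(size=(200, 50))
scores, k = pcs_for_variance(X)
print(k)  # a single dominant direction captures over 95% of the variance
```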

    3. that shaped the beam into a light sheet

      What was the length of the light sheet? I am curious about how the time integrated signal might be impacted by the size of the cells and how long they spend in the light sheet.

  8. Jan 2025
    1. ridge logistic regression

      Out of curiosity, is there a reason you chose ridge regression here instead of lasso or, probably best, elastic net? The regularization term in ridge can't push parameter weights to zero, so one assumption of ridge regression (similar to OLS) is that all of the predictors matter, and it can end up giving weight to predictors that are irrelevant to the task.

      For expression data, it seems likely that the expression levels of many genes will be irrelevant to the task (e.g. cell type prediction). It also seems likely that there will be large amounts of collinearity in expression level across genes; if you desire even weighting across genes that are collinear then ridge might be desirable, but this might be irrelevant to the success of the prediction, in which case picking one of these genes for prediction might be a better avenue.

      Elastic net, which combines both a ridge and a lasso penalty, seems likely to be the best approach because it will strike a balance between setting the weights on parameters irrelevant to the prediction task to 0 and evenly distributing weights across collinear but relevant genes.
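
      To illustrate the distinction, the three penalty terms in plain Python (a sketch; the λ, α, and example weights are arbitrary):

```python
def ridge_penalty(w, lam):
    """L2 penalty: shrinks all weights but never sets them exactly to zero."""
    return lam * sum(wi ** 2 for wi in w)

def lasso_penalty(w, lam):
    """L1 penalty: can drive weights of irrelevant predictors exactly to zero."""
    return lam * sum(abs(wi) for wi in w)

def elastic_net_penalty(w, lam, alpha=0.5):
    """Convex mix of the two: alpha=1 is pure lasso, alpha=0 pure ridge."""
    return lam * (alpha * sum(abs(wi) for wi in w)
                  + (1 - alpha) / 2 * sum(wi ** 2 for wi in w))

w = [0.0, 0.5, -2.0]
print(ridge_penalty(w, 1.0),        # 4.25
      lasso_penalty(w, 1.0),        # 2.5
      elastic_net_penalty(w, 1.0))  # 2.3125
```

This parameterization mirrors the common convention (e.g. scikit-learn's l1_ratio), but the exact scaling varies between libraries.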

    2. To regularize the models, we used early stopping and gradient clipping during the training process

      Did you compare this approach to L1 (Lasso), L2 (Ridge), or both (Elastic net) regularization of the parameter weights?

      It seems likely that one of the issues with these neural networks is that they contain many more parameters than your linear model and it might make sense to take a more aggressive approach to regularizing the weights.

    3. selected as a simple linear baseline to compare the non-linear models against

      It would be great to know how the weight on the Ridge penalty was determined. Was this grid search or some other approach?

  9. Dec 2024
    1. Raman imaging was achieved by diffusely illuminating the sample with specific wavelengths and filtering the scattered light with narrow band-pass filters

      Have you tried using a tunable filter or a continuous gradient filter to increase the spectral resolution possible with this method?

    2. highlighting that the enhancement in chemical contrast at longer illumination effectively overcomes the contribution of autofluorescence even within non-specific broad imaging bands

      Related to the subsequent sentence, what do longer integration times look like for the lower wavelengths?

    3. However, longer wavelengths required slower imaging frame rates (Table S1), emphasizing the need to balance contrast and acquisition speed according to specific application requirements.

      It would be great to have this tradeoff presented graphically so that we could have an intuition about how to balance integration time and Raman contrast.

  10. Nov 2024
    1. between changes in Hlf-tdTomato expression and temporal differentiation in HSCs was confirmed (Fig. 2E, Fig. S3, F and G). To determine whether Hlf-tdTomato expression levels act as a functional indicator of HSCs, tdTomato high and low expanded HSCs were transplanted into irradiated recipients, and long-term bone marrow reconstitution abilities were evaluated. HSCs with high tdTomato demonstrate robust long-term marrow reconstitution capabilities. Conversely, chimerism decreased in those who received tdTomato-low HSCs (Fig. 2, F to H). Furthermore, the analysis of the spatiotemporal dynamics of early hematopoiesis post-transplantation revealed that tdTomato high HSCs gradually reconstituted systemic hematopoiesis, whereas tdTomato low HSCs induced rapid hematopoiesis between day 14 and 21 (Fig. 2, I to L).

      This is a really nice demonstration that the levels of Hlf-tdTomato are indicators of 'stemness' but it would also be nice to see a confirmation that the fusion protein is expressed at levels (or not) that are similar to Hlf. Either would be fine given it is a clear indicator of 'stemness' but the molecular correlation would provide clearer ties to the mechanism of 'stemness.'

  11. Oct 2024
    1. Discussion

      You have chosen to validate this approach against t-SNE and PCA, both of which, as you point out, consider loci independently and shouldn't capture linkage or higher-order relationships between alleles during the compression. However, the autoencoder frameworks you have mentioned should capture these relationships. Have you directly compared your approach to an autoencoder framework using the same metrics you use for comparison to PCA and t-SNE?

    2. convolution

      It seems totally reasonable to reduce the complexity and parameter number in these early layers using convolution, since you expect correlated structure between proximal polymorphisms (linkage). However, unlike images, where physical proximity (the number of pixels away you are) is proportional to the shared information, genetic distance is usually not directly proportional to physical distance and can vary dramatically across the genome. For many species, though, we know the genetic map and thus the relationship between genetic distance and physical distance. So, I wonder if you have considered an architecture (such as a graph neural network) that could capture the genetic distance and create convolutions based on it?

  12. Aug 2024
    1. This cannotbe achieved with architectures such as Transformer [25],

      First let me say that I really like your pre-print. It takes a good understanding of the basic biology and leverages the best thing about neural network based models: the ability to customize your modeling architecture to take advantage of expert knowledge about a particular inferential problem. In this case you have chosen an LSTM because you know that, typically, the influence of one gene on another occurs in temporal order. As you suggest in this sentence, that ordering isn't something that a Transformer architecture takes advantage of, and explicitly so. However, I would suggest that a Transformer architecture would be extremely useful in the analysis of time-varying expression data for the following reason. While you are looking for 1:1 connections (networks) here, and analyzing your data in temporal order makes sense for that task, the genes, their expression, and their protein expression are all parts of a dynamical system. Following a discrete experimental change you are likely to trigger a long-running cascade that could cause long-range downstream effects on expression. As a result, if you are going to predict gene expression at time t0, it seems entirely reasonable that information at time t10 or t20 may be relevant. Do you think this is so? Could transformers be usefully applied to expression data of this variety?

    2. and then the regression of target gene expression cannot be sufficiently learned

      I find this clause a little hard to parse. I presume this is suggesting that the regression is a linear regression, and that when you try to fit a linear regression to data that are determined by a non-linear system you can't accurately fit the data, is that right?

  13. Jun 2024
    1. Photo-activation of a diffraction-limited spot in a CAD cell expressing PaGFP-actin shows asymmetric movement toward the front of the cell.

      This text appears to be the same as that describing panel a. Is this intended? I assume C is an image of a blebbistatin experiment?

    2. The EGFP-actin network of NG108 cells was rapidly bleached between 0.3 and 1.3 s. At 3.9 s, bleached actin monomer from the network has been transported (recycled) to the front of the cell, repolymerized at the leading edge, and traveled rearward (thin dark line indicated by arrow).

      Out of curiosity (and ignorance) why is the line containing the repolymerized bleached monomer so thin? The volume of bleached monomer appears to be large. Is the width of the repolymerized line impacted by the relative position of the bleaching?

  14. Feb 2024
    1. We measured Raman spectra of single cells

      It would be nice to have a more expansive description of how Raman spectra (optical layout of apparatus, single cell capture, etc...) were collected.

    2. Here we show that proteome profiles

      This is an extremely compelling result and you provide significant evidence that Raman spectra and proteomes can be related. Such a result has extremely compelling implications for the possible uses of Raman spectroscopy for predicting proteome profiles. Here you work with proteomic data from another group and collect Raman spectra from single bacterial cells grown in conditions that are as close as possible to the original conditions. It would be even more compelling if you could do this analysis, capture the Raman spectra, and then validate (for at least some growth conditions) that the proteomic profile matches that which was previously published.

  15. Dec 2023
    1. A total of 15,000 images were generated for each condition. Each image was assigned a unique random seed,

      It seems likely this approach will require more data than a similar approach using pre-registered images. Do you have a sense of how much more data would be required for similar accuracy?

    2. Humans have adapted to a much longer lifespan compared with other primate species, which have a median lifespan of 20-30 years, suggesting that increased selection on TERT may have occurred as part of human adaptation towards extended longevity

      Is it clear what the relative telomere lengths are in these species and whether they are correlated with selection on TERT?

    3. The remaining 19 variants appear to be truly pathogenic in human, and are presumably tolerated in primate because of primate-human differences, such as interactions with changes in the neighboring sequence context (45, 46).

      Have you considered the possibility that these deleterious influences in humans might be modified by differences between humans and primates that act at a longer range? For example, a change in a specific transcription factor binding site might have a compensatory shift in the transcription factor. These types of interactions could act at long range and may underlie some of these species-species differences in deleterious effect.

    4. However, deleterious variants were incompletely removed in humans, consistent with the shorter amount of time they were exposed to natural selection.

      It would be great to hear alternative explanations for the presence of 'deleterious' variants in humans. For example, it might be true that the deleterious effect of a variant may be different across species.

  16. Oct 2023
    1. Live cell imaging of the fission yeast Schizosaccharomyces pombe at 3, 11, 22, 27, 30, and 34 mins in (a-f) (See Visualization I). This strain expresses the nuclear pore protein nup211 fused with the green fluorescent protein (nup211-GFP), marking the position of the nucleus. Arrowheads point to the septum, where cytokinesis occurs. Scale bar: 5µm.

      It would be great to have a comparison to a standard DIC image here. It would help to answer questions like, is the image of the cleavage furrow substantially clearer in the PDIC image.

    2. The phase images for the stained and unstained serial cuts are overall similar. However, slight elevation of the phase is noticeable for the H&E stained section, especially in the areas surrounding some stromal regions.

      The ability to image structural components of unstained material is very impressive. You have shown two examples of biological samples that are very thin. Have you looked at thicker samples? Is there a limit to the thickness of samples that can be imaged?

  17. Sep 2023
    1. The square in the top right-hand corner is created by samples 550-648, which have distinct genotypes to the rest of the samples due to their having been bred from different F1 parents

      How did you account for different parental lines in your analysis?

    2. In short, 1.25 uL of each sample was taken into a tagmentation reaction containing 1.25 uL of Dimethylformamide, 1.25 uL of tagmentation buffer (40 mM Tris-HCl pH 7.5, 40 mM MgCl2) and 1.25 uL of an in-house generated and purified Tn5 [56]

      This is an extremely clever way of reducing the overall cost of marker typing in the F2 generation, but there are alternatives. Have you considered something like RADseq or other reduced representation libraries? It might enable analysis of cohorts larger than 600 animals.

  18. Aug 2023
    1. herefore, we hypothesize that the association of Dop2R with variation in female productivity in the DGRP may at least partially arise from naturally occurring variation in expression of Dop2R in DGRP females, which then causes variation in the amount of DA that regulates fertility via changes in JH and 20E titers. Dop2R expression is genetically variable in the DGRP, with a broad sense heritability in females ∼ 0.70 [16]. We found a strong correlation between Dop2R expression in females with productivity (r = 0.34, P = 9.81e-7, Figure 4b) but a relatively weak effect of the Dop2R SNP on expression (P = 0.09). Mediation analysis revealed that approximately 19% (P = 0.09) of the effect of the Dop2R SNP on productivity was mediated by the effect of Dop2R expression.

      I believe there is a Dop2R mutant available. Have you looked at these animals? They might prove useful in testing some mechanistic hypotheses.

    2. The most significant SNP (P = 3.42 × 10−7) is 1,612 bp downstream of the gene encoding the dopamine 2-like receptor (Dop2R), and explained approximately 47% of the genetic variance of productivity in females.

      Wow! This is an amazing result!

    3. Therefore, we adjusted productivity for the Wolbachia infection status of parents before further analyses.

      Did you examine whether there was an interaction between Wolbachia infection and any of the alleles you mapped?

  19. Apr 2023
    1. The resulting top terms after this trimming define the latent space.

      What is the expected distribution of weights in the latent space? Would a discriminator network to impose different distributions be useful here?

    2. o verify the validity of these predictions, we performed a gene-set enrichment analysis (GSEA) using as a ground truth the differentially expressed genes in a recently published dataset of bulk RNA-seq carried out on muscle samples from LGMD patients (n=16) and healthy individuals (n=15)25, where we had determined the genes that were significantly up- (LGMD_up) or downregulated (LGMD_dn) in patients compared to age-matched controls (Supplementary Table 3).

      The significance of the difference in gene expression will be related to the size of the effect on the expression, but many genes that influence a phenotype may only show small changes in expression level. How well does this model deal with genes that show small changes in expression? Would this miss genes that show small changes in expression but are nevertheless important?

  20. Feb 2023
    1. expressive

      It would be great to have a clear definition of 'expressive' here. It can be inferred from the results, but, given 'expressiveness' is one of the differentiating features of the IAE, it would be nice to have a statement here as to what is intended.

  21. Dec 2022
    1. The resulting top terms after this trimming define the latent space.

      What is the expectation for the distribution of weights in the latent space? Would it be useful to use a discriminator network to structure the weight distribution in this layer?