72 Matching Annotations
  1. Jul 2023
    1. The bias arose because the respondent's height is a collider variable—a direct product of another covariate (SNPs of height) and an outcome (sex).

      excellent example

  2. Jul 2022
    1. this approach can bias the sample reducing the generalizability of the model, and runs the risk of misestimating the effect of interest

      outlier exclusion

    2. This definition of R2 is the equivalent to the squared correlation between the predicted and observed values and reflects the error between the predicted values and its fit to the regression line, not the error between the predicted and observed values

      R2 = r^2 for linear regression within a single sample. R2 is different and can be negative when evaluated out of sample.
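
A minimal sketch of the note above, with made-up numbers and plain Python. The helper names `fit_line`, `r_squared`, and `pearson_r` are illustrative and not taken from the annotated paper. In-sample, R2 of a least-squares line matches the squared correlation. On a held-out sample whose outcomes are shifted, the predictions still correlate perfectly with the observations, yet R2 is strongly negative.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(y, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, pred))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

# Training sample (hypothetical values): fit and evaluate in-sample.
x_train = [0, 1, 2, 3, 4]
y_train = [0.1, 0.9, 2.2, 2.8, 4.0]
slope, intercept = fit_line(x_train, y_train)
pred_train = [intercept + slope * x for x in x_train]
r2_in = r_squared(y_train, pred_train)
r_in = pearson_r(y_train, pred_train)

# Held-out sample with the same slope but a large offset in the outcome.
x_test = [0, 1, 2, 3, 4]
y_test = [10.0, 11.0, 12.0, 13.0, 14.0]
pred_test = [intercept + slope * x for x in x_test]
r2_out = r_squared(y_test, pred_test)   # negative: worse than predicting the test mean
r_out = pearson_r(y_test, pred_test)    # still close to 1
```

The squared correlation ignores offset and scale errors in the predictions, which is why it can look good out of sample while R2 does not.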

    3. increasing K will increase variance—the sensitivity of the model to changes caused by different training data—as the predictive model has less data for training in each sample selection

      The training fraction is (K-1)/K, which increases with K, so each model gets more training data, not less. What?

  3. May 2022
    1. Although we used the 10 folds cross validation to tune parameters and verified the generalization ability of the models in the independent test-set, it may not completely represent the characteristics of different samples.

      no nested cross validation
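
For reference, a rough sketch of what nested cross-validation would look like. This uses a toy one-feature ridge model and simulated data, not the paper's models or pipeline. All function names are hypothetical. The key property is that the penalty is tuned only on the outer training data, so each outer fold stays untouched until scoring.

```python
import random

def folds(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    return [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]

def split(x, y, fold):
    """Return (x_train, y_train, x_test, y_test) for one held-out fold."""
    held = set(fold)
    train = [i for i in range(len(x)) if i not in held]
    return ([x[i] for i in train], [y[i] for i in train],
            [x[i] for i in fold], [y[i] for i in fold])

def fit_ridge(x, y, lam):
    """One-feature ridge regression without intercept (toy model)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def mse(x, y, beta):
    """Mean squared error of the prediction beta * x."""
    return sum((b - beta * a) ** 2 for a, b in zip(x, y)) / len(x)

def inner_cv_error(x, y, lam, k):
    """Mean held-out error for one penalty, using only the data passed in."""
    errors = []
    for fold in folds(len(x), k):
        xtr, ytr, xte, yte = split(x, y, fold)
        errors.append(mse(xte, yte, fit_ridge(xtr, ytr, lam)))
    return sum(errors) / k

def nested_cv(x, y, lams, outer_k=5, inner_k=5):
    """Tune lam inside each outer training set, then score on the outer fold."""
    scores = []
    for fold in folds(len(x), outer_k):
        xtr, ytr, xte, yte = split(x, y, fold)
        best = min(lams, key=lambda lam: inner_cv_error(xtr, ytr, lam, inner_k))
        scores.append(mse(xte, yte, fit_ridge(xtr, ytr, best)))
    return scores

# Simulated data: y = 2x + noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [2 * a + rng.gauss(0, 0.5) for a in x]
outer_scores = nested_cv(x, y, lams=[0.01, 1.0, 100.0])
```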

  4. Apr 2022
    1. Low reliability of either Time 1 or Time 2 score lowers the reliability of the change or residual score, whereas a high correlation between Time 1 and Time 2 scores causes lower reliability of difference scores

      Burt and Obradović (2013)

      Consider using DS rather than RS if there is a strong association between baseline assessments and the outcome
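
A small sketch of the quoted claim, using the standard classical-test-theory formula for the reliability of a difference score D = Y - X. It assumes uncorrelated measurement errors. The function name and the example reliabilities are made up for illustration.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of D = Y - X under classical test theory.

    r_xx, r_yy: reliabilities of the Time 1 and Time 2 scores.
    r_xy: observed correlation between Time 1 and Time 2.
    Assumes measurement errors at the two time points are uncorrelated.
    """
    covariance = r_xy * sd_x * sd_y
    true_var = sd_x ** 2 * r_xx + sd_y ** 2 * r_yy - 2 * covariance
    total_var = sd_x ** 2 + sd_y ** 2 - 2 * covariance
    return true_var / total_var

# Same component reliabilities, increasing Time 1 / Time 2 correlation.
moderate_corr = difference_score_reliability(0.8, 0.8, 0.5)  # about 0.60
high_corr = difference_score_reliability(0.8, 0.8, 0.7)      # about 0.33

# Lower reliability at Time 1 only, same correlation as the first case.
low_time1 = difference_score_reliability(0.6, 0.8, 0.5)      # about 0.40
```

Both patterns in the quote show up: raising the Time 1 / Time 2 correlation and lowering either component reliability each reduce the reliability of the difference.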

  5. Mar 2022
    1. There are few studies to date using these technologies, and only in adult populations (Place et al., 2017). As such, validation of these models and platforms in large studies are needed.

      check in 2022

    1. It is now well accepted that low-level chronic exposure to environmental chemicals may contribute to the growing epidemic of childhood neurodevelopmental disorders worldwide (Grandjean and Landrigan, 2014).

      controversial review...

    1. What kind of parameter space are we operating in? That is, is the language/personality space sparse, so that a relatively small number of language variables account for the bulk of the explainable effects of personality on language? Or is it dense, with hundreds, or even thousands, of distinct but very small contributions? And can the space be adequately modeled by considering just the (additive) main effects of the predictors, or must one consider potentially very complex higher order interactions in order to generate adequate predictions?

      data exploration, model selection

  6. Aug 2021
    1. An important result that we will draw upon here is that if the causal diagram is such that all common causes of any two variables on the graph are also on the graph then Pearl’s backdoor path theorem applies

      how to be sure graph has all causes?

  7. Jul 2021
    1. This effect occurs when we adjust two independent variables for a potential confound that was actually a consequence of the independent variables. In this case, a spurious association between the independent variables can be incorrectly induced.

      Berkson's paradox
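
A quick simulation of the quoted effect, as a sketch with simulated data only. Two independent variables both feed into a third variable. Adjusting for that third variable (here via a partial correlation) induces a negative association that is absent in the raw data. This is the same collider mechanism that underlies Berkson's paradox. Names such as `partial_r` and `collider` are illustrative.

```python
import random
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def partial_r(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]          # independent variable 1
y = [rng.gauss(0, 1) for _ in range(n)]          # independent variable 2
# The "confound" is really a consequence of both x and y.
collider = [a + b + rng.gauss(0, 1) for a, b in zip(x, y)]

r_raw = pearson_r(x, y)                  # near 0: x and y are independent
r_adjusted = partial_r(x, y, collider)   # clearly negative after adjustment
```

With unit variances throughout, the adjusted correlation is expected to be about -0.5 in this setup.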

    2. Separation by site, demedianing and outlier removal of quantitative confounds
      1. normalize (median, median absolute deviation)
      2. separate copy for each site (fill empty with 0)
      3. replace missing/outlier (|z| > 8 where s = 1.48 * MAD) with median
      4. unit normalize
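
The four steps above can be sketched roughly as follows for one quantitative confound. This is one reading of the list, not the authors' code. It assumes missing values are encoded as `None`, that the MAD is nonzero, and that "unit normalize" means scaling each site column to unit Euclidean norm. The function name `clean_confound` is hypothetical.

```python
from math import sqrt
from statistics import median

def clean_confound(values, sites, z_max=8.0):
    """Return {site: column} for one confound, following the four listed steps."""
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median([abs(v - med) for v in present])
    scale = 1.48 * mad  # robust stand-in for the standard deviation; assumes MAD > 0

    # Steps 1 and 3: robust z-score, then replace missing or outlying
    # entries with the median, which is 0 on the z scale.
    z = []
    for v in values:
        score = 0.0 if v is None else (v - med) / scale
        z.append(score if abs(score) <= z_max else 0.0)

    # Step 2: one copy per site, filled with 0 for subjects from other sites.
    columns = {}
    for site in sorted(set(sites)):
        col = [zi if s == site else 0.0 for zi, s in zip(z, sites)]
        # Step 4: scale to unit Euclidean norm (assumed meaning of "unit normalize").
        norm = sqrt(sum(c * c for c in col))
        columns[site] = [c / norm for c in col] if norm > 0 else col
    return columns

# Hypothetical confound with one extreme value and one missing value.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0, None, 3.5]
sites = ["a", "a", "a", "b", "b", "b", "a", "b"]
columns = clean_confound(values, sites)
```

Because both the zero fill and the median replacement map to 0 on the z scale, the order of steps 2 and 3 does not matter in this sketch.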
  8. Jun 2021
    1. As the ABCD study continues, it will be critical to test whether the separation between males and females changes with age and pubertal status.

      TODO

    2. strong effect of scanner that was most notable in posterior brain regions, as well as in the anterior temporal lobe and orbitofrontal cortex

      visual, default

    3. greater anticorrelation between the DMN and DAN was associated with higher general cognitive ability

      replicates task-positive/negative anticorrelation

    4. the first principal component of between-participant effects had high loadings on connections within the visual and default mode networks and those between default mode and dorsal attention networks

      Seems more like {default, visual} ~ {dorsal, ventral} attention

    5. regressed the NIH toolbox total scores onto RSFC for each ROI pair, while covarying scanner manufacturer, which had a more appreciable effect than data collection site, and sex, each coded as categorical varialbes

      variables

  9. May 2021
    1. ICA is then applied to the aggregated canonical mode expressions to recover the independent sources of the variation between observations expressed in the embedding space. While incurring additional computational load, this approach can be advantageous because CCA can only disentangle latent directions of variation in the data up to a random rotation

      Miller et al, 2016 did ICA on variable-weights (p+q, m) not subject-weights (2N, m)

    1. we extracted and combined the behavioral and MRI CCA scores for the three significant variates, correlated these with the original data matrix, transformed the correlations using a Fisher Z-transform, and submitted these to ICA

      Incorrect. Miller et al. did ICA on variable weights (loadings). This ICA is on subject weights (scores).

    1. They are also useful for detecting new trait associations by correlating observed phenotypes in a sample or cohort with the genetic prediction of another trait. This design is powerful, because if the discovery sample is fully independent of the new sample, an observed association between a complex trait and a genetic predictor from the discovery sample must be due to genetic factors, given that there are no shared environmental factors.

      PRS ~ new traits

  10. Apr 2021
    1. required sample sizes for sparse CCA were still many times the number of features: when r_true = 0.3, for example, 35–50 (depending on the number of features) samples per feature were required

      using datasets generated for CCA?

    2. PLS has been compared to sparse CCA in a setting with more features than samples and it has been concluded that the former (latter) performs better when having fewer (more) than about 500 features per sample (54).

      backward (Grellmann et al., 2015)

    1. Projecting the gene expression and functional association matrices back onto the gene weights and term weights, respectively, reflects how well a brain area exhibits the gene and term pattern, which we refer to as gene scores and term scores

      (# node,) vectors Xu, Yv

  11. Mar 2021
    1. The loading of each term was computed as the Pearson’s correlation between the term’s functional association across brain regions and the PLS analysis-estimated scores

      corr with gene scores only: corr(X, Xu) and corr(Y, Xu)
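
A toy sketch of the note above, with made-up matrices and plain Python. Gene scores are the projection Xu, one value per brain region. Loadings for both genes and terms are then correlations of each column with those gene scores. The weights `u`, the matrices, and the helper names are all hypothetical.

```python
from math import sqrt

def matvec(matrix, weights):
    """Project each row (brain region) onto a weight vector."""
    return [sum(m * w for m, w in zip(row, weights)) for row in matrix]

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def loadings(matrix, scores):
    """Correlate every column of the matrix with one score vector."""
    n_cols = len(matrix[0])
    return [pearson_r([row[j] for row in matrix], scores) for j in range(n_cols)]

# Rows are brain regions; columns are genes (X) or terms (Y).
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 3.0]]
Y = [[0.2, 1.0, 0.5], [0.4, 0.8, 0.1], [0.9, 0.3, 0.7], [1.5, 0.1, 0.2]]
u = [0.8, 0.6]  # gene weights, assumed to come from a fitted PLS model

gene_scores = matvec(X, u)                 # Xu, shape (n_regions,)
gene_loadings = loadings(X, gene_scores)   # corr(X, Xu)
term_loadings = loadings(Y, gene_scores)   # corr(Y, Xu)
```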

    2. Gene scores for 12 unique brain regions gradually increase with development and peak in adulthood

      Perceptual (negative) regions are more positive?

      Differentiation of regions is more interesting, not overall increase.

    3. To ensure that the correlation between gene and term scores is not inflated due to spatial autocorrelation, we selected the 75% of brain regions closest in Euclidean distance to a randomly chosen source node as the training set, and the remaining 25% of brain regions as the testing set.

      citation for this cross-validation method?
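
A minimal sketch of the split described in the quote, assuming each brain region has a 3-D coordinate. The function name `spatial_split` and the random coordinates are illustrative only. The training set is the 75% of regions nearest a randomly chosen source node, so the test regions form a spatially contiguous block far from the source.

```python
import math
import random

def spatial_split(coords, train_frac=0.75, rng=None):
    """Split region indices into train/test by distance to a random source node."""
    rng = rng or random.Random()
    source = rng.randrange(len(coords))
    # Sort all regions by Euclidean distance to the source (source itself is first).
    by_distance = sorted(range(len(coords)),
                         key=lambda i: math.dist(coords[i], coords[source]))
    n_train = round(train_frac * len(coords))
    return source, by_distance[:n_train], by_distance[n_train:]

# Hypothetical region centroids.
rng = random.Random(0)
coords = [(rng.uniform(-70, 70), rng.uniform(-100, 70), rng.uniform(-40, 80))
          for _ in range(20)]
source, train, test = spatial_split(coords, rng=rng)
```

Repeating the split over many random source nodes would give a distribution of out-of-sample correlations rather than a single estimate.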

  12. Feb 2021
    1. we normalize voxel time series by dividing by the mean across time of each voxel and then use linear regression to remove quadratic trends, signals correlated with estimated motion time courses, and the mean time courses of cerebral white matter, ventricles, and whole brain, as well as their first derivatives

      36 regressors

  13. Dec 2020
    1. Interestingly, despite the fact that these studies investigated different questions using different datasets and modalities, the reported canonical correlation could be well predicted simply by the number of samples per feature alone (R2 = 0.83).

      Interest is in what variables are in the canonical variate, not the correlation.

      Also consider publication bias.

  14. Sep 2020
    1. During adolescence, functional connectivity between resting-state networks decreases with age, whereas functional connectivity within cortical resting-state networks increases with age, except for several connections within the salience network that decrease with age. There is limited evidence for dynamics in genetic or common environmental influences, suggesting mostly stable influences across adolescence.

      but interpret dynamics analysis with caution (limited power)

    1. individualized functional topography accurately predicted executive function in matched split-half samples while controlling for age, sex, and motion

      Is r = 0.42 accurate?

  15. Jun 2020