66 Matching Annotations
  1. Jan 2020
    1. An outlier is a data point whose response y does not follow the general trend of the rest of the data. A data point has high leverage if it has "extreme" predictor x values. With a single predictor, an extreme x value is simply one that is particularly high or low. With multiple predictors, extreme x values may be particularly high or low for one or more predictors, or may be "unusual" combinations of predictor values (e.g., with two predictors that are positively correlated, an unusual combination of predictor values might be a high value of one predictor paired with a low value of the other predictor).
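
      A minimal sketch of the leverage idea described above, assuming numpy and a made-up design matrix with two positively correlated predictors (all names here are illustrative):

      ```python
      import numpy as np

      # Toy design matrix: intercept column plus two positively correlated predictors.
      rng = np.random.default_rng(0)
      x1 = rng.normal(size=30)
      x2 = 0.8 * x1 + 0.2 * rng.normal(size=30)
      X = np.column_stack([np.ones(30), x1, x2])

      # Hat matrix H = X (X'X)^{-1} X'; its diagonal gives each point's leverage.
      H = X @ np.linalg.inv(X.T @ X) @ X.T
      leverage = np.diag(H)

      # A common rule of thumb flags points whose leverage exceeds 2p/n,
      # where p is the number of estimated coefficients and n the sample size.
      p, n = X.shape[1], X.shape[0]
      print(np.where(leverage > 2 * p / n)[0])
      ```
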
  2. Dec 2019
    1. So what's our total time cost? O(n log₂ n). The log₂ n comes from the number of times we have to cut n in half to get down to sublists of just 1 element (our base case). The additional n comes from the time cost of merging all n items together each time we merge two sorted sublists.
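
      A minimal merge sort sketch illustrating the two costs quoted above: log₂ n levels of halving and O(n) merge work per level (the function is mine, not the annotated source's):

      ```python
      def merge_sort(items):
          # Base case: a list of 0 or 1 elements is already sorted.
          if len(items) <= 1:
              return items
          # Cutting the list in half repeatedly gives the log2(n) factor.
          mid = len(items) // 2
          left = merge_sort(items[:mid])
          right = merge_sort(items[mid:])
          # Merging two sorted halves touches every element once: the O(n) per level.
          merged, i, j = [], 0, 0
          while i < len(left) and j < len(right):
              if left[i] <= right[j]:
                  merged.append(left[i]); i += 1
              else:
                  merged.append(right[j]); j += 1
          merged.extend(left[i:])
          merged.extend(right[j:])
          return merged

      print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
      ```
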
  3. Aug 2019
  4. Jul 2019
    1. In practice, we found that it is not appropriate to use Aalen's additive hazards model for all datasets, because when we estimate the cumulative regression functions B(t), they are restricted to the time interval where X (X has been defined in Chapter 3) is of full rank, that is, where X′X is invertible. Sometimes we found that X is not of full rank, which was not a problem with the Cox model.
    2. The effect of the covariates on survival is to act multiplicatively on some unknown baseline hazard rate, which makes it difficult to model covariate effects that change over time. Secondly, if covariates are deleted from a model or measured with a different level of precision, the proportional hazards assumption is no longer valid. These weaknesses in the Cox model have generated interest in alternative models. One such alternative model is Aalen's (1989) additive model. This model assumes that covariates act in an additive manner on an unknown baseline hazard rate. The unknown risk coefficients are allowed to be functions of time, so that the effect of a covariate may vary over time.
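
      A minimal sketch of fitting an additive hazards model in Python, assuming the lifelines library and its bundled rossi dataset; the column names follow that dataset, not the thesis text:

      ```python
      from lifelines import AalenAdditiveFitter
      from lifelines.datasets import load_rossi

      df = load_rossi()  # recidivism data: 'week' = duration, 'arrest' = event indicator

      # Fit Aalen's additive model; coef_penalizer regularizes the estimates
      # when the design matrix is close to singular (the full-rank issue noted above).
      aaf = AalenAdditiveFitter(coef_penalizer=0.1)
      aaf.fit(df, duration_col="week", event_col="arrest")

      # cumulative_hazards_ holds the estimated cumulative regression functions B(t),
      # one column per covariate; plotting them shows how effects change over time.
      print(aaf.cumulative_hazards_.head())
      aaf.plot()
      ```
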
    1. Note that the confidence limits are wide at the tail of the curves, making meaningful interpretation difficult. This can be explained by the fact that, in practice, there are usually patients who are lost to follow-up or alive at the end of follow-up. Thus, it may be sensible to shorten plots before the end of follow-up on the x-axis (Pocock et al, 2002).
    1. In clustering analyses, standardization may be especially crucial in order to compare similarities between features based on certain distance measures. Another prominent example is Principal Component Analysis, where we usually prefer standardization over min-max scaling, since we are interested in the components that maximize the variance.

      Use standardization, not min-max scaling, for clustering and PCA.
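
      A minimal sketch of the standardize-then-PCA workflow, assuming scikit-learn and its iris dataset (purely illustrative):

      ```python
      from sklearn.datasets import load_iris
      from sklearn.decomposition import PCA
      from sklearn.preprocessing import MinMaxScaler, StandardScaler

      X, _ = load_iris(return_X_y=True)

      # Standardization (zero mean, unit variance) keeps one high-variance feature
      # from dominating the components; min-max scaling does not equalize variances.
      X_std = StandardScaler().fit_transform(X)
      X_mm = MinMaxScaler().fit_transform(X)

      for name, data in [("standardized", X_std), ("min-max scaled", X_mm)]:
          pca = PCA(n_components=2).fit(data)
          print(name, pca.explained_variance_ratio_)
      ```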

    1. Given a dataset of transactions, the first step of FP-growth is to calculate item frequencies and identify frequent items. Different from Apriori-like algorithms designed for the same purpose, the second step of FP-growth uses a suffix tree (FP-tree) structure to encode transactions without generating candidate sets explicitly, which are usually expensive to generate. After the second step, the frequent itemsets can be extracted from the FP-tree.
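
      A minimal sketch of the two steps described above, assuming Spark MLlib's FPGrowth; the transactions and thresholds are made up:

      ```python
      from pyspark.sql import SparkSession
      from pyspark.ml.fpm import FPGrowth

      spark = SparkSession.builder.getOrCreate()

      # Each row is one transaction (a basket of items).
      df = spark.createDataFrame(
          [(0, ["bread", "milk"]),
           (1, ["bread", "butter", "milk"]),
           (2, ["butter", "jam"]),
           (3, ["bread", "butter"])],
          ["id", "items"],
      )

      # Step 1 (frequent items) and step 2 (FP-tree mining) happen inside fit();
      # no candidate sets are generated explicitly.
      fp = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6)
      model = fp.fit(df)

      model.freqItemsets.show()      # frequent itemsets extracted from the FP-tree
      model.associationRules.show()  # rules derived from those itemsets
      ```
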
    1. Feature predictive power will be calculated for all features contained in a dataset along with the outcome feature. It works for binary classification, multi-class classification and regression problems, and can also be used when exploring a feature of interest to determine correlations of independent features with the outcome feature. When the outcome feature is continuous in nature (a regression problem), correlation calculations are performed. When the outcome feature is categorical in nature (a classification problem), the Kolmogorov-Smirnov distance measure is used to determine predictive power. For multi-class classification outcomes, a one-vs-all approach is taken and the results are averaged to arrive at a mean KS distance measure. The predictive power is sensitive to how the data have been prepared and will differ if the data preparation changes.
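
      A minimal sketch of the KS-distance idea for a binary outcome, assuming scipy and a synthetic feature (the one-vs-all averaging for multi-class is omitted):

      ```python
      import numpy as np
      from scipy.stats import ks_2samp

      rng = np.random.default_rng(0)
      y = rng.integers(0, 2, size=1000)             # binary outcome
      feature = rng.normal(loc=0.8 * y, scale=1.0)  # mildly predictive feature

      # KS distance between the feature's distributions in the two outcome classes;
      # larger values indicate more predictive power.
      ks_stat, _ = ks_2samp(feature[y == 0], feature[y == 1])
      print(round(ks_stat, 3))
      ```
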
    1. Mutual information is one of many quantities that measure how much one random variable tells us about another. It is a dimensionless quantity, generally expressed in units of bits, and can be thought of as the reduction in uncertainty about one random variable given knowledge of another. High mutual information indicates a large reduction in uncertainty; low mutual information indicates a small reduction; and zero mutual information between two random variables means the variables are independent.
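
      A minimal sketch of computing mutual information in bits for two discrete variables with numpy; the joint distribution here is made up:

      ```python
      import numpy as np

      # Joint probability table p(x, y) for two binary variables.
      p_xy = np.array([[0.4, 0.1],
                       [0.1, 0.4]])
      p_x = p_xy.sum(axis=1, keepdims=True)
      p_y = p_xy.sum(axis=0, keepdims=True)

      # I(X; Y) = sum over x, y of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
      mi = np.sum(p_xy * np.log2(p_xy / (p_x * p_y)))
      print(round(mi, 3))  # ~0.278 bits; 0 if X and Y were independent
      ```
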
    1. The Freedman-Diaconis rule is very robust and works well in practice. The bin-width is set to h = 2 × IQR × n^(−1/3), so the number of bins is (max − min)/h, where n is the number of observations, max is the maximum value and min is the minimum value.

      How to determine the number of bins to use in a histogram.
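
      A minimal sketch of the rule with numpy, using the built-in 'fd' option as a cross-check (the sample data are made up):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      data = rng.normal(size=1000)

      # Freedman-Diaconis: h = 2 * IQR * n^(-1/3); number of bins = (max - min) / h
      iqr = np.subtract(*np.percentile(data, [75, 25]))
      h = 2 * iqr * len(data) ** (-1 / 3)
      n_bins = int(np.ceil((data.max() - data.min()) / h))

      # numpy implements the same rule via bins='fd'
      edges = np.histogram_bin_edges(data, bins="fd")
      print(n_bins, len(edges) - 1)
      ```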

    1. Many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variance of the same order. If a feature has a variance that is orders of magnitude larger than the others, it might dominate the objective function and make the estimator unable to learn from the other features correctly, as expected.
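
      A minimal sketch of putting scaling in front of an RBF-kernel SVM with a scikit-learn pipeline, so that no single high-variance feature dominates the objective (the wine dataset is just an example of features on very different scales):

      ```python
      from sklearn.datasets import load_wine
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      X, y = load_wine(return_X_y=True)

      # Without scaling, the largest-variance features dominate the RBF kernel.
      raw = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
      scaled = cross_val_score(
          make_pipeline(StandardScaler(), SVC(kernel="rbf")), X, y, cv=5
      ).mean()
      print(f"raw: {raw:.2f}  scaled: {scaled:.2f}")
      ```
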
  5. Jun 2019
  6. varsellcm.r-forge.r-project.org

    1. The cos2 values are used to estimate the quality of the representation. The closer a variable is to the circle of correlations, the better its representation on the factor map (and the more important it is for interpreting these components). Variables that are close to the center of the plot are less important for the first components.
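
      A minimal sketch of computing cos2 for variables from a scikit-learn PCA on standardized data; this mirrors what factoextra reports, but the computation here is my own, not that package's code:

      ```python
      import numpy as np
      from sklearn.datasets import load_iris
      from sklearn.decomposition import PCA
      from sklearn.preprocessing import StandardScaler

      X, _ = load_iris(return_X_y=True)
      X_std = StandardScaler().fit_transform(X)

      pca = PCA().fit(X_std)

      # Variable coordinates = loadings scaled by component standard deviations;
      # for standardized data these are the correlations between variables and components.
      coords = pca.components_.T * np.sqrt(pca.explained_variance_)

      # cos2 = squared coordinates; values near 1 on the first components
      # mean the variable is well represented there.
      cos2 = coords ** 2
      print(cos2[:, :2].round(2))
      ```
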
    1. Thus, when we say that PCA can reduce dimensionality, we mean that PCA can compute principal components and the user can choose the smallest number K of them that explain 0.95 of the variance. A subjectively satisfactory result would be when K is small relative to the original number of features D.
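
      A minimal sketch, assuming scikit-learn, of letting PCA pick the smallest K that explains 95% of the variance (the digits dataset is just an example):

      ```python
      from sklearn.datasets import load_digits
      from sklearn.decomposition import PCA

      X, _ = load_digits(return_X_y=True)  # D = 64 original features

      # A float in (0, 1) makes PCA keep the smallest number of components
      # whose cumulative explained variance reaches that fraction.
      pca = PCA(n_components=0.95).fit(X)
      print(pca.n_components_, "of", X.shape[1], "components explain 95% of the variance")
      ```
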
  7. Jan 2019