66 Matching Annotations

Jan 2020
www.sthda.com www.sthda.com

Logistic Regression Assumptions and Diagnostics in R - Articles - STHDA

1
1. intelligence.refinery 29 Jan 2020
  
  in Public
  
  Logistic regression assumptions
  
  Logistic regression
Visit annotations in context

Tags

Logistic regression

Annotators

intelligence.refinery

URL

sthda.com/english/articles/36-classification-methods-essentials/148-logistic-regression-assumptions-and-diagnostics-in-r/
www.sthda.com www.sthda.com

Logistic Regression Essentials in R - Articles - STHDA

4
1. intelligence.refinery 29 Jan 2020
  
  in Public
  
  Make sure that the predictor variables are normally distributed. If not, you can use log, root, Box-Cox transformation.
  
  Logistic regression
2. intelligence.refinery 28 Jan 2020
  
  in Public
  
  You can also fit generalized additive models (see the chapter on polynomial and spline regression) when linearity of the predictor cannot be assumed. This can be done using the mgcv package:
  
  Generalized additive models
3. intelligence.refinery 28 Jan 2020
  
  in Public
  
  For a given predictor (say x1), the associated beta coefficient (b1) in the logistic regression function corresponds to the log of the odds ratio for that predictor.
  
  Log odds
4. intelligence.refinery 28 Jan 2020
  
  in Public
  
  If the odds ratio is 2, then the odds that the event occurs (event = 1) are two times higher when the predictor x is present (x = 1) versus x is absent (x = 0).
  
  Log odds
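
The two "Log odds" highlights above can be checked with a little arithmetic. This is a minimal sketch, not code from the STHDA article: the function names and the baseline probability of 0.2 are assumptions chosen for illustration.

```python
import math


def odds(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio_from_coefficient(b1):
    """A logistic regression coefficient is a log odds ratio, so exponentiate it."""
    return math.exp(b1)


def probability_when_present(p_absent, b1):
    """Probability of the event when x = 1, given the probability when x = 0."""
    new_odds = odds(p_absent) * math.exp(b1)
    return new_odds / (1.0 + new_odds)


# Hypothetical coefficient chosen so that the odds ratio is 2.
b1 = math.log(2)
print(odds_ratio_from_coefficient(b1))
print(probability_when_present(0.2, b1))
```

Note that an odds ratio of 2 doubles the odds, not the probability: with a baseline probability of 0.2 (odds 0.25), the odds become 0.5, which corresponds to a probability of one third.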
Visit annotations in context

Tags

Log odds

Logistic regression

Generalized additive models

Annotators

intelligence.refinery

URL

sthda.com/english/articles/36-classification-methods-essentials/151-logistic-regression-essentials-in-r/
online.stat.psu.edu online.stat.psu.edu

9.1 - Distinction Between Outliers and High Leverage Observations | STAT 462

1
1. intelligence.refinery 22 Jan 2020
  
  in Public
  
  An outlier is a data point whose response y does not follow the general trend of the rest of the data. A data point has high leverage if it has "extreme" predictor x values. With a single predictor, an extreme x value is simply one that is particularly high or low. With multiple predictors, extreme x values may be particularly high or low for one or more predictors, or may be "unusual" combinations of predictor values (e.g., with two predictors that are positively correlated, an unusual combination of predictor values might be a high value of one predictor paired with a low value of the other predictor).
  
  regression
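
For the single-predictor case described above, leverage can be computed directly. The sketch below uses the standard formula for simple linear regression with an intercept, h_i = 1/n + (x_i - mean(x))^2 / Sxx; the data values are made up for illustration.

```python
def leverages(x):
    """Leverage of each observation in a simple linear regression with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


# Hypothetical predictor values; the last one is far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverages(x)
for xi, hi in zip(x, h):
    print(xi, round(hi, 3))
```

Leverage depends only on the x values, which is why a high-leverage point is not necessarily an outlier in y. The leverages always sum to the number of fitted parameters (here 2: intercept and slope).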
Visit annotations in context

Tags

regression

Annotators

intelligence.refinery

URL

online.stat.psu.edu/stat462/node/170/
jbhender.github.io jbhender.github.io

Weighted Regression in SAS, R, and Python

1
1. intelligence.refinery 21 Jan 2020
  
  in Public
  
  As shown in the Residuals vs Fitted plot, there is a megaphone shape, which indicates that non-constant variance is likely to be an issue.
  
  Regression
Visit annotations in context

Tags

Regression

Annotators

intelligence.refinery

URL

jbhender.github.io/Stats506/F17/Projects/Abalone_WLS.html
scotch.io scotch.io

Build a To-Do application Using Django and React

1
1. intelligence.refinery 06 Jan 2020
  
  in Public
  
  Add this code snippet to the bottom of the backend/settings.py file:
  
  Need to add more ports: https://github.com/adamchainz/django-cors-headers/issues/403
Visit annotations in context

Annotators

intelligence.refinery

URL

scotch.io/tutorials/build-a-to-do-application-using-django-and-react
Dec 2019
www.interviewcake.com www.interviewcake.com

Logarithms for algorithmic coding interviews | Interview Cake

1
1. intelligence.refinery 14 Dec 2019
  
  in Public
  
  So what's our total time cost? $O(n\log_{2}{n})$. The $\log_{2}{n}$ comes from the number of times we have to cut $n$ in half to get down to sublists of just 1 element (our base case). The additional $n$ comes from the time cost of merging all $n$ items together each time we merge two sorted sublists.
  
  Algorithms Quick sort
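
The cost argument in the highlight above describes merge sort (halve until sublists have one element, then merge). A minimal sketch that also reports how many halving levels were needed, so the log2(n) factor is visible; the depth counter is an addition for illustration only.

```python
def merge(left, right):
    """Merge two sorted lists in time proportional to their combined length."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return (sorted copy, number of halving levels below this call)."""
    if len(items) <= 1:
        return list(items), 0  # base case: nothing left to split
    mid = len(items) // 2
    left, depth_left = merge_sort(items[:mid])
    right, depth_right = merge_sort(items[mid:])
    return merge(left, right), 1 + max(depth_left, depth_right)


values = [5, 2, 8, 1, 9, 3, 7, 4]
sorted_values, levels = merge_sort(values)
print(sorted_values, levels)
```

With 8 items there are 3 levels of halving (8, 4, 2, 1), and each level merges all 8 items once.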
Visit annotations in context

Tags

Quick sort

Algorithms

Annotators

intelligence.refinery

URL

interviewcake.com/article/java/logarithms
Aug 2019
www.pysurvival.io www.pysurvival.io

Introduction to Survival Analysis - PySurvival

1
1. intelligence.refinery 01 Aug 2019
  
  in Public
  
  so that instead of predicting the time of event, we are predicting the probability that an event happens at a particular time.
  
  Survival analysis
Visit annotations in context

Tags

Survival analysis

Annotators

intelligence.refinery

URL

pysurvival.io/intro.html
Jul 2019
archimede.mat.ulaval.ca archimede.mat.ulaval.ca

C:/caoThesis/GabaritLatex/memoire.dvi

3
1. intelligence.refinery 30 Jul 2019
  
  in Public
  
  In practice, we found that it is not appropriate to use Aalen’s additive hazards model for all datasets, because when we estimate cumulative regression functions B(t), they are restricted to the time interval where X (X has been defined in Chapter 3) is of full rank, that means X′X is invertible. Sometimes we found that X is not of full rank, which was not a problem with the Cox model.
  
  Survival analysis
2. intelligence.refinery 30 Jul 2019
  
  in Public
  
  An overall conclusion is that the two models give different pieces of information and should not be viewed as alternatives to each other, but as complementary methods that may be used together to give a fuller and more comprehensive understanding of data
  
  Survival analysis
3. intelligence.refinery 11 Jul 2019
  
  in Public
  
  The effect of the covariates on survival is to act multiplicatively on some unknown baseline hazard rate, which makes it difficult to model covariate effects that change over time. Secondly, if covariates are deleted from a model or measured with a different level of precision, the proportional hazards assumption is no longer valid. These weaknesses in the Cox model have generated interest in alternative models. One such alternative model is Aalen’s (1989) additive model. This model assumes that covariates act in an additive manner on an unknown baseline hazard rate. The unknown risk coefficients are allowed to be functions of time, so that the effect of a covariate may vary over time.
  
  Aalen additive model Survival analysis
Visit annotations in context

Tags

Survival analysis

Aalen additive model

Annotators

intelligence.refinery

URL

archimede.mat.ulaval.ca/theses/H-Cao_05.pdf
www.sthda.com www.sthda.com

Survival Analysis Basics - Easy Guides - Wiki - STHDA

2
1. intelligence.refinery 29 Jul 2019
  
  in Public
  
  Note that three often-used transformations can be specified using the argument fun: “log”: log transformation of the survivor function; “event”: plots cumulative events (f(y) = 1-y), also known as the cumulative incidence; “cumhaz”: plots the cumulative hazard function (f(y) = -log(y)).
  
  Survival analysis Survival function
2. intelligence.refinery 08 Jul 2019
  
  in Public
  
  Note that, the confidence limits are wide at the tail of the curves, making meaningful interpretations difficult. This can be explained by the fact that, in practice, there are usually patients who are lost to follow-up or alive at the end of follow-up. Thus, it may be sensible to shorten plots before the end of follow-up on the x-axis (Pocock et al, 2002).
  
  Survival analysis Kaplan-Meier curve
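
The "event" and "cumhaz" transformations quoted above are simple functions of the survival estimate. The sketch below is not the survminer implementation: it is a hand-rolled Kaplan-Meier estimator on made-up data, with hypothetical function names, followed by the two transformations f(y) = 1-y and f(y) = -log(y).

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times; 0 marks a censored subject.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)

for t, s in curve:
    cumulative_events = 1.0 - s      # "event": f(y) = 1 - y
    cumulative_hazard = -math.log(s)  # "cumhaz": f(y) = -log(y), needs s > 0
    print(t, round(s, 4), round(cumulative_events, 4), round(cumulative_hazard, 4))
```

The number at risk shrinks with every event and every censored subject, which is one reason the estimate becomes unstable (and the confidence limits wide) near the end of follow-up.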
Visit annotations in context

Tags

Survival analysis

Kaplan-Meier curve

Survival function

Annotators

intelligence.refinery

URL

sthda.com/english/wiki/survival-analysis-basics
fs.blog fs.blog

Farnam Street Principles

1
1. intelligence.refinery 21 Jul 2019
  
  in Public
  
  1. Direction Over Speed
  
  Intelligence Refinery Principles for living
Visit annotations in context

Tags

Intelligence Refinery

Principles for living

Annotators

intelligence.refinery

URL

fs.blog/principles/
www.sthda.com www.sthda.com

MFA - Multiple Factor Analysis in R: Essentials - Articles - STHDA

1
1. intelligence.refinery 18 Jul 2019
  
  in Public
  
  summarizing and visualizing a complex data table in which individuals are described by several sets of variables (quantitative and /or qualitative) structured into groups.
  
  Multiple factorial analysis
Visit annotations in context

Tags

Multiple factorial analysis

Annotators

intelligence.refinery

URL

sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/116-mfa-multiple-factor-analysis-in-r-essentials/
www.sthda.com www.sthda.com

FAMD - Factor Analysis of Mixed Data in R: Essentials - Articles - STHDA

1
1. intelligence.refinery 18 Jul 2019
  
  in Public
  
  it acts as PCA for quantitative variables and as MCA for qualitative variables.
  
  Factor analysis of mixed data
Visit annotations in context

Tags

Factor analysis of mixed data

Annotators

intelligence.refinery

URL

sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/115-famd-factor-analysis-of-mixed-data-in-r-essentials/
factominer.free.fr factominer.free.fr

Multiple Factor Analysis

2
1. intelligence.refinery 18 Jul 2019
  
  in Public
  
  MFA is a weighted PCA
  
  Multiple factorial analysis
2. intelligence.refinery 18 Jul 2019
  
  in Public
  
  Study the similarity between individuals with respect to the whole set of variables AND the relationships between variables
  
  Multiple factorial analysis
Visit annotations in context

Tags

Multiple factorial analysis

Annotators

intelligence.refinery

URL

factominer.free.fr/course/doc/MFA_course_slides.pdf
sebastianraschka.com sebastianraschka.com

About Feature Scaling and Normalization

2
1. intelligence.refinery 17 Jul 2019
  
  in Public
  
  in clustering analyses, standardization may be especially crucial in order to compare similarities between features based on certain distance measures. Another prominent example is the Principal Component Analysis, where we usually prefer standardization over Min-Max scaling, since we are interested in the components that maximize the variance
  
  Use standardization, not min-max scaling, for clustering and PCA.
  
  Clustering PCA Data normalization
2. intelligence.refinery 05 Jul 2019
  
  in Public
  
  As a rule of thumb I’d say: When in doubt, just standardize the data, it shouldn’t hurt.
  
  Data normalization
Visit annotations in context

Tags

Data normalization

Clustering

PCA

Annotators

intelligence.refinery

URL

sebastianraschka.com/Articles/2014_about_feature_scaling.html
www.csd.uwo.ca www.csd.uwo.ca

cs4412slides

2
1. intelligence.refinery 16 Jul 2019
  
  in Public
  
  Implication means co-occurrence, not causality!
  
  Association rule learning
2. intelligence.refinery 16 Jul 2019
  
  in Public
  
  Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
  
  Association rule learning
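
Rules of the kind described above are usually scored by support and confidence. A minimal sketch with made-up transactions; the function names are hypothetical and the metrics follow their usual definitions rather than anything specific to the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support of both over support of antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

print(support(transactions, {"diapers", "beer"}))
print(confidence(transactions, {"diapers"}, {"beer"}))
```

A high confidence for {diapers} -> {beer} only says the items co-occur; as the first highlight notes, it says nothing about causality.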
Visit annotations in context

Tags

Association rule learning

Annotators

intelligence.refinery

URL

csd.uwo.ca/Courses/CS4412a/Notes/AssociationRule.pdf
www.cs.ubc.ca www.cs.ubc.ca

L11.pdf

2
1. intelligence.refinery 15 Jul 2019
  
  in Public
  
  Cluster features and only consider rules within clusters
  
  Association rule learning
2. intelligence.refinery 15 Jul 2019
  
  in Public
  
  Support Set Pruning
  
  Association rule learning
Visit annotations in context

Tags

Association rule learning

Annotators

intelligence.refinery

URL

cs.ubc.ca/~schmidtm/Courses/340-F15/L11.pdf
spark.apache.org spark.apache.org

Frequent Pattern Mining - Spark 2.4.3 Documentation

1
1. intelligence.refinery 15 Jul 2019
  
  in Public
  
  Given a dataset of transactions, the first step of FP-growth is to calculate item frequencies and identify frequent items. Different from Apriori-like algorithms designed for the same purpose, the second step of FP-growth uses a suffix tree (FP-tree) structure to encode transactions without generating candidate sets explicitly, which are usually expensive to generate. After the second step, the frequent itemsets can be extracted from the FP-tree.
  
  FP-growth
Visit annotations in context

Tags

FP-growth

Annotators

intelligence.refinery

URL

spark.apache.org/docs/latest/ml-frequent-pattern-mining.html
livebook.datascienceheroes.com livebook.datascienceheroes.com

Data Science Live Book

1
1. intelligence.refinery 12 Jul 2019
  
  in Public
  
  2.1.8 Automatic data frame discretization
  
  Discretize continuous variables
Visit annotations in context

Tags

Discretize continuous variables

Annotators

intelligence.refinery

URL

livebook.datascienceheroes.com/data-preparation.html
lifelines.readthedocs.io lifelines.readthedocs.io

Survival regression — lifelines 0.22.0 documentation

2
1. intelligence.refinery 10 Jul 2019
  
  in Public
  
  Non-proportional hazards is a case of model misspecification.
  
  Cox proportional hazards regression model Proportional hazard assumption
2. intelligence.refinery 10 Jul 2019
  
  in Public
  
  The idea behind the model is that the log-hazard of an individual is a linear function of their static covariates and a population-level baseline hazard that changes over time.
  
  Cox proportional hazards regression model
Visit annotations in context

Tags

Cox proportional hazards regression model

Proportional hazard assumption

Annotators

intelligence.refinery

URL

lifelines.readthedocs.io/en/latest/Survival Regression.html
livebook.datascienceheroes.com livebook.datascienceheroes.com

Data Science Live Book

2
1. intelligence.refinery 10 Jul 2019
  
  in Public
  
  However, the gain ratio is the most important metric here, ranged from 0 to 1, with higher being better.
  
  Variable importance Gain ratio
2. intelligence.refinery 09 Jul 2019
  
  in Public
  
  en: entropy measured in bits mi: mutual information ig: information gain gr: gain ratio
  
  Variable importance
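
The metrics abbreviated above (en, ig, gr) can be computed by hand for a categorical feature. This sketch assumes the usual textbook definitions, with gain ratio taken as information gain divided by the entropy of the feature; it is not the code of the package discussed in the book, and the data are made up.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, target):
    """Entropy of target minus its expected entropy after splitting on feature."""
    n = len(target)
    conditional = 0.0
    for level in set(feature):
        subset = [t for f, t in zip(feature, target) if f == level]
        conditional += len(subset) / n * entropy(subset)
    return entropy(target) - conditional


def gain_ratio(feature, target):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature, target) / split_info


# Hypothetical data: the first feature determines the target, the second does not.
target = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

print(gain_ratio(informative, target))
print(gain_ratio(uninformative, target))
```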
Visit annotations in context

Tags

Variable importance

Gain ratio

Annotators

intelligence.refinery

URL

livebook.datascienceheroes.com/selecting-best-variables.html
www.sthda.com www.sthda.com

Visualizing Multivariate Categorical Data - Articles - STHDA

1
1. intelligence.refinery 09 Jul 2019
  
  in Public
  
  Balloon plot
  
  Balloon plot
  
  Data visualization
Visit annotations in context

Tags

Data visualization

Annotators

intelligence.refinery

URL

sthda.com/english/articles/32-r-graphics-essentials/129-visualizing-multivariate-categorical-data/
rdrr.io rdrr.io

predictivePower: Calcualtes feature predictive power in XanderHorn/autoEDA: Automated univariate and bivariate exploratory data analysis

1
1. intelligence.refinery 09 Jul 2019
  
  in Public
  
  Feature predictive power will be calculated for all features contained in a dataset along with the outcome feature. Works for binary classification, multi-class classification and regression problems. Can also be used when exploring a feature of interest to determine correlations of independent features with the outcome feature. When the outcome feature is continuous in nature or is a regression problem, correlation calculations are performed. When the outcome feature is categorical in nature or is a classification problem, the Kolmogorov Smirnov distance measure is used to determine predictive power. For multi-class classification outcomes, a one vs all approach is taken which is then averaged to arrive at the mean KS distance measure. The predictive power is sensitive to the manner in which the data has been prepared and will differ should that manner change.
  
  Variable importance autoEDA
Visit annotations in context

Tags

Variable importance

autoEDA

Annotators

intelligence.refinery

URL

rdrr.io/github/XanderHorn/autoEDA/man/predictivePower.html
www.scholarpedia.org www.scholarpedia.org

Mutual information

1
1. intelligence.refinery 09 Jul 2019
  
  in Public
  
  Mutual information is one of many quantities that measures how much one random variable tells us about another. It is a dimensionless quantity with (generally) units of bits, and can be thought of as the reduction in uncertainty about one random variable given knowledge of another. High mutual information indicates a large reduction in uncertainty; low mutual information indicates a small reduction; and zero mutual information between two random variables means the variables are independent.
  
  Mutual information Variable importance
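
The definition above can be made concrete for two discrete variables observed as paired samples. A minimal sketch using the standard formula I(X;Y) = sum over (x, y) of p(x,y) log2( p(x,y) / (p(x) p(y)) ), estimated from empirical frequencies; the function name and data are illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two paired sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        p_product = (count_x[x] / n) * (count_y[y] / n)
        mi += p_joint * math.log2(p_joint / p_product)
    return mi


# Knowing x removes all uncertainty about y: 1 bit for a fair binary variable.
print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))
# Every (x, y) combination is equally frequent: independent, so 0 bits.
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```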
Visit annotations in context

Tags

Variable importance

Mutual information

Annotators

intelligence.refinery

URL

scholarpedia.org/article/Mutual_information
nbviewer.jupyter.org nbviewer.jupyter.org

Notebook on nbviewer

1
1. intelligence.refinery 08 Jul 2019
  
  in Public
  
  Sidenote: Visually comparing estimated survival curves in order to assess whether there is a difference in survival between groups is usually not recommended, because it is highly subjective. Statistical tests such as the log-rank test are usually more appropriate.
  
  Kaplan-Meier curve Log-rank test
Visit annotations in context

Tags

Log-rank test

Kaplan-Meier curve

Annotators

intelligence.refinery

URL

nbviewer.jupyter.org/github/sebp/scikit-survival/blob/master/examples/00-introduction.ipynb
pedroconcejero.wordpress.com pedroconcejero.wordpress.com

Survival Random Forests for Churn prediction

1
1. intelligence.refinery 08 Jul 2019
  
  in Public
  
  RF is now a standard to effectively analyze a large number of variables, of many different types, with no previous variable selection process. It is not parametric, and in particular for survival target it does not assume the proportional risks assumption.
  
  Random forest Survival analysis
Visit annotations in context

Tags

Random forest

Survival analysis

Annotators

intelligence.refinery

URL

pedroconcejero.wordpress.com/2015/11/12/survival-random-forests-for-churn-prediction-3/
www.cscu.cornell.edu www.cscu.cornell.edu

An Introduction to Survival Analysis

1
1. intelligence.refinery 08 Jul 2019
  
  in Public
  
  The survival function gives, for every time, the probability of surviving (or not experiencing the event) up to that time. The hazard function gives the potential that the event will occur, per time unit, given that an individual has survived up to the specified time.
  
  Survival function Survival analysis Hazard function
Visit annotations in context

Tags

Survival analysis

Hazard function

Survival function

Annotators

intelligence.refinery

URL

cscu.cornell.edu/news/statnews/stnews78.pdf
www.statisticshowto.datasciencecentral.com www.statisticshowto.datasciencecentral.com

Choose Bin Sizes for Histograms in Easy Steps + Sturge's Rule - Statistics How To

1
1. intelligence.refinery 08 Jul 2019
  
  in Public
  
  Sturge’s rule works best for continuous data that is normally distributed and symmetrical.
  
  Histogram
Visit annotations in context

Tags

Histogram

Annotators

intelligence.refinery

URL

statisticshowto.datasciencecentral.com/choose-bin-sizes-statistics/
towardsdatascience.com towardsdatascience.com

Scale, Standardize, or Normalize with Scikit-Learn – Towards Data Science

1
1. intelligence.refinery 06 Jul 2019
  
  in Public
  
  how the features are all on the same relative scale. The relative spaces between each feature’s values have been maintained.
  
  Data scaling
Visit annotations in context

Tags

Data scaling

Annotators

intelligence.refinery

URL

towardsdatascience.com/scale-standardize-or-normalize-with-scikit-learn-6ccc7d176a02
www.kaggle.com www.kaggle.com

Data Cleaning Challenge: Scale and Normalize Data | Kaggle

1
1. intelligence.refinery 06 Jul 2019
  
  in Public
  
  "in scaling, you're changing the range of your data while in normalization you're changing the shape of the distribution of your data."
Visit annotations in context

Annotators

intelligence.refinery

URL

kaggle.com/jfeng1023/data-cleaning-challenge-scale-and-normalize-data
stats.stackexchange.com stats.stackexchange.com

Calculating optimal number of bins in a histogram

1
1. intelligence.refinery 06 Jul 2019
  
  in Public
  
  The Freedman-Diaconis rule is very robust and works well in practice. The bin-width is set to $h = 2\times\text{IQR}\times n^{-1/3}$. So the number of bins is $(\max-\min)/h$, where $n$ is the number of observations, max is the maximum value and min is the minimum value.
  
  How to determine the number of bins to use in a histogram.
  
  Histogram
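
The Freedman-Diaconis rule quoted above translates directly into code. This is a sketch under two assumptions that are not in the quoted answer: the quartiles are computed with the "inclusive" method of `statistics.quantiles`, and the bin count is rounded up to a whole number.

```python
import math
import statistics


def freedman_diaconis(data):
    """Return (bin width h, number of bins) for a list of numbers.

    Assumes the interquartile range is non-zero.
    """
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    width = 2 * iqr * n ** (-1 / 3)
    bins = math.ceil((max(data) - min(data)) / width)
    return width, bins


# Hypothetical sample: the integers 1 to 27.
data = list(range(1, 28))
width, bins = freedman_diaconis(data)
print(width, bins)
```

Different quartile conventions give slightly different IQR values, so the resulting width can differ a little from what other software reports.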
Visit annotations in context

Tags

Histogram

Annotators

intelligence.refinery

URL

stats.stackexchange.com/questions/798/calculating-optimal-number-of-bins-in-a-histogram
scikit-learn.org scikit-learn.org

4.3. Preprocessing data — scikit-learn 0.19.dev0 documentation

2
1. intelligence.refinery 06 Jul 2019
  
  in Public
  
  Discretization (otherwise known as quantization or binning) provides a way to partition continuous features into discrete values.
  
  Binning
  
  Binning
2. intelligence.refinery 06 Jul 2019
  
  in Public
  
  many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
  
  Data normalization Data standardization
Visit annotations in context

Tags

Data normalization

Data standardization

Binning

Annotators

intelligence.refinery

URL

scikit-learn.org/stable/modules/preprocessing.html
www.willmcginnis.com www.willmcginnis.com

Beyond One-Hot: an exploration of categorical variables - Will's Noise

1
1. intelligence.refinery 05 Jul 2019
  
  in Public
  
  we want to code categorical variables into numbers, but we are concerned about this dimensionality problem
  
  Feature encoding Dimensionality
Visit annotations in context

Tags

Dimensionality

Feature encoding

Annotators

intelligence.refinery

URL

willmcginnis.com/2015/11/29/beyond-one-hot-an-exploration-of-categorical-variables/
machinelearningmastery.com machinelearningmastery.com

Ensemble Machine Learning Algorithms in Python with scikit-learn

1
1. intelligence.refinery 04 Jul 2019
  
  in Public
  
  Ensemble Machine Learning Algorithms in Python with scikit-learn
  
  Read on July 4, 2019
  
  Machine learning Ensemble methods Machine Learning Mastery
Visit annotations in context

Tags

Machine Learning Mastery

Ensemble methods

Machine learning

Annotators

intelligence.refinery

URL

machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/
www.oreilly.com www.oreilly.com

Evaluating Machine Learning Models

1
1. intelligence.refinery 02 Jul 2019
  
  in Public
  
  Machine learning models are basically mathematical functions that represent the relationship between different aspects of data.
  
  Machine learning Algorithms
Visit annotations in context

Tags

Algorithms

Machine learning

Annotators

intelligence.refinery

URL

oreilly.com/ideas/evaluating-machine-learning-models/page/5/hyperparameter-tuning
jmlr.csail.mit.edu jmlr.csail.mit.edu

bergstra12a.dvi

1
1. intelligence.refinery 02 Jul 2019
  
  in Public
  
  Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time.
  
  Random search Grid search Machine learning Neural networks
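
The contrast between the two search strategies in the highlight above can be sketched with a toy problem. Everything here is an assumption for illustration: the objective function, the parameter names, and the trial budget are made up and are not taken from the paper.

```python
import random


def objective(learning_rate, dropout):
    """Hypothetical validation loss: sensitive to learning_rate, barely to dropout."""
    return (learning_rate - 0.37) ** 2 + 0.01 * (dropout - 0.5) ** 2


def grid_search(points_per_axis):
    """Evaluate every combination on an evenly spaced grid over [0, 1] x [0, 1]."""
    grid = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    return min((objective(lr, d), lr, d) for lr in grid for d in grid)


def random_search(trials, seed=0):
    """Evaluate the same number of points drawn uniformly from the same domain."""
    rng = random.Random(seed)
    candidates = [(rng.random(), rng.random()) for _ in range(trials)]
    return min((objective(lr, d), lr, d) for lr, d in candidates)


# Same budget of 9 evaluations for both strategies.
print("grid  :", grid_search(3))
print("random:", random_search(9))
```

With 9 evaluations, the grid only ever tries 3 distinct learning rates, while random search tries 9. When one hyperparameter matters far more than the others, that denser coverage of the important axis is the intuition behind the result quoted above.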
Visit annotations in context

Tags

Neural networks

Grid search

Random search

Machine learning

Annotators

intelligence.refinery

URL

jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
Jun 2019
towardsdatascience.com towardsdatascience.com

Interpretable Machine Learning – Towards Data Science

1
1. intelligence.refinery 26 Jun 2019
  
  in Public
  
  To interpret a model, we require the following insights: Features in the model which are most important. For any single prediction from a model, the effect of each feature in the data on that particular prediction. Effect of each feature over a large number of possible predictions.
  
  Machine learning interpretability
  
  Machine learning Interpretability
Visit annotations in context

Tags

Interpretability

Machine learning

Annotators

intelligence.refinery

URL

towardsdatascience.com/interpretable-machine-learning-1dec0f2f3e6b
christophm.github.io christophm.github.io

5.7 Local Surrogate (LIME) | Interpretable Machine Learning

1
1. intelligence.refinery 26 Jun 2019
  
  in Public
  
  Instability means that it is difficult to trust the explanations, and you should be very critical.
  
  LIME
Visit annotations in context

Tags

LIME

Annotators

intelligence.refinery

URL

christophm.github.io/interpretable-ml-book/lime.html
imbalanced-learn.readthedocs.io imbalanced-learn.readthedocs.io

Comparison of the different over-sampling algorithms — imbalanced-learn 0.4.3 documentation

1
1. intelligence.refinery 25 Jun 2019
  
  in Public
  
  When dealing with a mix of continuous and categorical features, SMOTE-NC is the only method which can handle this case.
  
  SMOTE-NC
Visit annotations in context

Annotators

intelligence.refinery

URL

imbalanced-learn.readthedocs.io/en/stable/auto_examples/over-sampling/plot_comparison_over_sampling.html
imbalanced-learn.readthedocs.io imbalanced-learn.readthedocs.io

2. Over-sampling — imbalanced-learn 0.4.3 documentation

2
1. intelligence.refinery 25 Jun 2019
  
  in Public
  
  In addition, RandomOverSampler allows to sample heterogeneous data (e.g. containing some strings):
  
  RandomOverSampler
2. intelligence.refinery 25 Jun 2019
  
  in Public
  
  The most naive strategy is to generate new samples by randomly sampling with replacement the current available samples.
  
  Naive random over-sampling
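
The naive strategy described above (sampling existing rows with replacement) needs no arithmetic on the features, which is why it also works on heterogeneous data. The sketch below is a hand-written illustration of the idea, not the imbalanced-learn `RandomOverSampler` itself; the helper name and the data are hypothetical.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # sampling with replacement
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical rows mixing strings and numbers.
rows = [("red", 1.0), ("blue", 2.5), ("red", 0.3), ("green", 4.2), ("blue", 1.1)]
labels = ["neg", "neg", "neg", "neg", "pos"]

new_rows, new_labels = random_over_sample(rows, labels)
print(Counter(new_labels))
```

Because the new rows are exact copies, this balances the classes without adding any new information.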
Visit annotations in context

Annotators

intelligence.refinery

URL

imbalanced-learn.readthedocs.io/en/stable/over_sampling.html
varsellcm.r-forge.r-project.org varsellcm.r-forge.r-project.org

VarSelLCM

1
1. intelligence.refinery 25 Jun 2019
  
  in Public
  
  missing values are managed, without any pre-processing, by the model used to cluster with the assumption that values are missing completely at random.
  
  VarSelLCM package
  
  R Mixed-type data clustering
Visit annotations in context

Tags

Mixed-type data clustering

R

Annotators

intelligence.refinery

URL

varsellcm.r-forge.r-project.org/
Local file Local file

Practical Data Science with R, Second Edition MEAP V05

1
1. intelligence.refinery 23 Jun 2019
  
  in Public
  
  Success in a data science project comes not from access to any one exotic tool, but from having quantifiable goals, good methodology, cross-discipline interactions, and a repeatable workflow.
  
  Data science
Tags

Data science

Annotators

intelligence.refinery
www.sthda.com www.sthda.com

PCA - Principal Component Analysis Essentials - Articles - STHDA

6
1. intelligence.refinery 12 Jun 2019
  
  in Public
  
  Variables that are correlated with PC1 (i.e., Dim.1) and PC2 (i.e., Dim.2) are the most important in explaining the variability in the data set.
  
  PCA
2. intelligence.refinery 12 Jun 2019
  
  in Public
  
  The cos2 values are used to estimate the quality of the representation. The closer a variable is to the circle of correlations, the better its representation on the factor map (and the more important it is to interpret these components). Variables that are close to the center of the plot are less important for the first components.
  
  PCA
3. intelligence.refinery 12 Jun 2019
  
  in Public
  
  Taken together, the main purpose of principal component analysis is to: identify hidden patterns in a data set, reduce the dimensionality of the data by removing the noise and redundancy in the data, identify correlated variables
  
  PCA Definitions
4. intelligence.refinery 12 Jun 2019
  
  in Public
  
  the amount of variance retained by each principal component is measured by the so-called eigenvalue.
  
  PCA Eigenvalue
5. intelligence.refinery 12 Jun 2019
  
  in Public
  
  These new variables correspond to a linear combination of the originals. The number of principal components is less than or equal to the number of original variables.
  
  PCA
6. intelligence.refinery 12 Jun 2019
  
  in Public
  
  Principal component analysis is used to extract the important information from a multivariate data table and to express this information as a set of few new variables called principal components.
  
  PCA Definitions
Visit annotations in context

Tags

Eigenvalue

PCA

Definitions

Annotators

intelligence.refinery

URL

sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/
www.octoberraindrops.com www.octoberraindrops.com

Phan_2016_IntroductionToPCA.pdf

1
1. intelligence.refinery 12 Jun 2019
  
  in Public
  
  Thus, when we say that PCA can reduce dimensionality, we mean that PCA can compute principal components and the user can choose the smallest number K of them that explain 0.95 of the variance. A subjectively satisfactory result would be when K is small relative to the original number of features D.
  
  PCA
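
Choosing the smallest K described above only needs the eigenvalues, since each one measures the variance retained by its component. A minimal sketch; the function name and the eigenvalues are made up for illustration.

```python
def smallest_k(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues for D = 5 original features.
eigenvalues = [6.0, 3.0, 0.7, 0.2, 0.1]
print(smallest_k(eigenvalues))
```

Here the first three components explain about 97% of the variance, so K = 3 out of D = 5.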
Visit annotations in context

Tags

PCA

Annotators

intelligence.refinery

URL

octoberraindrops.com/publications/Phan_2016_IntroductionToPCA.pdf
sebastianraschka.com sebastianraschka.com

About Feature Scaling and Normalization

2
1. intelligence.refinery 11 Jun 2019
  
  in Public
  
  However, this doesn’t mean that Min-Max scaling is not useful at all! A popular application is image processing, where pixel intensities have to be normalized to fit within a certain range (i.e., 0 to 255 for the RGB color range). Also, typical neural network algorithms require data that is on a 0-1 scale.
  
  Use min-max scaling for image processing & neural networks.
  
  Min-max scaling Neural networks Data normalization
2. intelligence.refinery 11 Jun 2019
  
  in Public
  
  The result of standardization (or Z-score normalization) is that the features will be rescaled so that they’ll have the properties of a standard normal distribution with $\mu = 0$ and $\sigma = 1$, where $\mu$ is the mean (average) and $\sigma$ is the standard deviation from the mean
  
  Data normalization Definitions
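
The two rescaling methods highlighted above differ only in what they subtract and divide by. A minimal sketch with made-up values; using the population standard deviation is an assumption here (libraries differ on this), and both helpers assume the values are not all identical.

```python
import statistics


def standardize(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


def min_max_scale(values, low=0.0, high=1.0):
    """Min-max scaling: map the smallest value to low and the largest to high."""
    smallest, largest = min(values), max(values)
    return [low + (v - smallest) * (high - low) / (largest - smallest) for v in values]


# Hypothetical feature values.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(values)
scaled = min_max_scale(values)
print(z_scores)
print(scaled)
```

Standardization fixes the mean and spread but leaves the range unbounded, while min-max scaling fixes the range; neither changes the shape of the distribution.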
Visit annotations in context

Tags

Neural networks

Data normalization

Min-max scaling

Definitions

Annotators

intelligence.refinery

URL

sebastianraschka.com/Articles/2014_about_feature_scaling.html
link.springer.com link.springer.com

A semiparametric method for clustering mixed data

1
1. intelligence.refinery 11 Jun 2019
  
  in Public
  
  Threshold values of 0.8-0.9 are recommended for well separated clusters; to allow for overlapping clusters, we chose a threshold of 0.6.
  
  Mixed data clustering clustMixType
Visit annotations in context

Tags

clustMixType

Mixed data clustering

Annotators

intelligence.refinery

URL

link.springer.com/article/10.1007/s10994-016-5575-7
Jan 2019
stackoverflow.com stackoverflow.com

Changing chunk background color in RMarkdown

1
1. intelligence.refinery 15 Jan 2019
  
  in Public
  
  Changing chunk background color in RMarkdown
  
  Change the background colour of code chunks in Rmarkdown using CSS.
  
  Stack Overflow CSS RMarkdown
Visit annotations in context

Tags

CSS

RMarkdown

Stack Overflow

Annotators

intelligence.refinery

URL

stackoverflow.com/questions/41030477/changing-chunk-background-color-in-rmarkdown/41031276

intelligence.refinery

Annotations: 66

Joined: January 15, 2019
