23 Matching Annotations
  1. Jan 2022
  2. Dec 2021
    1. What we’re going to suggest is that American intellectuals – we are using the term ‘American’ as it was used at the time, to refer to indigenous inhabitants of the Western Hemisphere; and ‘intellectual’ to refer to anyone in the habit of arguing about abstract ideas – actually played a role in this conceptual revolution.

      I appreciate the way that they're normalizing the idea of "American intellectuals" and what that really means.

  3. Sep 2021
  4. Jul 2021
  5. Jun 2020
  6. Jul 2019
    1. in clustering analyses, standardization may be especially crucial in order to compare similarities between features based on certain distance measures. Another prominent example is the Principal Component Analysis, where we usually prefer standardization over Min-Max scaling, since we are interested in the components that maximize the variance

      Use standardization, not min-max scaling, for clustering and PCA.
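
      A quick way to see why: compare the per-feature variances left behind by each scaler. The snippet below is a minimal sketch with made-up data (the outlier values and sample sizes are illustrative), using scikit-learn's MinMaxScaler and StandardScaler.

      ```python
      import numpy as np
      from sklearn.preprocessing import MinMaxScaler, StandardScaler

      rng = np.random.default_rng(0)
      # Feature 0 is well behaved; feature 1 has the same scale but a few outliers.
      x0 = rng.normal(0.0, 1.0, 1000)
      x1 = rng.normal(0.0, 1.0, 1000)
      x1[:5] = 50.0  # a handful of extreme values
      X = np.column_stack([x0, x1])

      for name, scaler in [("min-max", MinMaxScaler()), ("standardize", StandardScaler())]:
          Xs = scaler.fit_transform(X)
          # Min-max squeezes the outlier feature into a tiny range, so its variance
          # ends up far below the other feature's; after standardization both have
          # unit variance and contribute comparably to Euclidean distances
          # (clustering) and to the directions of maximal variance (PCA).
          print(name, Xs.var(axis=0).round(4))
      ```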

    2. many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
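
      The same point at the estimator level: a hedged sketch (the dataset choice and cross-validation setup are illustrative) comparing an RBF-kernel SVC with and without standardization, wrapping the scaler in a pipeline so it is fit only on the training folds.

      ```python
      from sklearn.datasets import load_wine
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      X, y = load_wine(return_X_y=True)  # raw features span very different ranges

      # RBF kernel without scaling: the largest-variance features dominate the
      # kernel distances, and the other features are effectively ignored.
      raw = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()

      # Same model with standardization applied inside the pipeline.
      scaled = cross_val_score(
          make_pipeline(StandardScaler(), SVC(kernel="rbf")), X, y, cv=5
      ).mean()

      print(f"without scaling: {raw:.3f}")
      print(f"with scaling:    {scaled:.3f}")
      ```
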
  7. Jun 2019
  8. Mar 2019
    1. One of the challenges of deep learning is that the gradients with respect to the weights in one layer are highly dependent on the outputs of the neurons in the previous layer especially if these outputs change in a highly correlated way. Batch normalization [Ioffe and Szegedy, 2015] was proposed to reduce such undesirable “covariate shift”. The method normalizes the summed inputs to each hidden unit over the training cases. Specifically, for the \(i\)th summed input in the \(l\)th layer, the batch normalization method rescales the summed inputs according to their variances under the distribution of the data

      Batch normalization was introduced to deal with the strong dependence between a neuron's inputs and the values currently being computed. Computing the required expectations would mean gathering all the samples first, which is clearly impractical, so the statistics are taken over the same mini-batch used for training. But that shifts the limitation onto the mini-batch size and makes the method hard to apply to RNNs; hence Layer Normalization is needed.
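
      To make the difference in normalization axes concrete, here is a minimal NumPy sketch (the tensor shape and epsilon are illustrative, and the learnable gain and bias parameters of both methods are omitted).

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.normal(2.0, 3.0, size=(32, 64))  # (batch of cases, hidden units)
      eps = 1e-5

      # Batch normalization: statistics per hidden unit, estimated ACROSS the
      # mini-batch (axis 0); this is where the dependence on batch size comes from.
      bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

      # Layer normalization: statistics per training case, computed ACROSS the
      # hidden units of one layer (axis 1); no dependence on other samples,
      # so it also works with batch size 1 and inside RNN time steps.
      ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

      print(bn.mean(axis=0)[:3].round(6), bn.std(axis=0)[:3].round(3))  # ~0, ~1 per unit
      print(ln.mean(axis=1)[:3].round(6), ln.std(axis=1)[:3].round(3))  # ~0, ~1 per case
      ```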

  9. Feb 2019
  10. Dec 2018
    1. Generalized Batch Normalization: Towards Accelerating Deep Neural Networks

      The core is this one sentence: Generalized Batch Normalization (GBN) is defined to be identical to conventional BN but with

      1. standard deviation replaced by a more general deviation measure D(x)

      2. and the mean replaced by a corresponding statistic S(x).
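
      Read literally, GBN keeps the BN recipe but makes the centering statistic S(x) and the deviation measure D(x) pluggable. The sketch below only illustrates that reading; the median / mean-absolute-deviation pair is an illustrative choice of mine, not necessarily one of the measures proposed in the paper, and the learnable affine parameters are omitted.

      ```python
      import numpy as np

      def generalized_batch_norm(x, stat, dev, eps=1e-5):
          """Normalize each feature column as (x - S(x)) / D(x).

          stat and dev are per-feature statistics computed over the mini-batch
          (axis 0); conventional BN corresponds to stat=mean, dev=std.
          """
          return (x - stat(x)) / (dev(x) + eps)

      mean = lambda x: x.mean(axis=0)
      std = lambda x: x.std(axis=0)
      # Hypothetical alternative pair, for illustration only:
      median = lambda x: np.median(x, axis=0)
      mad = lambda x: np.mean(np.abs(x - np.median(x, axis=0)), axis=0)

      rng = np.random.default_rng(0)
      x = rng.standard_t(df=3, size=(128, 16))  # heavy-tailed activations

      conventional = generalized_batch_norm(x, mean, std)
      robust = generalized_batch_norm(x, median, mad)
      print(conventional.std(axis=0)[:3].round(3))
      print(robust.std(axis=0)[:3].round(3))
      ```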

  11. Sep 2018
    1. Normalization

      Take a sequence of numbers: if you subtract the mean from every number, the resulting sequence has mean 0.

      Take a sequence of numbers: if you divide every number by the standard deviation, the resulting sequence has standard deviation 1.

      How would we construct a sequence with mean 0 and standard deviation 0.1?

      1. \(x_i \leftarrow x_i - \mu\)

      2. \(x_i \leftarrow x_i / \sigma\)

      3. \(x_i \leftarrow x_i * 0.1\)

      After these three normalization steps, the shape of the original distribution is preserved, yet the result is normalized to a distribution with mean 0 and standard deviation 0.1.
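
      The three steps, checked numerically (a minimal sketch; the starting distribution and sample size are arbitrary):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.normal(5.0, 2.0, 10_000)  # arbitrary starting distribution

      x = x - x.mean()  # step 1: mean becomes 0
      x = x / x.std()   # step 2: standard deviation becomes 1
      x = x * 0.1       # step 3: standard deviation becomes 0.1, mean stays 0

      print(round(x.mean(), 6), round(x.std(), 6))  # ~0.0 and ~0.1
      ```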