57 Matching Annotations
  1. Aug 2016
  2. Jul 2016
  3. web.stanford.edu
    1. relational meanings

      "to capture linguistic regularities as relations between vectors", IMHO

    2. meanings

      add full stop

    3. different

      typo - difference

  4. Jun 2016
    1. Neural Word Embedding Methods
    2. Thus, we basically need to re-train

      ... in order to achieve what? The statement doesn't seem complete.

      Perhaps "when we need lower-dimensional embeddings with d = D', we can't obtain them from higher-dimensional embeddings with d = D"?

      However, it is possible, to a certain extent, to obtain lower-dimensional embeddings from higher-dimensional ones, e.g. via PCA; see the sketch below.
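
      A rough sketch of that PCA route (toy numbers: a random 10000 x 300 matrix stands in for real pre-trained embeddings, and 50 is an arbitrary target dimension):

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        embeddings = rng.normal(size=(10000, 300))   # placeholder for pre-trained word vectors

        pca = PCA(n_components=50)                   # keep the 50 directions of largest variance
        low_dim = pca.fit_transform(embeddings)      # shape (10000, 50)
        print(low_dim.shape, pca.explained_variance_ratio_.sum())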

    3. dimension of embedding vectors strongly depends on applications and uses, and is basically determined based on the performance and memory space (or calculation speed) trade-off
    1. remove second-order dependencies

      What is meant by this?

      Related question on stats

    2. it reveals simple underlying structures in complex data sets using analytical solutions from linear algebra
    3. the third definition

      The definitions are not numbered. It would be nice to have them numbered.

    4. û_i · û_j

      These û vectors are orthonormal, so û_i · û_j is 1 when i = j and 0 otherwise; see the check below.
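
      A minimal numpy check of this, on a toy random matrix (the û here are the left singular vectors from the SVD):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(6, 100))                     # toy data matrix
        U, s, Vt = np.linalg.svd(X, full_matrices=False)  # columns of U are the û vectors
        print(np.allclose(U.T @ U, np.eye(U.shape[1])))   # True: û_i · û_j = 1 if i = j, else 0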

    5. (XᵀX) v̂_i = λ_i v̂_i

      Which means that transforming the vector v̂_i with that matrix gives us a vector with the same direction; the direction does not change under the transformation. That is the definition of an eigenvector; see the check below.
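
      A minimal numpy check of this, on a toy matrix:

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(20, 4))
        A = X.T @ X                                   # symmetric matrix, as in (X^T X) v_i = lambda_i v_i
        eigvals, eigvecs = np.linalg.eigh(A)          # eigh: eigendecomposition for symmetric matrices
        v, lam = eigvecs[:, -1], eigvals[-1]          # one eigenpair
        print(np.allclose(A @ v, lam * v))            # True: A v has the same direction as v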

    6. PCA and in the process, find that PCA is closely related to
    7. subtracting off the mean

      Actually, this is a requirement for computing the covariance matrix C_X; see the sketch below.

      Estimation of covariance matrices
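
      A small numpy sketch of that requirement, on toy data (np.cov centres internally; the manual formula has to subtract the mean first):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(loc=5.0, size=(6, 1000))       # 6 measurement types, 1000 samples, non-zero mean
        Xc = X - X.mean(axis=1, keepdims=True)        # subtract off the mean of each row
        C_manual = Xc @ Xc.T / (X.shape[1] - 1)       # covariance from the centred data
        print(np.allclose(C_manual, np.cov(X)))       # True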

    8. entails

      "entails" is an unfortunate choice of words. "implies" / "includes" / "requires" perhaps?

    9. It is evident that the choice of P diagonalizes C_Y

      That is, we have found that, by selecting P = E (the set of eigenvectors of C_X), we get what we wanted: C_Y becomes a diagonal matrix.

    10. C_Y

      We want this to be a diagonal matrix, which would mean that the matrix Y is decorrelated; see the sketch below.
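
      A small numpy sketch of this on toy correlated data: taking the rows of P to be the eigenvectors of C_X makes the covariance of Y = PX diagonal.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(3, 500))
        X[1] += 0.8 * X[0]                            # make the measurement types correlated
        X -= X.mean(axis=1, keepdims=True)

        C_X = np.cov(X)
        _, E = np.linalg.eigh(C_X)                    # columns of E: eigenvectors of C_X
        Y = E.T @ X                                   # P = E^T, i.e. rows of P are the eigenvectors
        C_Y = np.cov(Y)
        print(np.allclose(C_Y, np.diag(np.diag(C_Y))))   # True: the off-diagonal terms vanish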

    11. orthonormal matrix
    12. the number of measurement types

      That is, the number of features.

    13. judicious

      prudent, sensible.

    14. bely

      What does "bely" mean?

    15. by a simple algorithm
    16. normalized direction

      A vector (direction vector) with norm = 1.

    17. Y is decorrelated

      The features of the output matrix Y are not correlated. Building a covariance matrix for it would yield a diagonal matrix.

    18. variance

      Elements on the diagonal of the matrix.

    19. covariance

      The off-diagonal elements of the matrix.
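
      In numpy terms, for a toy 2-feature data set, the diagonal holds the variances and the off-diagonal entries hold the (symmetric) covariances:

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(2, 1000))                    # 2 measurement types
        C = np.cov(X)                                     # 2 x 2 covariance matrix
        print(np.allclose(C[0, 0], np.var(X[0], ddof=1))) # diagonal entry = variance of feature 0
        print(np.allclose(C[0, 1], C[1, 0]))              # off-diagonal entries = covariance, symmetric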

    20. large values correspond to interesting structure

      Features with high variance. Directions of major spread.

    21. arises from estimation theory
    22. measurement types

      aka features

    23. The covariance measures the degree of the linear relationship between two variables
    24. Because one can calculate r1 from r2

      Because there is a simple (almost linear in our case) relationship between the two variables.
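
      A toy illustration of that redundancy: if r2 is (nearly) a linear function of r1, their correlation coefficient is close to 1 and one of the two is redundant.

        import numpy as np

        rng = np.random.default_rng(0)
        r1 = rng.normal(size=1000)
        r2 = 2.0 * r1 + 0.05 * rng.normal(size=1000)  # r2 computable from r1, up to small noise
        print(np.corrcoef(r1, r2)[0, 1])              # ~0.999: a highly redundant pair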

    25. is in meters and x̃_A is in inches.

      Again, this might be so, but it is quite an ambiguous statement, since we see a decreasing function on the plot.

    26. nearby

      "nearby" would make sense if the right-most plot of Fig. 3 shows the first diagonal, which it doesn't.

      Or perhaps "nearby", but one of the cameras is upside down.

      All in all, quite ambiguous statement.

    27. correlated

      There is a pretty image on Wikipedia, in the article about correlation.

    28. Figure 3

      The example for redundancy is not (or at least does not seem to be) in the context of the spring-and-ball example. Since there is no clear separation between the examples, this might be confusing to readers.

    29. multiple sensors record the same dynamic information

      Multiple features refer to the same (or almost the same) thing.

    30. best-fit line

      But not as in linear regression / ordinary least squares.

      Nice animation on stats.
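
      A toy comparison of the two notions of "best fit" (OLS minimises vertical errors, the first principal direction minimises perpendicular errors, so the slopes differ in general):

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.normal(size=500)
        y = 0.5 * x + 0.5 * rng.normal(size=500)
        X = np.vstack([x, y])

        ols_slope = np.polyfit(x, y, 1)[0]            # regression of y on x
        _, vecs = np.linalg.eigh(np.cov(X))
        pc1 = vecs[:, -1]                             # eigenvector with the largest eigenvalue
        pca_slope = pc1[1] / pc1[0]

        print(ols_slope, pca_slope)                   # not the same line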

    31. Maximizing the variance (and by assumption the SNR) corresponds to finding the appropriate rotation of the naive basis

      PCA relates to rotation.

    32. the dynamics of interest exist along directions with largest variance and presumably highest SNR
    33. directions with largest variances in our measurement space contain the dynamics of interest

      We seek new features (new directions) which best contain the information (variance) of interest.

      Amount of variance -> amount of information.
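
      In scikit-learn terms, explained_variance_ratio_ is exactly this "amount of variance -> amount of information" ranking (toy data assumed here: six noisy copies of one underlying signal):

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        signal = rng.normal(size=(1000, 1))               # one true degree of freedom
        X = signal @ np.ones((1, 6)) + 0.1 * rng.normal(size=(1000, 6))

        pca = PCA().fit(X)
        print(pca.explained_variance_ratio_)              # the first direction carries almost all the variance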

    34. rotation and a stretch
    35. how do we get from this data

      How to reduce the 6D data set to a 1D data set? How to discover the regularities in the data set and achieve dimensionality reduction?
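
      A toy version of that reduction, with simulated data in the spirit of the paper's example (six noisy camera coordinates driven by one oscillation, 72000 samples):

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        t = np.linspace(0, 10, 72000)                      # sample index ("time")
        z = np.cos(2 * np.pi * t)                          # the single real degree of freedom
        X = np.outer(z, rng.normal(size=6)) + 0.05 * rng.normal(size=(72000, 6))

        z_hat = PCA(n_components=1).fit_transform(X)       # 6D -> 1D
        print(abs(np.corrcoef(z_hat[:, 0], z)[0, 1]))      # close to 1: the oscillation is recovered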

    36. our measurements might not even be 90°

      The features are not orthogonal; the information brought by distinct measurements overlaps.

    37. non-parametric method

      It does not make any assumptions about the distribution of the data.

      r-tutor, Non-parametric methods

      PSU, Non-parametric methods

    38. ball’s position in a three-dimensional space

      ball's position = a data sample

      three-dimensional space = the real space the ball moves in; the measurement (feature) space, however, has 3 x 2 = 6 features, because each camera records in 2D. The time dimension is not recorded as a feature, since it is actually the index of a data sample.

      Some of these features (dimensions) are not necessary (they are redundant).

    39. does not lie along the basis of the recording (x_A, y_A) but rather along the best-fit line

      Ambiguous statement. A "direction" cannot lie along a "basis". Perhaps "basis vectors"?

      Also, "best-fit line" usually refers to a line found via least-squares regression, which is not the case here (PCA versus linear regression).

    40. largest direction of variance

      "direction of largest variance" perhaps?

    41. are a set of new basis vectors

      This means that P is an orthogonal matrix.

    42. new representation of that data set

      The original data, expressed in a different basis; see the sketch below.
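
      A small numpy sketch of the change of basis (toy data; an arbitrary orthonormal P from a QR decomposition is assumed here, whereas PCA would pick a particular one):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(3, 100))                  # original data, columns are samples
        P, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # some orthonormal basis as rows of P
        Y = P @ X                                      # the same data, expressed in the new basis
        print(np.allclose(P.T @ Y, X))                 # True: nothing is lost, only the basis changed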

    43. basis

      New basis, right?

    44. Thus our original basis reflects the method we measured our data
    45. some orthonormal basis

      PCA will uncover a smaller, better, orthonormal basis.

    46. the number of measurement types

      That is, the number of features.

    47. 72000 of these vectors

      The data matrix. We apply PCA on this.

    48. structure

      And, hopefully, the structure can be expressed in a lower-dimensional space (1D in our case).

    49. noise

      AFAIK PCA works well when the noise is Gaussian.

    50. variable x

      Unfortunate labelling of the variable; x would be time, actually.

      To do: don't name the variable, it's not necessary.