$\mathcal{H} = \{\mathbf{x} \mid \mathbf{w}^\top\mathbf{x} + b = 0\}$
What does this mean?
$\mathcal{H}$ = {all points $\mathbf{x}$ such that $\mathbf{w}^\top\mathbf{x} + b = 0$}; that is the equation of the hyperplane.
SVM
An amazing and simple video on SVM: https://www.youtube.com/watch?v=1NxnPkZM9bc
The only difference is that we have the hinge-loss instead of the logistic loss.
What are hinge loss and logistic loss?
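For reference, the standard definitions (with labels $y_i \in \{-1,+1\}$; these formulas are the usual forms, not quoted from the notes):

$\ell_{\text{hinge}}(\mathbf{x}_i, y_i) = \max\big(0,\; 1 - y_i(\mathbf{w}^\top\mathbf{x}_i + b)\big)$
$\ell_{\text{logistic}}(\mathbf{x}_i, y_i) = \log\big(1 + e^{-y_i(\mathbf{w}^\top\mathbf{x}_i + b)}\big)$

The hinge loss is exactly zero once a point is on the correct side with margin at least 1, while the logistic loss is always positive but decays smoothly.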
If the data is low dimensional it is often the case that there is no separating hyperplane between the two classes.
Why ??
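A small sketch of one standard counterexample (my own illustration, not from the notes): the XOR pattern in 2D has no separating line, but one extra feature makes it separable.

```python
# Minimal sketch: XOR is not linearly separable in 2D, but adding the
# product feature x1*x2 makes it separable by the hyperplane x3 = 0.
import numpy as np

X = np.array([[+1, +1], [-1, -1], [+1, -1], [-1, +1]], dtype=float)
y = np.array([+1, +1, -1, -1])   # same-sign corners vs. mixed-sign corners

X3 = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])  # append x1*x2 as a 3rd feature
w, b = np.array([0.0, 0.0, 1.0]), 0.0                    # hyperplane x3 = 0
print(np.sign(X3 @ w + b) == y)  # all True: perfectly separated in 3D
```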
The slack variable $\xi_i$ allows the input $\mathbf{x}_i$ to be closer to the hyperplane
How ?
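One way to see it (standard soft-margin formulation, not quoted from the notes): the constraint with slack is

$y_i(\mathbf{w}^\top\mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0.$

With $\xi_i = 0$ the point must lie on or outside the margin; with $0 < \xi_i \le 1$ it may sit inside the margin; with $\xi_i > 1$ it may even be misclassified. The objective pays a penalty proportional to $\xi_i$, so large slacks are discouraged.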
$M\gamma \le \vec{w}\cdot\vec{w}^*$
Let $(\vec{w}\cdot\vec{w}^*)_{\text{new}}$ be the value after the $M$ updates and $(\vec{w}\cdot\vec{w}^*)_{\text{old}}$ the value before them. Each update adds at least $\gamma$, so $(\vec{w}\cdot\vec{w}^*)_{\text{new}} \ge (\vec{w}\cdot\vec{w}^*)_{\text{old}} + M\gamma$; starting from $\vec{w}=\vec{0}$, this gives $M\gamma \le (\vec{w}\cdot\vec{w}^*)_{\text{new}}$.
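The single-update step behind this (standard convergence argument, assuming margin $\gamma$, i.e. $y(\vec{x}\cdot\vec{w}^*) \ge \gamma$ for every training point):

$\vec{w}_{\text{new}}\cdot\vec{w}^* = (\vec{w} + y\vec{x})\cdot\vec{w}^* = \vec{w}\cdot\vec{w}^* + y(\vec{x}\cdot\vec{w}^*) \ge \vec{w}\cdot\vec{w}^* + \gamma.$

Each update therefore adds at least $\gamma$ to $\vec{w}\cdot\vec{w}^*$, so after $M$ updates (starting from $\vec{w}=\vec{0}$) we get $\vec{w}\cdot\vec{w}^* \ge M\gamma$.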
$y^2 = 1$
as $y \in \{-1, +1\}$
$\vec{w}^*$ lies on the unit sphere
What does this mean? (That $\|\vec{w}^*\| = 1$, i.e., $\vec{w}^*$ is a unit vector.)
$\vec{w}\cdot\vec{x}_i$
If one were to take the dot product of a unit vector A and a second vector B of any non-zero length, the result is the length of vector B projected in the direction of vector A
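In symbols (standard identity): $\vec{a}\cdot\vec{b} = \|\vec{a}\|\,\|\vec{b}\|\cos\theta$, so if $\|\vec{a}\| = 1$ then $\vec{a}\cdot\vec{b} = \|\vec{b}\|\cos\theta$, the length of $\vec{b}$'s projection onto the direction of $\vec{a}$.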
Quiz#1: Can you draw a visualization of a Perceptron update? Quiz#2: How often can a Perceptron misclassify a point $\vec{x}$ repeatedly?
Doubts: 1) http://www.nbertagnolli.com/jekyll/update/2015/08/27/Perceptron_Vis.html
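A minimal perceptron sketch for the update rule in Quiz#1 (my own illustration, assuming labels in $\{-1,+1\}$ and a bias absorbed into $\mathbf{w}$ via a constant feature):

```python
import numpy as np

def perceptron(X, y, max_passes=100):
    """Perceptron training: on each mistake, update w <- w + y * x."""
    w = np.zeros(X.shape[1])
    for _ in range(max_passes):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:   # misclassified (or exactly on the hyperplane)
                w += y_i * x_i         # the perceptron update
                mistakes += 1
        if mistakes == 0:              # a full pass with no mistakes: converged
            return w
    return w
```

Note that the same point can trigger updates on several passes; the convergence bound only limits the total number of updates.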
$D$ (sequence of heads and tails)
$D$ is the sequence, i.e., the observed outcomes $y$; $\theta$ is $P(H)$.
E
$\theta$ as a random variable
Let $P(H)$ be a variable; $P(D)$ is a constant, as the data has already occurred.
derivative and equating it to zero
At maxima and minima, the derivatives are always zero
We can now solve for $\theta$ by taking the derivative and equating it to zero.
why ?
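Worked out for the coin example (standard MLE steps, with $n_H$ heads and $n_T$ tails; up to an additive constant that does not depend on $\theta$):

$\log P(D\mid\theta) = n_H\log\theta + n_T\log(1-\theta)$
$\frac{\partial}{\partial\theta}\log P(D\mid\theta) = \frac{n_H}{\theta} - \frac{n_T}{1-\theta} = 0 \;\Rightarrow\; \hat{\theta}_{MLE} = \frac{n_H}{n_H + n_T}.$

Setting the derivative to zero finds the stationary point, which here is the maximum of the log-likelihood.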
Posterior Predictive Distribution
Doubtful about this. Refer video: https://www.youtube.com/watch?v=R9NQY2Hyl14
Now, we can use the Beta distribution to model $P(\theta)$: $P(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}$
Important shit! https://www.youtube.com/watch?v=v1uUgTcInQk
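What the Beta prior buys us (standard conjugacy facts, not quoted from the notes): with $n_H$ heads and $n_T$ tails,

$P(\theta\mid D) \propto P(D\mid\theta)\,P(\theta) \propto \theta^{n_H+\alpha-1}(1-\theta)^{n_T+\beta-1},$

i.e. the posterior is again a Beta distribution, $\text{Beta}(n_H+\alpha,\, n_T+\beta)$. The posterior predictive probability of heads is its mean:

$P(\text{heads}\mid D) = \int_0^1 \theta\,P(\theta\mid D)\,d\theta = \frac{n_H+\alpha}{n_H+n_T+\alpha+\beta}.$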
$\mathcal{H}$
$\mathcal{H}$ is the hypothesis class (i.e., the set of all possible classifiers $h(\cdot)$).
MLE Principle:
This is very important
X
What does P(X,Y) mean ?
$= \operatorname{argmin}_{\mathbf{w}} \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i^\top\mathbf{w} - y_i)^2 + \lambda\|\mathbf{w}\|_2^2, \qquad \lambda = \frac{\sigma^2}{n\tau^2}$
This means we minimize the loss and the magnitude of w, so the weights for the noisy (high-variance) features in x are shrunk toward zero. https://towardsdatascience.com/regularization-in-machine-learning-76441ddcf99a
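A minimal closed-form sketch of the minimizer above (my own code, not from the notes; the $\frac{1}{n}$ convention in the formula puts an extra factor of $n$ next to $\lambda$):

```python
import numpy as np

def ridge(X, y, lam):
    """Minimize (1/n) * sum_i (x_i^T w - y_i)^2 + lam * ||w||_2^2 in closed form."""
    n, d = X.shape
    # Setting the gradient to zero gives (X^T X + n * lam * I) w = X^T y.
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)
```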
P(w)
P(w): w is treated as a random variable with a Gaussian (prior) distribution.
$\frac{1}{\sqrt{2\pi\sigma^2}}$
This is the normalization constant of the Gaussian distribution.
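For completeness, the full univariate Gaussian density this constant belongs to:

$\mathcal{N}(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$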
⊤
w = vector of weights [w1, w2, w3, w4, w5]; w^T = transpose of w; w^T * x is a scalar (a dot/inner product, not a cross product).
$\operatorname{argmin}_{\mathbf{w}} \frac{1}{n}$
Where did the $n$ in the denominator come from? (It averages the squared error over the $n$ training points; dividing by $n$ does not change the minimizer.)
Linear Regression
Need to revise this again. A lot of doubts.
This gives you a good estimate of the validation error (even with standard deviation)
why ??
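A minimal sketch, assuming this line refers to k-fold cross-validation (my own code; train_and_eval is a placeholder callback that trains on the training split and returns a validation error):

```python
import numpy as np

def kfold_errors(X, y, train_and_eval, k=5, seed=0):
    """Return mean and standard deviation of the k per-fold validation errors."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        errors.append(train_and_eval(X[train], y[train], X[val], y[val]))
    return np.mean(errors), np.std(errors)
```

Each fold yields one error estimate, which is why both a mean and a standard deviation of the validation error are available.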
Regression Trees
I don't get these.
$O(n \log n)$
How? (Presumably because finding the best split of a feature requires sorting its $n$ values, which costs $O(n \log n)$.)
Decision trees are myopic
Doubtful.
Quiz: Why don't we stop if no split can improve impurity? Example: XOR
I don't get this :(
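A worked version of the XOR example (standard illustration, with four points, two per class): the root has entropy $-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1$ bit. Splitting on $x_1$ alone sends one point of each class to each child, so both children are still 50/50 and still have entropy 1 bit: the split gains nothing, and the same holds for splitting on $x_2$ alone. Yet splitting on $x_1$ and then on $x_2$ yields four pure leaves. So a rule of "stop when no single split improves impurity" would quit at the root and never find the perfect two-level tree.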
$-\sum_k p_k \log(p_k)$
This is the formula for entropy.
KL-Divergence
What is KL-Divergence?
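The standard definition (for discrete distributions $p$ and $q$ over the same classes):

$KL(p\,\|\,q) = \sum_k p_k \log\frac{p_k}{q_k},$

which is always $\ge 0$ and equals $0$ exactly when $p = q$; it measures how far the distribution $q$ is from $p$.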
Rescue to the curse:
Dimensionality reduction may give better-behaved data.
$\epsilon_{NN}$
Doubtful about this. Shouldn't it be $P(y|x_t)\,(1 - P(y|x_{NN})) + (1 - P(y|x_t))\,P(y|x_{NN})$?
How does $k$ affect the classifier? What happens if $k=n$? What if $k=1$?
As per my project, the accuracy changes with $k$: as $k \to n$, the accuracy drops. (Refer to the project 1 report.)
−1
This means when Y is not 1 or 0. This seems like a typing mistake here.
Generalization: $\epsilon = \mathbb{E}_{(x,y)\sim P}\left[\ell(x,y|h^*(\cdot))\right]$
What is this? Doubt. What is $\mathbb{E}$ here? (The expectation over draws of $(x,y)$ from the data distribution $P$.)
i.i.d.
independent and identically distributed data points
$\mathcal{C}=\mathbb{R}$.
What is $\mathbb{R}$ here? Do the data set and the label set have the same space? ($\mathbb{R}$ is the set of real numbers, i.e., the label space for regression.)