  1. Apr 2017
    1. $\arg\max_{v_w, v_c} \sum_{(w,c) \in D} \log \frac{1}{1 + e^{-v_c \cdot v_w}}$

      Maximise the log probability that the observed pairs came from the data.

    2. $p(D = 1 \mid w, c)$ the probability that $(w, c)$ came from the data, and by $p(D = 0 \mid w, c) = 1 - p(D = 1 \mid w, c)$ the probability that $(w, c)$ did not.

      Probability that a (word, context) pair is present in the text or not.

    3. Loosely speaking, we seek parameter values (that is, vector representations for both words and contexts) such that the dot product $v_w \cdot v_c$ associated with “good” word-context pairs is maximized.
    4. In the skip-gram model, each word $w \in W$ is associated with a vector $v_w \in \mathbb{R}^d$ and similarly each context $c \in C$ is represented as a vector $v_c \in \mathbb{R}^d$, where $W$ is the words vocabulary, $C$ is the contexts vocabulary, and $d$ is the embedding dimensionality.

      Factors involved in the skip-gram model.
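
The quantities quoted in these notes can be tried out on a toy example. What follows is a minimal sketch in plain Python, not the paper's implementation: the function names (`p_from_data`, `log_likelihood`), the example words, and the two-dimensional vectors are all hypothetical, chosen only to show how the dot product $v_c \cdot v_w$ turns into $p(D = 1 \mid w, c)$ and how the objective sums the log probabilities over the pairs in $D$.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def p_from_data(v_w, v_c):
    """p(D = 1 | w, c) = 1 / (1 + exp(-v_c . v_w))."""
    return 1.0 / (1.0 + math.exp(-dot(v_c, v_w)))


def log_likelihood(pairs, word_vecs, ctx_vecs):
    """Sum of log p(D = 1 | w, c) over the observed pairs (w, c) in D."""
    return sum(
        math.log(p_from_data(word_vecs[w], ctx_vecs[c])) for w, c in pairs
    )


# Hypothetical toy embeddings with d = 2 (assumed values, not learned).
word_vecs = {"cat": [1.0, 0.5], "dog": [0.9, 0.4]}
ctx_vecs = {"purrs": [1.2, 0.3], "barks": [-0.5, 1.0]}

# Hypothetical set D of observed (word, context) pairs.
D = [("cat", "purrs"), ("dog", "barks")]

p_good = p_from_data(word_vecs["cat"], ctx_vecs["purrs"])  # p(D = 1 | w, c)
p_not = 1.0 - p_good                                       # p(D = 0 | w, c)
objective = log_likelihood(D, word_vecs, ctx_vecs)

print(p_good, p_not, objective)
```

Training would adjust the vectors to push `objective` upward; since each term is the log of a probability, the value is always negative and approaches zero as the dot products of the observed pairs grow.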