The analysis shows that, although they are superficially similar, NCE is a general parameter estimation technique that is asymptotically unbiased, while negative sampling is best understood as a family of binary classification models that are useful for learning word representations but not as a general-purpose estimator.
I think NCE is slightly different from contrastive estimation (CE). Unfortunately, Chris sort of ignores Noah's work on CE in this explanation. That said, the connection between NCE and negative sampling (NS) is nicely explained.
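
To make the NCE/NS connection concrete, here is a minimal sketch of the two binary-classification posteriors as I understand them from the notes. It assumes an unnormalized model score u = exp(score), a noise distribution q(w), and k noise samples per data point. The function names and the numbers are made up for illustration.

```python
import math

def nce_posterior(score, noise_prob, k):
    """P(D=1 | c, w) under NCE: u / (u + k * q(w)), with u = exp(score)."""
    u = math.exp(score)
    return u / (u + k * noise_prob)

def ns_posterior(score):
    """P(D=1 | c, w) under negative sampling: u / (u + 1), i.e. sigmoid(score)."""
    return 1.0 / (1.0 + math.exp(-score))

# Illustrative values only (assumptions, not taken from the notes).
vocab_size = 10_000
score = 0.7

# With k = |V| and a uniform noise distribution q(w) = 1/|V|,
# k * q(w) = 1, so the NCE posterior reduces to the NS posterior.
p_nce_uniform = nce_posterior(score, noise_prob=1.0 / vocab_size, k=vocab_size)
p_ns = ns_posterior(score)

# With any other k or q(w), the two posteriors differ.
p_nce_other = nce_posterior(score, noise_prob=0.01, k=5)
```

So NS looks like NCE with the k * q(w) term fixed at 1. That only matches NCE in the special case above, which is consistent with the claim that NS is not a general-purpose estimator.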