- Jul 2021
-
www.baeldung.com
-
Vectors with a small Euclidean distance from one another are located in the same region of a vector space. Vectors with a high cosine similarity are located in the same general direction from the origin.
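A minimal numpy sketch (with toy 2-d vectors, not taken from the article) contrasting the two notions: a small Euclidean distance means the points are near each other, while a high cosine similarity means the vectors point in the same direction regardless of their length.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([1.1, 2.1])    # close to a: small Euclidean distance
c = np.array([10.0, 20.0])  # far from a, but pointing the same way

def euclidean(u, v):
    return np.linalg.norm(u - v)

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(euclidean(a, b), euclidean(a, c))                  # ~0.14 vs ~20.1
print(cosine_similarity(a, b), cosine_similarity(a, c))  # ~0.9998 vs 1.0
```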
-
-
aylien.com
-
Recommendations:
- DON'T use shifted PPMI with SVD.
- DON'T use SVD "correctly", i.e. without eigenvector weighting (performance drops 15 points compared to with eigenvalue weighting with p = 0.5).
- DO use PPMI and SVD with short contexts (window size of 2).
- DO use many negative samples with SGNS.
- DO always use context distribution smoothing (raise the unigram distribution to the power of α = 0.75) for all methods.
- DO use SGNS as a baseline (robust, fast and cheap to train).
- DO try adding context vectors in SGNS and GloVe.
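A rough numpy sketch of the PPMI + SVD recipe in these recommendations, with context distribution smoothing (α = 0.75) and eigenvalue weighting with p = 0.5; the co-occurrence counts are made up for illustration.

```python
import numpy as np

# Toy word-context co-occurrence counts (rows: words, cols: contexts).
counts = np.array([
    [10., 2., 0., 1.],
    [ 3., 8., 1., 0.],
    [ 0., 1., 6., 4.],
])

total = counts.sum()
p_wc = counts / total
p_w = counts.sum(axis=1, keepdims=True) / total

# Context distribution smoothing: raise context counts to alpha = 0.75.
alpha = 0.75
c_smooth = counts.sum(axis=0) ** alpha
p_c = (c_smooth / c_smooth.sum())[np.newaxis, :]

# PPMI: positive pointwise mutual information.
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0.0)

# SVD with eigenvalue weighting p = 0.5: scale U by the square root
# of the singular values instead of the singular values themselves.
U, S, Vt = np.linalg.svd(ppmi, full_matrices=False)
d = 2  # embedding dimensionality
word_vectors = U[:, :d] * (S[:d] ** 0.5)
print(word_vectors)
```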
-
- Jun 2020
-
link.aps.org
-
Liu, Andrew, and Mason A. Porter. ‘Spatial Strength Centrality and the Effect of Spatial Embeddings on Network Architecture’. Physical Review E 101, no. 6 (9 June 2020): 062305. https://doi.org/10.1103/PhysRevE.101.062305.
-
- Dec 2019
-
nlpoverview.com
-
The quality of word representations is generally gauged by their ability to encode syntactical information and handle polysemic behavior (or word senses). These properties result in improved semantic word representations. Recent approaches in this area encode such information into the embeddings by leveraging the context. These methods use deeper networks that calculate word representations as a function of their context.
- Syntactical information
- Polysemic behavior (word senses)
- Semantic word representations
I understand that handling word senses means that the word representations yield similar measures for similar words.
What would syntactic information be? And what is its relation to semantic word representations?
-
Traditional word embedding algorithms assign a distinct vector to each word. This makes them unable to account for polysemy. In a recent work, Upadhyay et al. (2017) provided an innovative way to address this deficit. The authors leveraged multilingual parallel data to learn multi-sense word embeddings.
- multilingual parallel data
- multi-sense word embeddings
-
This is very important, as training embeddings from scratch requires a large amount of time and resources. Mikolov et al. (2013) tried to address this issue by proposing negative sampling, which is nothing but frequency-based sampling of negative terms while training the word2vec model.
Negative sampling... negative terms?
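A hedged sketch of what the "negative terms" are (toy vocabulary and made-up counts): for each observed (word, context) pair, a few context words are drawn by smoothed frequency and treated as negative examples, i.e. pairs the model should score low.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with made-up corpus frequencies.
vocab = np.array(["the", "cat", "sat", "on", "mat"])
counts = np.array([1000.0, 50.0, 40.0, 60.0, 30.0])

# word2vec draws negatives from the unigram distribution raised to 3/4.
probs = counts ** 0.75
probs /= probs.sum()

def sample_negatives(true_context, k=5):
    """Sample k 'negative' context words for one observed (word, context)
    pair: words chosen by smoothed frequency that did NOT occur in this
    context window."""
    negatives = []
    while len(negatives) < k:
        w = rng.choice(vocab, p=probs)
        if w != true_context:
            negatives.append(str(w))
    return negatives

# For the observed pair ("cat", "sat"), the model is trained to score
# ("cat", "sat") high and ("cat", negative) low for each sampled negative.
print(sample_negatives("sat"))
```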
-
A general caveat for word embeddings is that they are highly dependent on the applications in which they are used. Labutov and Lipson (2013) proposed task-specific embeddings which retrain the word embeddings to align them with the current task space.
I believe "application" here relates to context, so word embeddings are context-dependent. At first glance that seems obvious. Is that what the author meant?
Retrain the embeddings to align them with the current task. Would aligning simply mean adapting the previous embeddings to the new context?
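A minimal PyTorch-style sketch of one common reading of this: initialize from pretrained vectors and let the task gradients keep updating them. This is only an illustration, not Labutov and Lipson's exact formulation, and all sizes and names are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical pretrained embedding matrix (vocab_size x dim).
pretrained = torch.randn(10_000, 300)

# freeze=False lets task gradients update ("retrain") the embeddings,
# pulling them toward the current task space.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
classifier = nn.Linear(300, 2)

token_ids = torch.tensor([[12, 7, 430]])      # a toy input sentence
features = embedding(token_ids).mean(dim=1)   # average the word vectors
logits = classifier(features)
# A task loss computed on logits would backpropagate into embedding.weight.
```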
-
One solution to this problem, as explored by Mikolov et al. (2013), is to identify such phrases based on word co-occurrence and train embeddings for them separately. More recent methods have explored directly learning n-gram embeddings from unlabeled data (Johnson and Zhang, 2015).
I can understand word co-occurrence, but not training the embeddings separately. Would that mean treating the co-occurring words as a single unit in the embedding, instead of just the individual word?
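A small sketch of the phrase-detection idea, using the co-occurrence score from Mikolov et al. (2013); the counts and the discount δ are made up. Pairs scoring above a chosen threshold are merged into a single token (e.g. "new_york"), and that token, rather than its individual words, gets its own embedding during training.

```python
from collections import Counter

# Hypothetical unigram and bigram counts from a corpus.
unigram = Counter({"new": 500, "york": 300, "the": 10_000, "city": 400})
bigram = Counter({("new", "york"): 250, ("the", "city"): 120})

delta = 5  # discount that penalizes very rare pairs

def phrase_score(w1, w2):
    """Mikolov et al. (2013) phrase score: high when w1 and w2 co-occur
    far more often than their individual frequencies would predict."""
    return (bigram[(w1, w2)] - delta) / (unigram[w1] * unigram[w2])

for pair in bigram:
    print(pair, phrase_score(*pair))
```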
-
-
parerga.hypotheses.org
-
The word vector is the arrow from the point where all three axes intersect to the end point defined by the coordinates.
Each of the three axes corresponds to one context.
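A tiny illustration of that picture, with three made-up context axes: the word's coordinates say how strongly it relates to each context, and the vector is the arrow from the origin to that point.

```python
import numpy as np

# Hypothetical context axes and coordinates for the word "coffee".
axes = ["drink", "morning", "sport"]
coffee = np.array([0.9, 0.7, 0.05])   # end point of the arrow

length = np.linalg.norm(coffee)       # length of the arrow from the origin
direction = coffee / length           # unit vector: the arrow's direction
print(dict(zip(axes, coffee)), length, direction)
```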
-
- Jun 2017
-
w4nderlu.st w4nderlu.st
- Apr 2017
-
levyomer.files.wordpress.com
-
$\arg\max_{v_w, v_c} \sum_{(w,c) \in D} \log \frac{1}{1 + e^{-v_c \cdot v_w}}$
maximise the log probability.
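A small numeric sketch of one term of that objective: for a single (w, c) pair with made-up vectors, the quantity being maximized is log σ(v_c · v_w) = log 1/(1 + e^{−v_c·v_w}).

```python
import numpy as np

def log_sigmoid(x):
    # log(1 / (1 + e^{-x})) rewritten as -log(1 + e^{-x})
    return -np.log1p(np.exp(-x))

# Hypothetical 3-dimensional vectors for one observed (word, context) pair.
v_w = np.array([0.2, -0.1, 0.4])
v_c = np.array([0.3, 0.0, 0.5])

# Training adjusts v_w and v_c so this term is as large as possible for
# every pair (w, c) actually observed in the data D.
print(log_sigmoid(np.dot(v_c, v_w)))
```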
-
p(D = 1 | w, c), the probability that (w, c) came from the data, and by p(D = 0 | w, c) = 1 − p(D = 1 | w, c) the probability that (w, c) did not.
The probability that a (word, context) pair is present in the text or not.
-
Loosely speaking, we seek parameter values (that is, vector representations for both words and contexts) such that the dot product v_w · v_c associated with "good" word-context pairs is maximized.
-
In the skip-gram model, each word w ∈ W is associated with a vector v_w ∈ R^d and similarly each context c ∈ C is represented as a vector v_c ∈ R^d, where W is the words vocabulary, C is the contexts vocabulary, and d is the embedding dimensionality.
Factors involved in the Skip gram model
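A sketch of the pieces named in the quote, with hypothetical sizes: one d-dimensional vector per word in W and one per context in C, stored as two separate matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 5_000      # |W|, the words vocabulary (hypothetical size)
context_size = 5_000    # |C|, the contexts vocabulary
d = 100                 # embedding dimensionality

# One d-dimensional vector per word and per context.
word_vectors = rng.normal(scale=0.1, size=(vocab_size, d))       # v_w ∈ R^d
context_vectors = rng.normal(scale=0.1, size=(context_size, d))  # v_c ∈ R^d

w_id, c_id = 42, 1337   # hypothetical indices for a (word, context) pair
v_w, v_c = word_vectors[w_id], context_vectors[c_id]
score = v_w @ v_c       # the dot product the training objective maximizes
```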
-
- Jun 2016
-
aclweb.org
-
Neural Word Embedding Methods
-
dimension of embedding vectors strongly depends on applications and uses, and is basically determined based on the performance and memory space (or calculation speed) trade-off
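Back-of-the-envelope arithmetic for the memory side of that trade-off, assuming a float32 embedding table and a hypothetical one-million-word vocabulary.

```python
# Memory footprint of a vocab_size x d embedding table in float32.
vocab_size = 1_000_000
for d in (50, 100, 300, 1000):
    gib = vocab_size * d * 4 / 2**30   # 4 bytes per float32
    print(f"d = {d:>4}: {gib:.2f} GiB")
```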
-