Hypothesis

3 Matching Annotations

Jun 2022
direct.mit.edu

Human Language Understanding & Reasoning

1
1. mshook 14 Jun 2022
  
  in Public
  
  The dominant idea is one of attention, by which a representation at a position is computed as a weighted combination of representations from other positions. A common self-supervision objective in a transformer model is to mask out occasional words in a text. The model works out what word used to be there. It does this by calculating from each word position (including mask positions) vectors that represent a query, key, and value at that position. The query at a position is compared with the key at every position to calculate how much attention to pay to each position; based on this, a weighted average of the values at all positions is calculated. This operation is repeated many times at each level of the transformer neural net, and the resulting value is further manipulated through a fully connected neural net layer and through use of normalization layers and residual connections to produce a new vector for each word. This whole process is repeated many times, giving extra layers of depth to the transformer neural net. At the end, the representation above a mask position should capture the word that was there in the original text: for instance, committee as illustrated in Figure 1.
  
  transformer explanation attention qkv ml nn nlp language gpt good
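
The attention step described in this annotation can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the annotated article: it follows the standard formulation in which each query is scored against the key at every position, and the division by the square root of the key dimension is an assumption taken from that standard formulation. The function names, the identity projection matrices, and the toy inputs are all hypothetical.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    # Multiply a matrix (stored as a list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(inputs, w_q, w_k, w_v):
    """Single-head self-attention over a list of position vectors."""
    # Each position gets its own query, key, and value vector.
    queries = [matvec(w_q, x) for x in inputs]
    keys = [matvec(w_k, x) for x in inputs]
    values = [matvec(w_v, x) for x in inputs]
    scale = math.sqrt(len(keys[0]))  # assumed scaling, not stated in the annotation
    outputs, all_weights = [], []
    for q in queries:
        # Compare this position's query with the key at every position.
        weights = softmax([dot(q, k) / scale for k in keys])
        # Weighted average of the values at all positions.
        dim = len(values[0])
        out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy data: three positions, 2-dimensional vectors,
# identity matrices standing in for the learned projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(inputs, identity, identity, identity)
```

Each row of `weights` sums to 1 and says how much one position attends to every position. A real transformer would run several such heads in parallel with learned projections, then apply the fully connected layer, normalization, and residual connections mentioned in the annotation.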

Tags

transformer

qkv

nn

explanation

ml

gpt

nlp

attention

good

language

Annotators

mshook

URL

direct.mit.edu/daed/article/151/2/127/110621/Human-Language-Understanding-amp-Reasoning
Nov 2021
e2eml.school

Transformers from Scratch

1
1. mshook 26 Nov 2021
  
  in Public
  
  The selective-second-order-with-skips model is a useful way to think about what transformers do, at least on the decoder side. It captures, to a first approximation, what generative language models like OpenAI's GPT-3 are doing.
  
  transformer attention ml good explanation nn qkv

Tags

transformer

nn

qkv

attention

explanation

ml

good

Annotators

mshook

URL

e2eml.school/transformers.html
towardsdatascience.com

Transformers Explained Visually — Not just how, but Why they work so well

1
1. mshook 20 Nov 2021
  
  in Public
  
  The Query word can be interpreted as the word for which we are calculating Attention. The Key and Value word is the word to which we are paying attention, i.e., how relevant that word is to the Query word.
  
  Finally
  
  transformer query key value qkv attention ml nn good

Tags

transformer

qkv

nn

ml

value

key

attention

query

good

Annotators

mshook

URL

towardsdatascience.com/transformers-explained-visually-not-just-how-but-why-they-work-so-well-d840bd61a9d3