9 Matching Annotations
  1. Sep 2023
    1. BERT only swaps 10% of the 15% tokens selected for masking (in total 1.5% of all tokens) and leaves 10% of the tokens intact

      Noise is deliberately introduced by swapping some of the selected words for random ones instead of masking them; this prevents the model from overfitting on the [MASK] token.
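
The 80/10/10 split described above can be sketched as follows. This is an illustrative toy, not code from any BERT implementation; the function name and vocabulary are made up.

```python
import random

def mask_tokens(tokens, vocab, select_prob=0.15, seed=0):
    """Sketch of BERT-style masking: select ~15% of tokens; of those,
    replace 80% with [MASK], swap 10% for a random token, keep 10%."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if rng.random() < select_prob:
            r = rng.random()
            if r < 0.8:
                out.append("[MASK]")           # 80% of selected: masked
            elif r < 0.9:
                out.append(rng.choice(vocab))  # 10% of selected: random swap
            else:
                out.append(tok)                # 10% of selected: left intact
        else:
            out.append(tok)
    return out
```

Because only 10% of the selected 15% are swapped, random-swap noise touches about 1.5% of all tokens, matching the figure in the highlight.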

    2. given a context, a language model predicts the probability of a word occurring in that context

      useful definition of "language model"
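
That definition can be made concrete with a toy bigram model (my own minimal example, unrelated to BERT): estimate the probability of a word given the previous word from counts in a tiny corpus.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts

def prob(counts, prev, word):
    """P(word | prev) as a relative frequency."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

counts = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
# "the" is followed by "cat" twice and "dog" once, so P(cat | the) = 2/3
```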

    3. Instead of just training a model to map a single vector for each word, these methods train a complex, deep neural network to map a vector to each word based on the entire sentence/surrounding context.

      the difference between simple, context-free word embeddings and contextual representations from models such as BERT
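
The contrast can be sketched in a few lines. The vectors and the "encoder" below are stand-ins I invented for illustration, not output of a real model: a static embedding table returns the same vector for a word everywhere, while a contextual encoder returns a different vector depending on the sentence.

```python
# Static embedding: one fixed vector per word type, regardless of context.
static = {"bank": [0.2, 0.7]}

def contextual(word, sentence):
    # Stand-in for a deep encoder like BERT: here we merely shift the
    # static vector by a crude context signal, purely for illustration.
    signal = 1.0 if "river" in sentence else -1.0
    return [v + 0.1 * signal for v in static[word]]

v1 = contextual("bank", ["river", "bank"])  # "bank" as in riverbank
v2 = contextual("bank", ["bank", "loan"])   # "bank" as in financial bank
# v1 != v2: the representation depends on the surrounding sentence
```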

  2. Aug 2023
  3. citeseerx.ist.psu.edu
    1. Compositionality is defined here as the property whereby “the meaning of an expression is a monotonic function of the meaning of its parts and the way they are put together.” (Cann 1993:4) Recursion is “the phenomenon by which a constituent of a sentence dominates another instance of the same syntactic category . . . recursion is the principal reason that the number of sentences in a natural language is normally taken to be infinite” (Trask 1993:229-230)

      Kirby defines two fundamental properties of language. For one, language is compositional. The meaning of each part of a sentence, as well as the order in which these individual pieces appear, defines the overall meaning of that sentence. Language is also recursive (and therefore infinite). Each constituent in a phrase may contain one or more child constituents of the same syntactic category.
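
The recursion property can be illustrated with a toy grammar rule (my own example, not Kirby's): a noun phrase may contain another noun phrase, so phrases of unbounded depth exist, which is why the set of sentences is taken to be infinite.

```python
def noun_phrase(depth):
    """Expand NP -> "the" N (near NP)? recursively, to the given depth."""
    nouns = ["cat", "dog", "house"]
    np = "the " + nouns[depth % len(nouns)]
    if depth == 0:
        return np
    # An NP dominating another instance of the same category (NP):
    return np + " near " + noun_phrase(depth - 1)

print(noun_phrase(2))  # the house near the dog near the cat
```

Each extra level of `depth` yields a new, longer noun phrase, so no finite list can contain them all.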

  4. Aug 2022
    1. authoring organizations realized the old copies were sticking around but they did not want that history available (presumably to hide the fact they were changing things without notice or public disclosure). So they started forcing the removal of PDF and similar documents off the Archive

      A commenter noted that the links in their own field's reference list gradually had to be replaced with Internet Archive snapshots, until the authoring institutions forced the removal of even those snapshots.

  5. Jul 2022
    1. unfiltered

      filtered

    2. image regurgitation

      If clusters of images in the training dataset are too similar to one another, the model ends up reproducing a blend of those images instead of generalizing from the rest of its inputs.

    3. various guardrails