9 Matching Annotations
  1. Dec 2016
    1. generative story

      I've seen this term around, and I'm sure one paper a few weeks ago had it too, but what exactly does "generative story" mean? Is it a general term for inducing something like a grammar, or is it something more exact?

  2. Nov 2016
    1. there is in our now

      How does it know where a pseudoword appears in a chunk? "is no asbestos" can't have appeared that many times in the training data

    2. The PRLG chunker systematically gets DT JJ NN trigrams as chunks.

      So the adjective is never assigned the "B" tag - it's like saying that adjectives can't be at the end of a constituent. And the transition probability from a previous determiner word would make sure this is the case - did I understand that right?

    3. That is, the models will learn to associate terms like the and a, which often occur at the beginnings of sentences and rarely at the end, with the tag B, which cannot occur at the end of a sentence. Likewise common nouns like company or asset, which frequently occur at the ends of sentences, but rarely at the beginning, will come to be associated with the I tag, which cannot occur at the beginning

      OK! Does this mean that each word in the sequence will be given one of these four tags? And each constituent will have a B and I, kind of like in the previous paper, where they predicted how likely it would be to be at the beginning/end of a constituent?

      I  eat  the  cake
      B  O    B    I     STOP

      something like that?
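To make the tagging idea above concrete, here is a minimal sketch of reading chunks off a B/I/O/STOP tag sequence. The function name `tags_to_chunks` is hypothetical, and treating a lone B as a one-word chunk is my assumption; the paper may constrain which tags can follow B.

```python
def tags_to_chunks(words, tags):
    """Group words into chunks using B/I/O tags.

    Assumption: B opens a chunk, I continues the open chunk, and O
    closes it. A trailing STOP tag has no word and is ignored because
    zip() stops at the shorter sequence.
    """
    chunks, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":
            if current:
                chunks.append(current)
            current = [word]
        elif tag == "I" and current:
            current.append(word)
        else:
            # O (or an I with no open chunk) ends any chunk in progress.
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


words = ["I", "eat", "the", "cake"]
tags = ["B", "O", "B", "I", "STOP"]
print(tags_to_chunks(words, tags))
```

So each word gets exactly one tag, and the chunks fall out of where the B and I tags sit.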

    4. However, without making this independence assumption, we can model right linear rules directly

      All the emission/transition probabilities for each word x and their hidden state y are independent? I suppose it doesn't matter

    1. Using POS tags and positional preferences

      Is their data tagged? Is it worth automatically tagging, and will this reduce accuracy if the tagging isn't perfect? Also, just how useful are tags when nouns, adjectives, and verbs can all appear at the beginning/end/middle of constituents?

    2. the new separation value sep(i) is then the minimum of the separation values of all pairs of words where the one word is anything from n0 to ni and the other word from ni+1 to nm

      So am I right in thinking that it looks at each word in the whole sentence and compares it with the words at the boundary you're calculating the separation for? And it does it by picking the minimum? I don't see how this helps rare words be a part of a constituent
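A rough sketch of how I read the quoted definition: for a boundary after word i, every word on the left (n0..ni) is paired with every word on the right (ni+1..nm), and the smallest pairwise value wins. The `pair_sep` function and the toy numbers are made up for illustration; the paper's actual pairwise separation measure is not reproduced here.

```python
from itertools import product


def boundary_separation(words, i, pair_sep):
    """Return the minimum pairwise separation across the boundary after words[i].

    pair_sep(a, b) is a hypothetical stand-in for whatever pairwise
    separation score the paper defines.
    """
    if not 0 <= i < len(words) - 1:
        raise ValueError("boundary index must leave words on both sides")
    left = words[: i + 1]    # n0 .. ni
    right = words[i + 1 :]   # ni+1 .. nm
    return min(pair_sep(a, b) for a, b in product(left, right))


# Toy pairwise scores (invented numbers, not from any corpus).
toy_scores = {("the", "old"): 0.2, ("old", "cake"): 0.3, ("the", "cake"): 0.1}


def toy_pair_sep(a, b):
    return toy_scores.get((a, b), 1.0)


words = ["the", "old", "cake"]
print(boundary_separation(words, 0, toy_pair_sep))
print(boundary_separation(words, 1, toy_pair_sep))
```

Under this reading the comparison is between all left/right pairs, not only the two words touching the boundary. One possible consequence is that a low score from a non-adjacent pair can pull the boundary value down even when the adjacent words are rare, but that is my guess rather than something the quote states.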

    1. In addition to monolingual context features, we also explore the use of alignment features for those languages where we have parallel corpora

      I don't really understand how features from parallel corpora help decide syntactic structure?

    2. This property is not strictly true of linguistic data, but is a good approximation: as Lee et al. (2010) note, assigning each word type to its most frequent part of speech yields an upper bound accuracy of 93% or more for most languages

      But if we assign each word type only one tag, it'll never be perfect! 93% is a lot but the word "run" will never be perfectly represented. I suppose it's the drawback of "unsupervised" methods
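A small sketch of how such an upper bound could be computed on a tagged corpus: give every word type its most frequent tag and count how many tokens that gets right. The function name and the five-token toy corpus are invented for illustration; the 93% figure in the quote comes from real corpora, not from this example.

```python
from collections import Counter, defaultdict


def one_tag_per_type_upper_bound(tagged_tokens):
    """Best token accuracy achievable when each word type gets a single tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        raise ValueError("need at least one tagged token")
    # For each word type, only tokens carrying its most frequent tag are correct.
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / total


# Toy corpus: "run" is a verb twice and a noun once.
toy_corpus = [
    ("run", "VB"), ("run", "VB"), ("run", "NN"),
    ("the", "DT"), ("dog", "NN"),
]
print(one_tag_per_type_upper_bound(toy_corpus))
```

Here the noun use of "run" is the one token that can never be tagged correctly, which is exactly the loss the note is worried about.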