112 Matching Annotations
  1. Last 7 days
    1. Notice now that our EntityRuler is functioning before the “ner” pipe and is, therefore, prefinding entities and labeling them before the NER gets to them. Because it comes earlier in the pipeline, its metadata holds primacy over the later “ner” pipe.

      The whole point about sequence and precedence is erroneous. The solution the author has in mind (despite the contradictory phrasing and code) seems to be to put the entity_ruler BEFORE ner. Although this works here, it is NOT deterministic and NOT the standard way of solving the problem.

      • If you put the entity_ruler BEFORE ner, you just suggest a label to the NER model. The NER model can potentially override your rule-based matches if it has strong predictions.
      • If you put the entity_ruler AFTER ner, your rules have the final say and override any conflicting NER predictions. Note, however, that this only works if you set overwrite_ents to True in the component's config, e.g. ruler = nlp.add_pipe("entity_ruler", after="ner", config={"overwrite_ents": True})
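      A minimal sketch of the AFTER-ner behaviour, assuming spaCy v3. So that it runs without downloading a trained model, a first entity_ruler (given the hypothetical name stand_in_ner) plays the role of the "ner" component by pre-setting an ORG span. The names my_rules, build_pipeline and entity_labels, and the ORG/FRUIT patterns, are likewise illustrative. The later ruler replaces the overlapping earlier span only when overwrite_ents is True:

```python
import spacy


def build_pipeline(overwrite: bool):
    # Assumption: a blank English pipeline; "stand_in_ner" is a hypothetical
    # stand-in for a trained "ner" component that has already set an entity.
    nlp = spacy.blank("en")
    earlier = nlp.add_pipe("entity_ruler", name="stand_in_ner")
    earlier.add_patterns([{"label": "ORG", "pattern": "Apple"}])

    # Our own rules, placed AFTER the stand-in, with overwrite_ents toggled.
    later = nlp.add_pipe(
        "entity_ruler",
        name="my_rules",
        after="stand_in_ner",
        config={"overwrite_ents": overwrite},
    )
    later.add_patterns([{"label": "FRUIT", "pattern": "Apple"}])
    return nlp


def entity_labels(overwrite: bool):
    # Run the pipeline and collect (text, label) pairs from doc.ents.
    doc = build_pipeline(overwrite)("Apple is tasty.")
    return [(ent.text, ent.label_) for ent in doc.ents]


print(entity_labels(overwrite=False))  # earlier ORG span is kept
print(entity_labels(overwrite=True))   # later FRUIT rule wins
```

      With a trained pipeline you would load a model and add the ruler with after="ner" instead; the overwrite check in the later ruler looks only at the spans already in doc.ents, whichever component set them.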
    1. Below is a complete list of the AttributeRuler pipes available to you from spaCy and the Matchers. 1.3.1.1. Attribute Rulers

      This is confusing: AttributeRuler is just one pipe among all the others listed under "Attribute Rulers", so the plural heading "Attribute Rulers" makes no sense here. Suggested correction: "Below is a complete list of the standard pipes and matchers from spaCy (a matcher "just" finds patterns and does not tag or manipulate data the way pipes do)." The heading and list would then read:

      1.3.1.1 Standard Pipes
      - AttributeRuler
      - DependencyParser
      - etc.
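      To illustrate the matcher/pipe distinction, here is a sketch assuming spaCy v3 and a blank English pipeline. The Matcher reports where a pattern occurs but writes nothing to the Doc; the FRUIT label and the pattern are hypothetical:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# Hypothetical token pattern: the word "apple", case-insensitive.
matcher.add("FRUIT", [[{"LOWER": "apple"}]])

doc = nlp("I ate an Apple.")
matches = matcher(doc)  # list of (match_id, start, end) tuples

for match_id, start, end in matches:
    print(nlp.vocab.strings[match_id], doc[start:end].text)  # FRUIT Apple

# The matcher only reported the match; no entity annotation was added.
print(doc.ents)  # ()
```

      A pipe such as entity_ruler, by contrast, writes its matches into doc.ents as the Doc passes through the pipeline.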

  2. Nov 2025
    1. and the Survived Column. Remember, if a person survived, they have a 1; if they did not, they have a 0. We can use the sum to know how many male vs. female survivors there were.

      Somewhat confusing, given that the Survived column is not even included in the example.
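      A sketch of the idea with the Survived column actually present, assuming pandas and a tiny hypothetical DataFrame with Sex and Survived columns (toy rows, not the real Titanic data). Because Survived holds only 0 or 1, summing it per group counts the survivors:

```python
import pandas as pd

# Hypothetical toy rows shaped like the Titanic data; not the real dataset.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 1, 0, 0],  # 1 = survived, 0 = did not survive
})

# Summing a 0/1 column per group counts the rows where it is 1.
survivors = df.groupby("Sex")["Survived"].sum()
print(survivors)
```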

    1. the internet links a specific and unique address that can be used as a way to connect to a server without having to type out an IP address

      I.e., URLs are used instead of IP addresses.

    1. We can likewise do the same in reverse by grabbing all indices up to the first index. In other words, the item in index 0.

      Unclear: "all indices up to the first index" can only refer to the single index 0, so the phrasing is nonsensical. Two readings are possible:
      - If the intent is to grab all indices from the beginning of a list up to a certain index, the example should be something like print(first_list[:2]), and the description should be changed accordingly.
      - If the intent is to grab indices from the back (right to left) up to the first index (index 0), the example syntax is wrong; it should be print(first_list[::-1]). This seems more likely, since the description would then make sense as it stands and the mistake would concern only two characters in the example. In that case, however, slice notation should be introduced more thoroughly.
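      A short sketch of both readings, using a hypothetical first_list (the original list's contents are not shown in the quoted passage). Slice notation is list[start:stop:step], where stop is exclusive and a negative step walks from right to left:

```python
# Hypothetical list; the original first_list contents are not shown here.
first_list = ["a", "b", "c", "d"]

print(first_list[:1])    # ['a']                 stop is exclusive: only index 0
print(first_list[:2])    # ['a', 'b']            indices 0 and 1
print(first_list[::-1])  # ['d', 'c', 'b', 'a']  step -1: right to left, down to index 0
```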

  3. Oct 2025