2 Matching Annotations
  1. Jan 2024
    1. Santosh Vempala, a computer science professor at Georgia Tech, has also studied hallucinations. “A language model is just a probabilistic model of the world,” he says, not a truthful mirror of reality. Vempala explains that an LLM’s answer strives for a general calibration with the real world—as represented in its training data—which is “a weak version of accuracy.” His research, published with OpenAI’s Adam Kalai, found that hallucinations are unavoidable for facts that can’t be verified using the information in a model’s training data.

      “A language model is just a probabilistic model of the world”

      Hallucinations result from an imperfect model, or from attempting answers when the necessary data is not in the model. (A minimal sketch of the calibration intuition appears after these annotations.)

    1. We will call such near-verbatim outputs “plagiaristic outputs,” because if a human created them we would call them prima facie instances of plagiarism.

      Defining “plagiaristic outputs”
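
The first annotation points at the calibration argument in Kalai and Vempala's work. The toy simulation below is a minimal sketch of that intuition only: if a model is calibrated to a long-tailed training corpus, many facts appear exactly once, and the singleton fraction (a Good-Turing-style estimate) approximates how often the model must emit statements it cannot verify from its training data. The fact names, the Zipf-like distribution, and the corpus sizes are illustrative assumptions, not details from the article or the paper.

```python
import random
from collections import Counter

random.seed(0)

# Illustrative assumptions (not from the article): a long-tailed, Zipf-like
# distribution over "facts", and a training corpus sampled from it.
N_FACTS = 10_000
N_TRAIN = 5_000
facts = [f"fact-{i}" for i in range(N_FACTS)]
weights = [1.0 / (i + 1) for i in range(N_FACTS)]   # heavy tail of rare facts
train = random.choices(facts, weights=weights, k=N_TRAIN)

# Count how many facts the corpus attests exactly once. A model calibrated to
# the empirical frequencies has no way to verify these from its training data,
# so (per the Good-Turing intuition) it keeps placing roughly this much
# probability mass on statements it cannot check.
counts = Counter(train)
singletons = sum(1 for c in counts.values() if c == 1)

print(f"facts seen exactly once in training: {singletons}")
print(f"approximate share of generations about unverifiable facts: "
      f"{singletons / N_TRAIN:.1%}")
```

The point of the toy is only that the singleton fraction is large whenever the fact distribution has a heavy tail, so a calibrated model cannot drive hallucinations on such facts to zero without additional information; it is not a reconstruction of Kalai and Vempala's actual proof.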