Hypothesis

3 Matching Annotations

Mar 2023
aisnakeoil.substack.com

GPT-4 and professional benchmarks: the wrong answer to the wrong question

1. ravenscroftj 21 Mar 2023
  
  in Public
  
  Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldn’t fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable.
  
  ChatGPT (running GPT-3.5) appears to have memorised MBA test questions: when a question is rephrased or certain details are changed, the system fails to answer correctly.
  
  openai gpt ModelEvaluation
2. ravenscroftj 21 Mar 2023
  
  in Public
  
  In fact, we can definitively show that it has memorized problems in its training set: when prompted with the title of a Codeforces problem, GPT-4 includes a link to the exact contest where the problem appears (and the round number is almost correct: it is off by one). Note that GPT-4 cannot access the Internet, so memorization is the only explanation.
  
  GPT-4 can reproduce the link to the Codeforces contest containing a problem it was evaluated on, even though it has no internet access, so it appears to have memorised this as well.
  
  openai gpt ModelEvaluation
3. ravenscroftj 21 Mar 2023
  
  in Public
  
  To benchmark GPT-4’s coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can’t recall.
  
  GPT-4 solved only the problems that were available before September 2021 (its training data cutoff) and failed on newer ones, which strongly suggests that it has memorised the answers, at least in part, during training.
  
  llm openai gpt ModelEvaluation
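
  The before/after-cutoff comparison described in annotation 3 can be sketched in a few lines of Python. This is only an illustration, not the method used by Horace He or the article's authors: the function name `solve_rate_by_cutoff`, the exact cutoff day, and the problem dates are assumptions; only the 10/10 and 0/10 counts come from the quoted passage.

```python
from datetime import date

# Training data cutoff reported in the quoted passage (September 2021).
# The exact day is an assumption made for illustration.
CUTOFF = date(2021, 9, 30)


def _rate(outcomes):
    """Fraction of solved problems, or NaN if the group is empty."""
    return sum(outcomes) / len(outcomes) if outcomes else float("nan")


def solve_rate_by_cutoff(results, cutoff=CUTOFF):
    """Split (published_date, solved) pairs at the cutoff and
    return (pre_cutoff_rate, post_cutoff_rate)."""
    pre = [solved for published, solved in results if published <= cutoff]
    post = [solved for published, solved in results if published > cutoff]
    return _rate(pre), _rate(post)


# Placeholder dates (hypothetical); only the counts are from the passage:
# 10 of 10 older problems solved, 0 of 10 recent problems solved.
results = [(date(2020, 1, 1), True)] * 10 + [(date(2022, 1, 1), False)] * 10

pre_rate, post_rate = solve_rate_by_cutoff(results)
print(f"pre-cutoff: {pre_rate:.0%}, post-cutoff: {post_rate:.0%}")
```

  A large gap between the two rates on problems of similar difficulty is what points towards memorisation rather than general problem-solving ability; with only ten problems per group, a larger sample would be needed to draw a firm conclusion.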

Tags

gpt

llm

openai

ModelEvaluation

Annotators

ravenscroftj

URL

aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
