CONCLUSION
- GPT's percentile performance relative to human test-takers was lower than OpenAI reported
- the scaled score on the essay portion also deviated from the "true" essay score => this could imply that the actual UBE percentile was lower than reported
- the reported score (298) is 28 points above the passing score (270) => the essay scores would have to be extremely inaccurate to undermine the conclusion of Katz et al. that GPT passed the bar exam (see the sketch below)
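To make that buffer concrete, here is a minimal arithmetic sketch in Python. The 298 total and the 28-point margin (so a 270 passing threshold) come from the notes above; the 200-point maximum contribution of the written portion (MEE + MPT) on the 400-point UBE scale is an assumption about typical UBE scoring, not a figure from the source:

```python
# Sketch: how inflated would the essay scores need to be for GPT to have
# actually failed? Assumes a 400-point UBE scale where the written portion
# (MEE + MPT) contributes at most 200 points (an assumption, not sourced).

REPORTED_TOTAL = 298  # total scaled score reported by Katz et al.
PASSING_SCORE = 270   # implied by the 28-point margin in the notes
WRITTEN_MAX = 200     # assumed max contribution of the written portion

# Buffer between the reported score and the passing threshold
margin = REPORTED_TOTAL - PASSING_SCORE
print(f"buffer above passing: {margin} points")  # 28

# For the conclusion "GPT passed" to be wrong, the written score would
# have to be overstated by more than the entire buffer.
required_overstatement = margin / WRITTEN_MAX
print(f"essay scoring would need to be inflated by >{margin} scaled points "
      f"(~{required_overstatement:.0%} of the written portion's range)")
```

Under these assumptions, the essay grading would have to be off by roughly 14% of the entire written-portion range before the pass/fail conclusion flips, which is why the notes call the required inaccuracy "extreme."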
- but still, the percentile claims appear overstated even if the passing result itself holds