Capable configurations lose 30 points on average in assertion pass rates from baseline to fully specified tasks, while some weaker configurations approach zero.
大多数人可能认为即使在严格约束下,能力较强的LLM配置仍能保持相对较好的表现,但研究表明即使是最佳配置也会平均下降30个百分点,这挑战了我们对LLM适应能力的认知。
Capable configurations lose 30 points on average in assertion pass rates from baseline to fully specified tasks, while some weaker configurations approach zero.
大多数人可能认为即使在严格约束下,能力较强的LLM配置仍能保持相对较好的表现,但研究表明即使是最佳配置也会平均下降30个百分点,这挑战了我们对LLM适应能力的认知。
Gutman, R. (2021, October 28). Is Moderna Really Better Than Pfizer—Or Is It Just a Higher Dose? The Atlantic. https://www.theatlantic.com/health/archive/2021/10/pfizer-moderna-dose-which-vaccine-best/620501/