As of March 2026, AI systems can post-train models to achieve about half of the uplift that human researchers achieve on the same task. The specific eval scores are described as follows: a 'weighted average is taken across all post-trained LLMs... The top-scoring systems as of April get 25%-28% (Opus 4.6, and GPT 5.4), compared to a human score of 51%.'
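On model fine-tuning tasks, AI systems already reach roughly half of the 51% score achieved by human researchers, a sign of significant progress for AI on research tasks.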
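The "about half" figure can be checked directly from the quoted scores. The short Python sketch below divides the low and high ends of the quoted 25%-28% range by the quoted human score of 51%. The variable and function names are illustrative, and the `low` and `high` labels are an assumption made here because the quote does not say which system obtained which score.

```python
# Scores as quoted in the text, in percent.
human_score = 51.0

# Low and high ends of the quoted 25%-28% range for the top AI systems.
# Assumption: labelled "low"/"high" because the quote does not map
# individual scores to individual systems.
ai_scores = {"low": 25.0, "high": 28.0}


def fraction_of_human(ai_score: float, human: float = human_score) -> float:
    """Return an AI score as a fraction of the human score."""
    return ai_score / human


ratios = {name: fraction_of_human(score) for name, score in ai_scores.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} of the human score")
```

Under these numbers the ratio falls between roughly 0.49 and 0.55, which is consistent with describing the AI result as about half of the human one.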
