Hypothesis

The benchmark tasks were meticulously constructed to be realistic, involving the hard work of hundreds of experts and likely millions of dollars — placing it among the most expensive economics papers of all time.

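The author notes that the GDPval benchmark may have cost millions of dollars and involved hundreds of experts in its construction. This data point illustrates how expensive AI benchmarks can be, but it also suggests that such tests may suffer from uneven resource allocation. Given the gap between the benchmark's cost and its actual economic impact, this pattern of high input and low output deserves reflection.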

data-point benchmark-cost ai-economics

Tags

Annotators

URL