1 Matching Annotations
  1. Last 7 days
    1. The benchmark tasks were meticulously constructed to be realistic, involving the hard work of hundreds of experts and likely millions of dollars — placing it among the most expensive economics papers of all time.

      作者提到GDPval基准测试可能花费了数百万美元,由数百名专家参与构建。这一数据点显示了AI基准测试的高昂成本,但也暗示了这类测试可能存在资源分配不均的问题。考虑到其成本与实际经济影响之间的差距,这种高投入低产出的现象值得反思。