1 Matching Annotations
  1. Last 7 days
    1. Because a scaling law is only fit on the (relatively small, relatively cheap) models that we can afford to train, and the prediction is _extrapolated_ for a model orders of magnitude larger.

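      A scaling law is fit on small models, but its predictions are applied to much larger ones, and this extrapolation can introduce large errors. Beginners often underestimate the uncertainty of extrapolation, which leads to poor resource allocation. In practice, treat extrapolated results with caution and validate them empirically wherever possible.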
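
To make the point about extrapolation uncertainty concrete, here is a minimal sketch. It assumes a pure power law `loss = a * N^(-b)` fit by least squares in log-log space; the data are synthetic, and the names `fit_power_law` and `predict` are hypothetical helpers, not taken from the annotated text. The standard error it reports covers only noise in the fit. It says nothing about the power-law form itself breaking down at scale, so the real uncertainty is likely larger.

```python
import numpy as np


def fit_power_law(n_params, losses):
    """Fit log10(loss) = intercept + slope * log10(N) by ordinary least squares."""
    x = np.log10(np.asarray(n_params, dtype=float))
    y = np.log10(np.asarray(losses, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = np.sqrt(np.sum(residuals**2) / (len(x) - 2))
    return {
        "slope": slope,
        "intercept": intercept,
        "sigma": sigma,
        "x_mean": x.mean(),
        "sxx": np.sum((x - x.mean()) ** 2),
        "n": len(x),
    }


def predict(fit, n_params):
    """Return (predicted loss, standard error of the fitted line in log10 units)."""
    x0 = np.log10(float(n_params))
    log_pred = fit["intercept"] + fit["slope"] * x0
    # Textbook standard error of the regression line at x0:
    # it grows with the distance of x0 from the mean of the fitted x values.
    se = fit["sigma"] * np.sqrt(1.0 / fit["n"] + (x0 - fit["x_mean"]) ** 2 / fit["sxx"])
    return 10.0**log_pred, se


# Synthetic "small model" runs (assumed values, for illustration only).
rng = np.random.default_rng(0)
small_models = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
true_losses = 10.0 * small_models**-0.07
observed = true_losses * np.exp(rng.normal(0.0, 0.01, size=small_models.size))

fit = fit_power_law(small_models, observed)
loss_in, se_in = predict(fit, 1e7)    # inside the fitted range
loss_out, se_out = predict(fit, 1e11)  # three orders of magnitude beyond it

print(f"N=1e7:  loss ~ {loss_in:.3f}, log10 standard error {se_in:.4f}")
print(f"N=1e11: loss ~ {loss_out:.3f}, log10 standard error {se_out:.4f}")
```

With these model sizes, the standard error at `N = 1e11` comes out several times larger than at `N = 1e7`, even though the same five points are used for both predictions. That gap is the statistical part of the risk the note describes, and it is a reason to check an extrapolated prediction against at least one real run at intermediate scale before committing resources.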