5 Matching Annotations
  1. Nov 2025
    1. LLM benchmarks are essential for tracking progress and ensuring safety in AI, but most benchmarks don't measure what matters.

      The paper concludes that most benchmarks used to establish LLM progress are mistargeted, leaving out aspects that matter.

  2. Oct 2020
  3. Feb 2020