5 Matching Annotations
  1. Last 7 days
    1. tuning a standalone evaluator to be skeptical turns out to be far more tractable

      深刻揭示了LLM自我评价的局限性:生成器难以对自身工作保持批判性。通过解耦生成与评估,并刻意调优独立评估器的“怀疑态度”,能有效打破AI自嗨的闭环。这种对抗式架构是提升输出质量的强效杠杆。

  2. Oct 2023
    1. Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RLbaselines, trained for 1M steps, without any training.

      Them's fighten' words!

      I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim. they are careful to say it's from their empirical results, very worth a look. I suspect that amount of implicit knowledge in the papers, text and DAG are helping to do this.

      The Big Question: is their comparison to RL baselines fair, are they being trained from scratch? What does a fair comparison of any from-scratch model (RL or supervised) mean when compared to an LLM approach (or any approach using a foundation model), when that model is not really from scratch.

  3. Oct 2022
  4. Jun 2020