3 Matching Annotations
  1. Jun 2026
    1. current agent performance is still strongly shaped by harness behavior and workflow choices, not just base-model quality

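      Most people assume that an AI agent's performance is determined mainly by the quality of the underlying model, but the author makes a counterintuitive claim: an agent's actual performance is shaped to a large degree by harness (tooling) behavior and workflow choices, not just base-model quality. This challenges the industry's conventional focus on model capability.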

  2. May 2026
    1. Our framework improves both the decision agent to learn better skill retrieval and action generation, while the skill bank agent continually extracts, refines, and updates skills together with their contracts.

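      The framework not only improves the decision agent's skill retrieval and action generation; the skill bank agent also continually extracts, refines, and updates skills and their contracts, which suggests the framework is efficient at managing and updating skills.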

  3. Apr 2026
    1. Contemplating mode provides significant capability improvements in challenging tasks, achieving 58% in Humanity's Last Exam and 38% in FrontierScience Research.

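      These specific figures show the striking effect of multi-agent parallel reasoning, a capability gain approaching human level. They suggest that AI collaboration modes, rather than simply scaling up model size, may become a key path to solving complex problems.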