Hypothesis

7 Matching Annotations

Last 7 days
x.com

Viv on X: “Improving Deep Agents with Harness Engineering” / X

1. jjhaoxuan 28 Apr 2026
  
  in Public
  
  Teaching Agents to Write Testable Code
  
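  This is exactly what we want to do: inject tools dynamically. For example, when a financial operation would violate a deterministic rule, we need to run a tool calculation on the fly and return a risk level.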
2. jjhaoxuan 28 Apr 2026
  
  in Public
  
  a Ralph Wiggum Loop where a hook forces
  
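  This is precisely the core of our design: the algorithm intercepts calls through hook functions, so the agent cannot directly execute an erroneous operation.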
3. jjhaoxuan 28 Apr 2026
  
  in Public
  
  Fetch experiment traces from LangSmith
  Spawn parallel error analysis agents → main agent synthesizes findings + suggestions
  Aggregate feedback and make targeted changes to the harness.
  
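  Giving the agent only the inputs and outputs is fine. But the agent must never get access to the test data: once it builds patterns from the test data, the optimization iterations will go wrong.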
4. jjhaoxuan 28 Apr 2026
  
  in Public
  
  System Prompt, Tools, and Middleware (our term for hooks around model and tool calls).
  
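  Worth borrowing. The focus is on three core components: the system prompt, tools, and middleware (which the article uses specifically to mean the hooks around model calls and tool calls).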
5. jjhaoxuan 28 Apr 2026
  
  in Public
  
  We use Harbor to orchestrate the runs. It spins up sandboxes (Daytona),
  
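  The experiments use Harbor to orchestrate the whole pipeline: it automatically spins up Daytona sandbox environments, connects to the agent's run loop, and handles result verification and scoring. These two English names (Harbor and Daytona) are worth looking up; come back to this later.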
6. jjhaoxuan 28 Apr 2026
  
  in Public
  
  only tweaked the harness
  
  How exactly was the harness tweaked here?
7. jjhaoxuan 28 Apr 2026
  
  in Public
  
  Design decisions include the system prompt, tool choice, and execution flow.
  
  System prompt, tools, and the overall workflow: this is the scope of the harness's work. The article gives a definition here.
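
Annotations 1, 2, and 4 describe one mechanism from three sides. A hook (middleware) sits around tool calls, runs a risk-scoring tool at call time, and blocks a dangerous operation instead of letting the agent execute it. Below is a minimal Python sketch under stated assumptions: the financial tool `transfer_funds` is a toy, the 10,000 threshold is invented, and `ToolCall`, `assess_risk`, `pre_tool_hook`, and `execute_with_hook` are hypothetical names. None of this is the API of the article's framework or of LangChain.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical shape of a tool call proposed by the model.
@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]

@dataclass
class Verdict:
    allowed: bool
    risk: str  # "low", "medium" or "high"
    reason: str = ""

def assess_risk(call: ToolCall) -> str:
    """Hypothetical deterministic risk tool; the 10,000 threshold is an assumption."""
    if call.name != "transfer_funds":
        return "low"
    return "high" if float(call.args.get("amount", 0)) > 10_000 else "medium"

def pre_tool_hook(call: ToolCall) -> Verdict:
    """Runs before every tool call and vetoes high-risk operations."""
    risk = assess_risk(call)
    if risk == "high":
        return Verdict(False, risk, f"{call.name} blocked (risk={risk})")
    return Verdict(True, risk)

def execute_with_hook(
    call: ToolCall,
    tools: dict[str, Callable[..., Any]],
    hook: Callable[[ToolCall], Verdict] = pre_tool_hook,
) -> dict[str, Any]:
    verdict = hook(call)
    if not verdict.allowed:
        # The tool never runs; the refusal goes back to the model as an observation.
        return {"status": "blocked", "risk": verdict.risk, "reason": verdict.reason}
    result = tools[call.name](**call.args)
    return {"status": "ok", "risk": verdict.risk, "result": result}
```

The hook returns the refusal as an observation rather than raising an exception. That way the agent loop can keep going and choose a safer action, and the risk level is available to the model either way.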
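
Annotation 3's constraint (analysis agents may see inputs and outputs but never the test data) can be enforced in code before any trace reaches them. The Python sketch below assumes traces are plain dicts with hypothetical field names: `input`, `output`, and `error`, plus leak-prone keys such as `expected`. To keep it runnable offline, a trivial rule-based `analyze` function stands in for the LLM error-analysis agents. It does not use the LangSmith SDK.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Only these fields may reach the analysis agents (hypothetical schema).
ALLOWED_FIELDS = ("task_id", "input", "output", "error")

def redact_trace(trace: dict) -> dict:
    """Whitelist input/output-level fields; drop expected answers, test files, etc."""
    return {k: trace[k] for k in ALLOWED_FIELDS if k in trace}

def analyze(trace: dict) -> list[str]:
    """Stand-in for one error-analysis agent: returns a list of findings."""
    findings = []
    if trace.get("error"):
        findings.append(f"error:{trace['error']}")
    if not trace.get("output"):
        findings.append("empty_output")
    return findings

def analyze_traces(traces: list[dict], max_workers: int = 4) -> Counter:
    redacted = [redact_trace(t) for t in traces]
    # Fan out: one analyzer per redacted trace, run in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_trace = list(pool.map(analyze, redacted))
    # Fan in: a stand-in for the main agent, counting recurring findings.
    return Counter(f for findings in per_trace for f in findings)
```

The filter is a whitelist rather than a blacklist. Any new test-related field added to the traces later is therefore excluded by default.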

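Annotation 5 describes a pipeline with four steps: spin up a sandbox, run the agent loop, verify the result, and score. Its skeleton can be sketched in Python, but the sketch below is not Harbor's or Daytona's API. A temporary directory stands in for the sandbox, and `Task`, `run_task`, and `score` are hypothetical names chosen for illustration.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

@dataclass
class Task:
    task_id: str
    prompt: str
    check: Callable[[Path], bool]  # verifier run against the sandbox contents

@dataclass
class RunResult:
    task_id: str
    passed: bool

def run_task(task: Task, agent: Callable[[str, Path], None]) -> RunResult:
    # Stand-in "sandbox": a fresh temporary directory per task, deleted afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        try:
            agent(task.prompt, workdir)   # the agent loop acts only inside workdir
            passed = task.check(workdir)  # verification step
        except Exception:
            passed = False                # a crash counts as a failed run
    return RunResult(task.task_id, passed)

def score(results: list[RunResult]) -> float:
    """Fraction of tasks that passed verification."""
    return sum(r.passed for r in results) / len(results) if results else 0.0
```

Because each task gets its own directory, one run cannot leak files into the next. A real sandbox provider would also isolate processes and the network, which a temporary directory does not.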
Annotators

jjhaoxuan

URL

x.com/Vtrivedy10/status/2023805578561060992
