Hypothesis

4 Matching Annotations

May 2026
jack-clark.net

Import AI 455: Automating AI Research

1
1. fxp007 15 May 2026
  
  in Public
  
  In 2022, GPT 3.5 could do tasks that might take a person about ~30 seconds. In 2023, this rose to 4 minutes with GPT-4. In 2024, this rose to 40 minutes (o1). In 2025, it reached ~6 hours (GPT 5.2 (High)). In 2026, it has already risen to ~12 hours (Opus 4.6).
  
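  The length of tasks AI systems can complete independently grew from about 30 seconds in 2022 to about 12 hours in 2026, which points to exponential growth in autonomous work capability.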
  
  capability-scaling time-horizon
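
As a rough check on the "exponential growth" reading, the quoted figures can be turned into year-over-year multipliers and an endpoint-based doubling time. This is only a sketch: it assumes the five quoted values sit exactly one year apart, which overstates the elapsed time for the partial-year 2026 figure, and the variable names are illustrative.

```python
import math

# Task horizons quoted in the highlight above, converted to minutes.
# The 2026 figure is a partial-year value ("has already risen").
horizon_minutes = {
    2022: 0.5,
    2023: 4,
    2024: 40,
    2025: 6 * 60,
    2026: 12 * 60,
}

years = sorted(horizon_minutes)

# Year-over-year multipliers between consecutive quoted figures.
yearly_factors = {
    later: horizon_minutes[later] / horizon_minutes[earlier]
    for earlier, later in zip(years, years[1:])
}

# Endpoint-based doubling time: elapsed years divided by total doublings.
# Assumption: the first and last figures are treated as four years apart.
total_factor = horizon_minutes[years[-1]] / horizon_minutes[years[0]]
doublings = math.log2(total_factor)
doubling_time_months = 12 * (years[-1] - years[0]) / doublings

print(yearly_factors)
print(f"total growth: {total_factor:.0f}x, "
      f"doubling time: {doubling_time_months:.1f} months")
```

The multipliers for 2023 through 2025 are of similar size, which is what steady exponential growth looks like; the smaller 2026 multiplier reflects a year still in progress.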
URL

jack-clark.net/2026/05/04/import-ai-455-automating-ai-research/
Apr 2026
research.google

https://research.google/blog/reasoningbank-enabling-agents-to-learn-from-experience/

1
1. fxp007 25 Apr 2026
  
  in Public
  
  existing TTS methods often discard the exploration trajectory and treat the final answer as the only useful outcome
  
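  In test-time scaling (TTS), the mainstream view holds that only the final result is valuable and that exploration is merely a means of reaching it. The authors argue instead that the neglected exploration trajectories are a rich data source that can speed up an agent's ability to learn from experience. This view challenges how conventional TTS methods assign value.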
  
  non-consensus test-time-scaling exploration-value
epoch.ai

https://epoch.ai/blog/have-ai-capabilities-accelerated

1
1. fxp007 25 Apr 2026
  
  in Public
  
  Parameters are estimated by unweighted least squares. Time t is measured in years since the first observation in each dataset.
  
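  The study estimates parameters by least squares, with time measured in years from the first observation in each dataset. This is standard statistical practice, but leaving the fit unweighted may understate the importance of recent data points, which usually represent more advanced model capabilities. The choice of time unit also affects how intuitive the growth rates are to interpret.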
  
  data-point statistical-method time-scaling
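
The difference between an unweighted fit and a recency-weighted one can be sketched in a few lines. Everything here is an assumption for illustration: the data points are made up, the linear model `value = a + b * t` is the simplest possible choice, and the recency weights are hypothetical; none of it is taken from the Epoch AI post.

```python
def fit_line(years, values, weights=None):
    """Least-squares fit of value = a + b * t.

    t is measured in years since the first observation, as in the quoted
    passage. With weights=None every point counts equally (unweighted).
    """
    t0 = min(years)
    t = [y - t0 for y in years]
    w = [1.0] * len(t) if weights is None else list(weights)

    sw = sum(w)
    t_bar = sum(wi * ti for wi, ti in zip(w, t)) / sw
    v_bar = sum(wi * vi for wi, vi in zip(w, values)) / sw

    sxy = sum(wi * (ti - t_bar) * (vi - v_bar)
              for wi, ti, vi in zip(w, t, values))
    sxx = sum(wi * (ti - t_bar) ** 2 for wi, ti in zip(w, t))

    slope = sxy / sxx
    intercept = v_bar - slope * t_bar
    return intercept, slope


# Hypothetical observations whose growth speeds up over time.
years = [2020.0, 2021.5, 2023.0, 2024.0, 2025.0]
scores = [1.0, 2.4, 4.1, 5.9, 8.2]

_, slope_unweighted = fit_line(years, scores)
# Hypothetical weights that favor more recent points.
_, slope_recent = fit_line(years, scores, weights=[1, 2, 3, 4, 5])

print(f"unweighted slope: {slope_unweighted:.3f} per year")
print(f"recency-weighted slope: {slope_recent:.3f} per year")
```

On accelerating data like this, the recency-weighted slope comes out steeper, which is the effect the comment above is pointing at. Whether such weighting is appropriate for the real datasets is a separate question.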
sakana.ai

https://sakana.ai/fugu-beta/

1
1. fxp007 24 Apr 2026
  
  in Public
  
  The depth of recursion becomes a tunable compute axis at inference time, requiring no retraining. A small model, by reading itself, can iterate toward answers that neither it nor any of its workers could reach in a single pass.
  
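  Most people assume that better model performance requires more parameters or retraining, but the authors propose a counterintuitive approach: by calling itself recursively, a small model can iterate at inference time and reach answer quality that a single pass cannot. This challenges the conventional view of how model scale relates to capability.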
  
  counterintuitive model-scaling inference-time
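
The idea of recursion depth as a knob that is turned at inference time can be illustrated with a toy. This is not Sakana's method: `refine` is a hypothetical stand-in for one model pass, implemented here as a Newton step toward a square root so the example runs without any model, and `recursive_answer` is an illustrative name.

```python
def refine(target, draft):
    """Hypothetical stand-in for one model pass that reads its previous draft.

    Here it is a single Newton step toward sqrt(target); a real system would
    call a language model instead.
    """
    return 0.5 * (draft + target / draft)


def recursive_answer(target, depth, draft=1.0):
    """Apply `refine` to its own output `depth` times.

    Nothing is retrained between calls; only `depth` changes how much
    compute is spent.
    """
    if depth == 0:
        return draft
    return recursive_answer(target, depth - 1, refine(target, draft))


for depth in (1, 2, 4):
    answer = recursive_answer(2.0, depth)
    error = abs(answer * answer - 2.0)
    print(f"depth={depth} answer={answer:.6f} error={error:.2e}")
```

In this toy the error shrinks as depth grows while the "model" stays fixed, which mirrors the claim in the highlight. Real models offer no such convergence guarantee, so the analogy should be read loosely.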
