2 Matching Annotations
  1. Last 7 days
    1. METR conclude that “the length of tasks AI can do is doubling every 7 months”. I’m not convinced that pattern will continue to hold, but it’s an eye-catching way of illustrating current trends in agent capabilities.

      a potential pattern to watch. Even if it doesn't follow a exponential trajectory. If it keeps the pattern in tact, by August we should see days of SE work being done independently by models.

    2. The chart shows tasks that take humans up to 5 hours, and plots the evolution of models that can achieve the same goals working independently. As you can see, 2025 saw some enormous leaps forward here with GPT-5, GPT-5.1 Codex Max and Claude Opus 4.5 able to perform tasks that take humans multiple hours—2024’s best models tapped out at under 30 minutes.

      Interesting metric. Until 2024 models were capable of independently execute software engineering tasks that take a person under 30mins. This chimes with my personal observation that there was no real time saving involved, or regular automation can handle it. In 2025 that jumped to tasks taking a person multiple hours. With Claude Opus 4.5 reaching 4:45 hrs. That is a big jump. How do you leverage that personally?