Hypothesis

2 Matching Annotations

Jan 2026
simonwillison.net

2025: The year in LLMs

1. tonz 02 Jan 2026
  
  in Public
  
  METR conclude that “the length of tasks AI can do is doubling every 7 months”. I’m not convinced that pattern will continue to hold, but it’s an eye-catching way of illustrating current trends in agent capabilities.
  
  A potential pattern to watch, even if it doesn't follow an exponential trajectory. If the pattern holds, by August we should see days of software engineering work being done independently by models.
  
  ai automation humantaskreplacement
2. tonz 02 Jan 2026
  
  in Public
  
  The chart shows tasks that take humans up to 5 hours, and plots the evolution of models that can achieve the same goals working independently. As you can see, 2025 saw some enormous leaps forward here with GPT-5, GPT-5.1 Codex Max and Claude Opus 4.5 able to perform tasks that take humans multiple hours—2024’s best models tapped out at under 30 minutes.
  
  Interesting metric. Through 2024, models could independently execute only software engineering tasks that take a person under 30 minutes. This matches my personal observation that there was no real time saving involved, or that regular automation could already handle such tasks. In 2025 that jumped to tasks taking a person multiple hours, with Claude Opus 4.5 reaching 4 hours 45 minutes. That is a big jump. How do you leverage that personally?
  
  ai llms automation humantaskreplacement
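The August projection in the first annotation is simple compounding arithmetic. The sketch below is a hypothetical illustration, not METR's methodology. It assumes the 7-month doubling period holds exactly, starts from the 4:45 (4.75 hours) figure quoted for Claude Opus 4.5 as if it were the January 2026 value, and treats a working day as 8 hours. The function names `projected_hours` and `to_workdays` are made up for this example.

```python
# Hypothetical sketch: extrapolate a task-length "time horizon" under the
# assumption that it doubles every 7 months, h(t) = h0 * 2 ** (t / T).
# The 4.75-hour starting value and the 8-hour workday are illustrative
# assumptions, not METR data.

DOUBLING_MONTHS = 7.0  # assumed doubling period, taken from the quoted claim


def projected_hours(start_hours: float, months_ahead: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task length in human hours after `months_ahead` months of steady doubling."""
    return start_hours * 2 ** (months_ahead / doubling_months)


def to_workdays(hours: float, hours_per_day: float = 8.0) -> float:
    """Convert human hours into working days (8-hour day is an assumption)."""
    return hours / hours_per_day


if __name__ == "__main__":
    start = 4.75  # 4 h 45 min, the figure quoted for Claude Opus 4.5
    for months in (0, 7, 14):
        h = projected_hours(start, months)
        print(f"+{months:2d} months: {h:5.2f} h, about {to_workdays(h):.2f} workdays")
```

Under those assumptions, one doubling period (roughly January to August 2026) gives about 9.5 hours, a little over one 8-hour working day; two doubling periods give about 19 hours, or roughly 2.4 working days.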

Tags

automation

llms

ai

humantaskreplacement

Annotators

tonz

URL

simonwillison.net/2025/Dec/31/the-year-in-llms/
