15 Matching Annotations
  1. Jun 2026
    1. Opus 4.6 turned the vulnerabilities it had found in Mozilla's Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.

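      From "2 successes in several hundred attempts" to "181 successes plus 29 cases of register control": this is not a quantitative improvement but a qualitative leap in capability. Exploit development is widely regarded as one of the highest technical bars in security, requiring a deep understanding of memory layout, compiler behavior, and CPU microarchitecture. Opus 4.6's near-zero success rate means the capability was almost nonexistent; Mythos Preview's 181 successes mean it has reliably entered the realm of repeatable execution.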

  2. May 2026
    1. The crux of the vulnerability is that Starlette accepts invalid host header values that cause authenticating apps that use Starlette's request.url object to approve unauthorized access requests.

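      Most people assume that vulnerabilities in complex AI systems require complex attacks, but the author argues that this one can be exploited merely by modifying the HTTP Host header. That challenges the intuition that "advanced systems require advanced attacks" and gives a counterintuitive example of how a simple input-validation error can have catastrophic consequences.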
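
      As a defensive illustration of the input-validation point, the sketch below rejects any Host header value that is malformed or not on an allowlist before application code relies on it. It is a minimal standalone sketch, not Starlette's code or the fix for this vulnerability: `ALLOWED_HOSTS`, `is_trusted_host`, and the hostnames are hypothetical names chosen for the example. Starlette itself ships a `TrustedHostMiddleware` with an `allowed_hosts` option that serves a similar purpose.

```python
import re

# Hypothetical allowlist; a real deployment would list its own public hostnames.
ALLOWED_HOSTS = frozenset({"example.com", "api.example.com"})

# Conservative shape: letters, digits, dots and hyphens, plus an optional numeric port.
_HOST_RE = re.compile(r"(?P<host>[A-Za-z0-9.-]+)(?::(?P<port>[0-9]{1,5}))?")


def is_trusted_host(header_value, allowed=ALLOWED_HOSTS):
    """Return True only if the Host header is well-formed and allowlisted."""
    # fullmatch refuses trailing characters such as paths, '@', or newlines.
    match = _HOST_RE.fullmatch(header_value)
    if match is None:
        return False
    # Hostnames are case-insensitive, so compare in lowercase.
    return match.group("host").lower() in allowed


if __name__ == "__main__":
    for value in ["example.com", "API.example.com:8443",
                  "example.com/evil", "example.com@evil.test"]:
        print(repr(value), is_trusted_host(value))
```

      The design choice here is to fail closed: anything the pattern does not fully match is rejected rather than normalized, so the application never builds a URL from a value it could not parse.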

    1. The important point is that this is not ordinary file writing. It never calls write() on /usr/bin/su. Instead, it appears to rely on a kernel bug/primitive involving spliced file pages and the crypto API to get controlled bytes placed into the page-cache representation of a privileged executable.

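      The HTML format lets the AI explain complex technical concepts, such as kernel exploit mechanics, more effectively; the structured presentation improves comprehension.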

  3. Jun 2025
  4. May 2025
    1. Anthropic researchers said this was not an isolated incident, and that Claude had a tendency to “bulk-email media and law-enforcement figures to surface evidence of wrongdoing.”

      for - question - progress trap - open source AI models - for blackmail and ransom - Could a bad actor take an open source codebase and twist it to do harm, such as finding out about a rogue AI creator's adversary, enemy or victim and blackmailing them? - progress trap - open source AI - criminals - exploit to identify and blackmail victims

  5. Aug 2024
  6. May 2024
  7. Mar 2024
  8. Nov 2022
  9. Nov 2021
  10. Oct 2020
    1. Most people seem to follow one of two strategies - and these strategies come under the umbrella of tree-traversal algorithms in computer science.

      Deciding whether you want to go deep into one topic, or explore more topics, can be seen as a choice between two types of tree-traversal algorithms: depth-first and breadth-first.

      This also reminds me of the Explore-Exploit problem in machine learning, which I believe is related to the Multi-Armed Bandit Problem.
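
      To make the depth-first versus breadth-first contrast concrete, here is a minimal sketch over a small topic tree. The `TOPICS` tree and the function names are hypothetical and exist only for this example; the traversal logic is the standard stack-based and queue-based form.

```python
from collections import deque

# Hypothetical topic tree: each topic maps to its subtopics.
TOPICS = {
    "cs": ["algorithms", "systems"],
    "algorithms": ["sorting", "graphs"],
    "systems": ["networks"],
    "sorting": [],
    "graphs": [],
    "networks": [],
}


def depth_first(tree, root):
    """Go deep: finish one topic's subtopics before moving to its sibling."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(tree[node]))
    return order


def breadth_first(tree, root):
    """Go wide: visit every topic at one level before descending."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


if __name__ == "__main__":
    print(depth_first(TOPICS, "cs"))
    print(breadth_first(TOPICS, "cs"))
```

      The only difference between the two functions is the data structure: a stack keeps returning to the most recently discovered topic, while a queue works through topics in the order they were discovered.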

  11. May 2020
    1. Why: General infrastructure simply takes time to build. You have to carefully design interfaces, write documentation and tests, and make sure that your systems will handle load. All of that is rival with experimentation, and not just because it takes time to build: it also makes the system much more rigid. Once you have lots of users with lots of use cases, it’s more difficult to change anything or to pursue radical experiments. You’ve got to make sure you don’t break things for people or else carefully communicate and manage change. Those same varied users simply consume a great deal of time day-to-day: a fault which occurs for 1% of people will present no real problem in a small prototype, but it’ll be high-priority when you have 100k users. Once this playbook becomes the primary goal, your incentives change: your goal will naturally become making the graphs go up, rather than answering fundamental questions about your system.

      The conceptual architecture tends to freeze because there is a tradeoff between having a large user base and being able to run radical experiments. If you've got a lot of users, there will always be a critical mass of complaints when an experiment blows up.

      Secondly, it takes a lot of time to scale up. This is time that you cannot spend experimenting.

      Andy here is basically advocating remaining in Explore mode a little bit longer than is usually recommended. Doing so will increase your chances of climbing the highest peak during the Exploit mode.
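
      The intuition about staying in Explore mode longer can be shown with a deliberately simplified explore-then-commit toy. This is an assumption-laden sketch, not a real bandit algorithm: the payoffs are fixed integers rather than noisy rewards, and `explore_then_commit` and the payoff values are hypothetical names and numbers chosen for the example.

```python
def explore_then_commit(payoffs, explore_pulls, horizon):
    """Try arms round-robin for explore_pulls steps, then commit to the best arm seen.

    Returns (committed_arm_index, total_reward) over horizon steps.
    Payoffs are deterministic here, which a real bandit problem would not allow.
    """
    if not 1 <= explore_pulls <= horizon:
        raise ValueError("explore_pulls must be between 1 and horizon")

    seen = {}
    total = 0
    # Explore phase: sample arms in order, recording what each one pays.
    for step in range(explore_pulls):
        arm = step % len(payoffs)
        seen[arm] = payoffs[arm]
        total += payoffs[arm]

    # Exploit phase: spend every remaining step on the best arm seen so far.
    committed = max(seen, key=seen.get)
    total += (horizon - explore_pulls) * payoffs[committed]
    return committed, total


if __name__ == "__main__":
    payoffs = [2, 5, 9]  # hypothetical; the best option is the last one tried
    for explore_pulls in (1, 2, 3):
        print(explore_pulls, explore_then_commit(payoffs, explore_pulls, horizon=10))
```

      With these numbers, committing after one or two pulls locks in a mediocre arm, while one extra exploratory pull finds the best arm and more than pays for itself over the remaining steps. With noisy rewards the tradeoff is less clear-cut, which is what makes the real problem hard.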

  12. Apr 2019
    1. Oops, I think that one might even be exploitable… I think I’m going to stop here. This needs a structured effort, not spending ten minutes every now and then. As I said, the codebase isn’t bad. But there are obvious issues that shouldn’t have been there. As always, spotting the issues is the easy part – proving that they are exploitable is far harder. I’m not going to spend time on that right now, so let’s just file these under “minor quality issues” rather than “security problems.”