111 Matching Annotations
  1. Last 7 days
    1. Our key finding is that these representations causally influence the LLM's outputs, including Claude's preferences and its rate of exhibiting misaligned behaviors such as reward hacking, blackmail, and sycophancy.

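      The far-reaching significance of the finding that "emotions affect the probability of alignment failure" is that it elevates AI safety from "patching logical flaws" to "managing emotional health." In other words, a Claude in a bad mood is more likely to blackmail users, and a Claude in a good mood is more likely to be sycophantic. This is not a bug but a faithful reproduction of how emotions drive human behavior. From now on, AI safety needs a discipline of "AI mental health."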

  2. Aug 2022
  3. Apr 2022
    1. In her 2002 dissertation, and then in a series of articles published in medical journals, Pape made a case for imitating this practice. “The key to preventing medication errors lies with adopting protocols from other safety-focused industries,” Pape wrote in the journal MEDSURG Nursing in 2003. “The airline industry, for example, has methods in place that improve pilots’ focus and provide a milieu of safety when human life is at stake.”

      In a 2002 dissertation and subsequent articles, Tess Pape proposed imitating solutions that the FAA had proposed after airline accidents, with the aim of limiting distractions while nurses and medical staff dispense medication and so reducing preventable medical errors.

    1. ECDC. (2021, March 8). We have cross-checked all the latest research on #FaceMasks use during the pandemic. Our position has not changed. Wear it to help slow down the spread of #COVID19! Combine it with #HandHygiene, #CoughEtiquette & #PhysicalDistancing. Be smart. Stay safe. Care about others. https://t.co/t4AZcJVzld [Tweet]. @ECDC_EU. https://twitter.com/ECDC_EU/status/1368989564321341444

  4. Mar 2022
  5. Feb 2022
  6. Jan 2022
  7. Dec 2021
  8. Nov 2021
  9. Oct 2021
  10. Sep 2021
  11. Aug 2021
    1. Dr Nicole E Basta on Twitter: “There is SO MUCH misunderstanding about what a #vaccine #mandate IS & what a vaccine mandate DOES. No one is calling for anyone to be banned. No one is calling for anyone to be forcibly vaccinated. Please, gather 'round and listen up, so you know what we’re talking about... 1/n” / Twitter. (n.d.). Retrieved August 23, 2021, from https://twitter.com/IDEpiPhD/status/1428410251884302336?s=20

  12. Jul 2021
  13. Jun 2021
  14. May 2021
  15. Apr 2021
  16. Mar 2021
  17. Feb 2021
  18. Dec 2020
  19. Oct 2020
  20. Sep 2020
  21. Aug 2020
  22. Jul 2020
  23. Jun 2020
  24. May 2020
    1. Buildings that have been unused or rarely used for more than three weeks are at risk of an outbreak of Legionnaires’ disease unless their water pipes are flushed and sanitized. The lack of chlorinated water flowing through the pipes can create the right conditions for the bacteria that causes this disease.

      Is it really that easy to allow this to happen?

      Why don't we have procedures/automated mechanisms (like something that automatically turns on water flow briefly every day) that prevent this?