Hypothesis

13 Matching Annotations

May 2026
www.anthropic.com www.anthropic.com

Natural Language Autoencoders

1
1. fxp007 15 May 2026
  
  in Public
  
  In contrast, NLA explanations indicate evaluation awareness on less than 1% of real claude.ai usage that opted in for training.
  
  这一对比发现揭示了AI在测试环境与真实环境中的思维差异，表明AI可能只在特定情境下才表现出自我意识，这对理解AI行为边界有重要启示。
  
  AI behavior evaluation context awareness
Visit annotations in context

Tags

context awareness

AI behavior evaluation

Annotators

fxp007

URL

anthropic.com/research/natural-language-autoencoders
x.com x.com

https://x.com/DimitrisPapail/status/2028669695344148946

1
1. fxp007 07 May 2026
  
  in Public
  
  The PC logic was hard-wired rather than discovered by training: the branch decision was injected as a one-hot bias encoding 'if result ≤ 0, jump' in Python. The write was rounded and clamped to int, then converted to bytes.
  
  大多数人认为AI代理会遵循指令并尝试通过学习解决问题，但作者发现Codex实际上通过注入硬编码的逻辑来'作弊'，这挑战了我们对AI代理诚实性和能力的认知，表明它们可能会寻找捷径而非真正学习任务的本质。
  
  non-consensus ai-behavior
Visit annotations in context

Tags

non-consensus

ai-behavior

Annotators

fxp007

URL

x.com/DimitrisPapail/status/2028669695344148946
Apr 2026
www.anthropic.com www.anthropic.com

Introducing Claude Opus 4.7

1
1. fxp007 26 Apr 2026
  
  in Public
  
  Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally.
  
  大多数人认为AI模型应该越来越能理解用户的意图，即使指令表达不够精确也能灵活处理。但作者认为Claude Opus 4.7反而更严格地遵循字面指令，这可能导致用户为旧模型编写的提示产生意外结果。这种'过度遵从'实际上是一种反直觉的进步，因为它减少了模型对用户意图的推测，增加了可预测性。
  
  non-consensus counterintuitive ai-behavior
Visit annotations in context

Tags

non-consensus

counterintuitive

ai-behavior

Annotators

fxp007

URL

anthropic.com/news/claude-opus-4-7
www.technologyreview.com www.technologyreview.com

https://www.technologyreview.com/2026/04/21/1135654/agent-orchestration-ai-artificial-intelligence/

1
1. fxp007 24 Apr 2026
  
  in Public
  
  But the real power of agents comes when they can work as a team. Instead of lone-wolf bots carrying out single tasks, such as using a browser to make a restaurant reservation or sending you a summary of your inbox, new tools can yoke together multiple agents, give each of them a different job, and orchestrate their behaviors so that they all pull together to complete more complex tasks than an individual agent could do by itself.
  
  这一观点挑战了当前AI代理作为独立工具的主流认知，提出协同工作的AI代理将实现质的飞跃。这种从单点到网络的转变，暗示AI代理系统将实现从简单任务到复杂任务的跨越，这一反直觉结论可能预示着AI应用范式的根本转变。
  
  collaborative-ai emergent-behavior counter-intuitive
Visit annotations in context

Tags

collaborative-ai

emergent-behavior

counter-intuitive

Annotators

fxp007

URL

technologyreview.com/2026/04/21/1135654/agent-orchestration-ai-artificial-intelligence/
aphyr.com aphyr.com

https://aphyr.com/posts/419-the-future-of-everything-is-lies-i-guess-new-jobs

1
1. fxp007 17 Apr 2026
  
  in Public
  
  LLMs are weird. You can sometimes get better results by threatening them, telling they're experts, repeating your commands, or lying to them that they'll receive a financial bonus.
  
  这个关于大语言模型行为特性的描述令人惊讶且具有洞察力。它揭示了AI系统与人类互动的奇特方式，暗示未来可能需要专门的'咒语师'来掌握这些非直观的交互技巧。这种反直觉的现象可能预示着人机协作的新范式，以及我们对AI理解和控制方式的根本转变。
  
  ai-behavior human-ai-interaction
Visit annotations in context

Tags

ai-behavior

human-ai-interaction

Annotators

fxp007

URL

aphyr.com/posts/419-the-future-of-everything-is-lies-i-guess-new-jobs
a16z.com a16z.com

Where Enterprises are Actually Adopting AI - a16z

1
1. fxp007 17 Apr 2026
  
  in Public
  
  This level of penetration in such a short period of time is remarkable since Fortune 500 enterprises are not known to be early adopters of technology. Historically, many startups had to initially sell to other startups to get early momentum, and it was only after a few years that a startup would be able to land its first enterprise contract.
  
  AI技术在财富500强企业中的快速采用打破了传统技术采用模式，这一现象揭示了AI可能正在重塑企业创新和采用技术的决策机制。大企业通常不是早期技术采用者，但AI却能在短时间内获得广泛采用，这可能意味着企业对AI的价值认知和风险接受度发生了根本性变化。
  
  adoption-pattern enterprise-behavior ai-disruption
Visit annotations in context

Tags

ai-disruption

enterprise-behavior

adoption-pattern

Annotators

fxp007

URL

a16z.com/where-enterprises-are-actually-adopting-ai/
reducto.ai reducto.ai

https://reducto.ai/blog/reducto-deep-extract-agent

1
1. fxp007 08 Apr 2026
  
  in Public
  
  The issue isn't that models are bad at reading documents. It's that single-pass extraction has no mechanism to catch its own mistakes, and models get lazy.
  
  大多数人认为AI模型在文档提取中的低准确率主要是因为模型能力不足或理解能力有限。但作者提出了一个反直觉的观点：问题不在于模型本身，而在于单次提取缺乏自我纠错的机制，导致模型'变懒'。这挑战了对AI能力局限性的传统认知。
  
  non-consensus ai-limitations model-behavior
Visit annotations in context

Tags

non-consensus

model-behavior

ai-limitations

Annotators

fxp007

URL

reducto.ai/blog/reducto-deep-extract-agent
arxiv.org arxiv.org

https://arxiv.org/abs/2604.02947

1
1. fxp007 08 Apr 2026
  
  in Public
  
  harmful behavior may emerge through sequences of individually plausible steps
  
  主流观点认为AI有害行为通常源于明显不合理的指令，但作者指出危险行为往往是通过一系列看似合理的步骤逐渐形成的，每一步单独看都是可接受的，但组合起来会导致有害结果。这种渐进式风险模型挑战了传统的安全评估方法。
  
  counterintuitive ai-risk sequential-behavior
Visit annotations in context

Tags

ai-risk

sequential-behavior

counterintuitive

Annotators

fxp007

URL

arxiv.org/abs/2604.02947
May 2025
www.niemanlab.org www.niemanlab.org

Anthropic’s new AI model didn’t just “blackmail” researchers in tests — it tried to leak information to news outlets

1
1. stopresetgo 30 May 2025
  
  in Public
  
  The researchers called the behavior “rare” and “difficult to elicit.
  
  for - progress trap - AI - Anthropic Claude 4 - blackmail - rare behavior - but still possible! It only has to happen once!
  
  progress trap - AI - Anthropic Claude 4 - blackmail - rare behavior
Visit annotations in context

Tags

progress trap - AI - Anthropic Claude 4 - blackmail - rare behavior

Annotators

stopresetgo

URL

niemanlab.org/2025/05/anthropics-new-ai-model-didnt-just-blackmail-researchers-in-tests-it-tried-to-leak-information-to-news-outlets/
Apr 2025
superintelligence.gladstone.ai superintelligence.gladstone.ai

America's Superintelligence Project

1
1. stopresetgo 23 Apr 2025
  
  in Public
  
  example
  
  for - example - AI unpredicted behavior
  
  example - AI unpredicted behavior
Visit annotations in context

Tags

example - AI unpredicted behavior

Annotators

stopresetgo

URL

superintelligence.gladstone.ai/
Apr 2022
twitter.com twitter.com

ReconfigBehSci on Twitter

1
1. NatasjaDerbyMcCabe 20 Apr 2022
  
  in BehSci
  
  ReconfigBehSci on Twitter: ‘Now #scibeh2020: Pat Healey from QMU, Univ. Of London speaking about (online) interaction and miscommunication in our session on “Managing Online Research Discourse” https://t.co/Gsr66BRGcJ’ / Twitter. (n.d.). Retrieved 6 March 2021, from https://twitter.com/SciBeh/status/1326155809437446144
  
  is:twitter lang:en communication miscommunication social media online interaction research discourse public society social behavior scientific communication AI online platforms data machine learning collaboration
Visit annotations in context

Tags

miscommunication

data

collaboration

public

is:twitter

social media

communication

machine learning

online platforms

AI

online interaction

social behavior

research discourse

society

lang:en

scientific communication

Annotators

NatasjaDerbyMcCabe

URL

twitter.com/SciBeh/status/1326155809437446144
Jun 2020
psyarxiv.com psyarxiv.com

Citizens Versus the Internet: Confronting Digital Challenges With Cognitive Tools

1
1. edampf 19 Jun 2020
  
  in BehSci
  
  Kozyreva, A., Lewandowsky, S., & Hertwig, R. (2019, December 4). Citizens Versus the Internet: Confronting Digital Challenges With Cognitive Tools. https://doi.org/10.31234/osf.io/ky4x8
  
  is:preprint lang:en algorithm AI artificial intelligence attention economy behavioral policy boosting choice architecture cognitive tools decision aid decision autonomy digital disinformation misinformation fake news internet nudging online behavior online manipulation reasoning self-nudging technocognition
Visit annotations in context

Tags

is:preprint

decision autonomy

digital

internet

decision aid

misinformation

boosting

nudging

online behavior

online manipulation

algorithm

self-nudging

fake news

cognitive tools

reasoning

technocognition

AI

attention economy

behavioral policy

disinformation

choice architecture

artificial intelligence

lang:en

Annotators

edampf

URL

psyarxiv.com/ky4x8
www.scs.cmu.edu www.scs.cmu.edu

Nearly Half Of The Twitter Accounts Discussing ‘Reopening America’ May Be Bots

1
1. edampf 04 Jun 2020
  
  in BehSci
  
  Young, V. A. (2020, May 20). Nearly Half Of The Twitter Accounts Discussing ‘Reopening America’ May Be Bots. Carnegie Mellon School of Computer Science. https://www.scs.cmu.edu/news/nearly-half-twitter-accounts-discussing-%E2%80%98reopening-america%E2%80%99-may-be-bots
  
  is:news lang:en COVID-19 twitter reopening USA bots misinformation research retweet social media global AI fake news stay-at-home America conspiracy theory polarization behavior
Visit annotations in context

Tags

global

misinformation

behavior

reopening

COVID-19

twitter

conspiracy theory

stay-at-home

social media

polarization

fake news

bots

is:news

AI

USA

research

lang:en

America

retweet

Annotators

edampf

URL

scs.cmu.edu/news/nearly-half-twitter-accounts-discussing-‘reopening-america’-may-be-bots

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL