Hypothesis

15 Matching Annotations

May 2026
openai.com

https://openai.com/index/model-disproves-discrete-geometry-conjecture/

1
1. fxp007 21 May 2026
  
  in Public
  
  The result is also notable for how it was found. The proof came from a new general-purpose reasoning model... In this case, it produced a proof resolving the open problem.
  
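  Most people assume that solving hard mathematical problems requires the intuition, creativity, and deep thinking of human mathematicians. The author, however, argues that a general-purpose AI model not specifically trained for mathematics can independently resolve a long-standing open problem. This challenges the central role of human creativity in mathematical research and suggests that AI may have something like human original thinking.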
  
  counterintuitive ai-reasoning creativity

Tags

counterintuitive

ai-reasoning

creativity

Annotators

fxp007

URL

openai.com/index/model-disproves-discrete-geometry-conjecture/
epoch.ai

RIP Classic Reasoning Benchmarks. What's Next? - Epoch AI

1
1. fxp007 07 May 2026
  
  in Public
  
  GPT-5.5 Pro still regularly gets my favorite GSM8K question wrong.
  
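  This statement implies that even advanced AI systems still make errors on basic math problems, showing how brittle AI can be on seemingly simple tasks. Although no specific error rate is given, the observation underscores the importance of evaluating basic reasoning ability.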
  
  data-point basic-reasoning ai-limitations

Tags

basic-reasoning

data-point

ai-limitations

Annotators

fxp007

URL

epoch.ai/gradient-updates/rip-classic-benchmarks
Apr 2026
epoch.ai

https://epoch.ai/blog/have-ai-capabilities-accelerated

3
1. fxp007 30 Apr 2026
  
  in Public
  
  Three of the four metrics (ECI, log METR 50% time horizon, and a math-focused index we constructed from several math benchmarks) show strong evidence that progress has sped up relative to a global linear trend fit to data from 2023 onward.
  
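  Most people assume AI capabilities improve gradually and linearly, but the author's data analysis finds that AI capabilities have in fact accelerated on three key metrics, which challenges the common view of how fast AI is progressing. The acceleration appears after 2023, which coincides with the release of reasoning models.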
  
  non-consensus ai-acceleration reasoning-models
2. fxp007 26 Apr 2026
  
  in Public
  
  Three of four metrics show strong evidence of acceleration, seemingly driven by reasoning models.
  
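  Most people assume AI capabilities grow gradually and linearly, but the author's data analysis finds clear acceleration in three of four key capability metrics, and the acceleration appears to be directly tied to the emergence of reasoning models. This challenges the common view of the pace of AI progress.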
  
  non-consensus ai-progress reasoning-models
3. fxp007 26 Apr 2026
  
  in Public
  
  Three of four metrics show strong evidence of acceleration, seemingly driven by reasoning models.
  
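  Most people assume AI capabilities grow at a steady, linear rate, but the author's data analysis finds a clear accelerating trend in three of four key metrics, seemingly driven by reasoning models. This conclusion challenges the conventional view of the pace of AI progress and suggests that the introduction of reasoning models in 2024 may mark a shift in how AI capabilities develop.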
  
  non-consensus ai-progress reasoning-models

Tags

non-consensus

reasoning-models

ai-acceleration

ai-progress

Annotators

fxp007

URL

epoch.ai/blog/have-ai-capabilities-accelerated
openai.com

https://openai.com/index/introducing-gpt-5-5/

1
1. fxp007 24 Apr 2026
  
  in Public
  
  GPT‑5.5 found a proof of a longstanding asymptotic fact about off-diagonal Ramsey numbers, later verified in Lean. The result is a concrete example of GPT‑5.5 contributing not just code or explanation, but a surprising and useful mathematical argument in a core research area.
  
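  Most people see AI's role in mathematical research as mainly assisting with computation and verification, but the author argues that GPT-5.5 can independently discover mathematical proofs, which would be revolutionary for the field. This view challenges conventional assumptions about AI's abilities in creative thinking and abstract reasoning, and suggests that AI may be shifting from tool to research partner.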
  
  non-consensus mathematical-reasoning ai-research

Tags

mathematical-reasoning

non-consensus

ai-research

Annotators

fxp007

URL

openai.com/index/introducing-gpt-5-5/
antirez.com

https://antirez.com/news/163

1
1. fxp007 24 Apr 2026
  
  in Public
  
  What happens is that weak models hallucinate (sometimes causally hitting a real problem) that there is a lack of validation of the start of the window... without understanding why they, if put together, create an issue.
  
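  This finding exposes a serious limitation of AI vulnerability detection: weak models can only 'find' superficially similar problems through pattern matching and cannot understand the causal relationships between them. It suggests that current applications of AI in cybersecurity may have systematic blind spots that merit further study.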
  
  ai-limitations causal-reasoning

Tags

ai-limitations

causal-reasoning

Annotators

fxp007

URL

antirez.com/news/163
a16z.com

https://a16z.com/podcast/whats-missing-between-llms-and-agi-vishal-misra-martin-casado/

1
1. fxp007 24 Apr 2026
  
  in Public
  
  the move from pattern matching to understanding cause and effect
  
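  The author identifies the shift from pattern matching to understanding cause and effect as the key to AGI, a view that challenges the AI field's current overemphasis on surface-level pattern recognition. It implies that real intelligence must go beyond correlations in data to a deep understanding of how the world works.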
  
  causal-reasoning ai-paradigm

Tags

causal-reasoning

ai-paradigm

Annotators

fxp007

URL

a16z.com/podcast/whats-missing-between-llms-and-agi-vishal-misra-martin-casado/
deepmind.google

https://deepmind.google/blog/gemini-robotics-er-1-6/

1
1. fxp007 17 Apr 2026
  
  in Public
  
  Gemini Robotics-ER 1.6 achieves its highly accurate instrument readings by using agentic vision, which combines visual reasoning with code execution. The model takes intermediate steps: first zooming into an image to get a better read of small details in a gauge, then using pointing and code execution to estimate proportions and intervals and get an accurate reading.
  
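  This description shows how AI can solve a complex problem through multi-step reasoning, and demonstrates the model's novel approach to fine-grained visual tasks. The ability to combine visual reasoning with code execution moves AI systems closer to human-like cognition, and this hybrid approach may become a standard paradigm for AI tackling complex physical tasks.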
  
  multi-step-reasoning visual-ai

Tags

visual-ai

multi-step-reasoning

Annotators

fxp007

URL

deepmind.google/blog/gemini-robotics-er-1-6/
www.microsoft.com

https://www.microsoft.com/en-us/research/blog/adele-predicting-and-explaining-ai-performance-across-tasks/

1
1. fxp007 16 Apr 2026
  
  in Public
  
  Reasoning-oriented models like OpenAI's o1 and GPT-5 show measurable gains over standard models—not only in logic and mathematics but also with interpreting user intent.
  
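  Surprisingly, reasoning-focused models such as OpenAI's o1 and GPT-5 show clear advantages not only in logic and mathematics but also notable gains in understanding user intent. This suggests that progress in AI reasoning is extending from purely logical domains into more complex social cognition, opening new possibilities for human-AI interaction.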
  
  surprising reasoning-ai user-intent

Tags

user-intent

reasoning-ai

surprising

Annotators

fxp007

URL

microsoft.com/en-us/research/blog/adele-predicting-and-explaining-ai-performance-across-tasks/
ai.meta.com

https://ai.meta.com/blog/introducing-muse-spark-msl/

1
1. fxp007 16 Apr 2026
  
  in Public
  
  After compressing, the model again extends its solutions to achieve stronger performance.
  
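  Surprisingly, Muse Spark shows a distinctive 'thought compression' ability at test time: after first improving performance by thinking longer, the model spontaneously compresses its reasoning under a time-penalty mechanism, then extends its solutions again to achieve stronger performance. This kind of dynamic self-optimization has not been seen before in AI models.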
  
  surprising ai-reasoning model-optimization

Tags

ai-reasoning

model-optimization

surprising

Annotators

fxp007

URL

ai.meta.com/blog/introducing-muse-spark-msl/
lumalabs.ai

https://lumalabs.ai/uni-1/tech-specs

1
1. fxp007 09 Apr 2026
  
  in Public
  
  Uni-1 can perform structured internal reasoning before and during image synthesis. It decomposes instructions, resolves constraints, and plans composition, then renders accordingly.
  
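  Surprisingly, UNI-1 can perform structured internal reasoning before and during image synthesis, decomposing instructions, resolving constraints, and planning composition. This breaks with the limitation of traditional AI systems that only passively execute instructions, and shows an AI capability close to the human thought process.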
  
  surprising ai-reasoning

Tags

ai-reasoning

surprising

Annotators

fxp007

URL

lumalabs.ai/uni-1/tech-specs
lumalabs.ai

UNI-1 | Less Artificial. More Intelligent. | Luma

2
1. fxp007 09 Apr 2026
  
  in Public
  
  Uni-1 is a multimodal reasoning model that can generate pixels.
  
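  Surprisingly, UNI-1 is described as 'a multimodal reasoning model that can generate pixels'. The phrasing implies that it is not merely an image generator but a system that genuinely understands and reasons over multimodal information and can turn abstract concepts into concrete visual form, representing a major leap for AI from simple pattern matching to real conceptual understanding.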
  
  surprising multimodal ai-reasoning
2. fxp007 09 Apr 2026
  
  in Public
  
  Common-sense scene completion, spatial reasoning, and plausibility-driven transformation.
  
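  Surprisingly, UNI-1 offers common-sense scene completion, spatial reasoning, and plausibility-driven transformation. This means it does not just generate images mechanically but can understand basic regularities of the physical world, which makes its images more realistic and believable and marks important progress in AI's understanding of the real world.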
  
  surprising ai-reasoning spatial-intelligence

Tags

spatial-intelligence

ai-reasoning

multimodal

surprising

Annotators

fxp007

URL

lumalabs.ai/uni-1
Jun 2020
psyarxiv.com

Citizens Versus the Internet: Confronting Digital Challenges With Cognitive Tools

1
1. edampf 19 Jun 2020
  
  in BehSci
  
  Kozyreva, A., Lewandowsky, S., & Hertwig, R. (2019, December 4). Citizens Versus the Internet: Confronting Digital Challenges With Cognitive Tools. https://doi.org/10.31234/osf.io/ky4x8
  
  is:preprint lang:en algorithm AI artificial intelligence attention economy behavioral policy boosting choice architecture cognitive tools decision aid decision autonomy digital disinformation misinformation fake news internet nudging online behavior online manipulation reasoning self-nudging technocognition

Tags

choice architecture

online behavior

digital

fake news

artificial intelligence

disinformation

decision autonomy

decision aid

self-nudging

attention economy

technocognition

online manipulation

boosting

lang:en

behavioral policy

reasoning

AI

nudging

misinformation

is:preprint

algorithm

internet

cognitive tools

Annotators

edampf

URL

psyarxiv.com/ky4x8
