Hypothesis

3 Matching Annotations

May 2026
arxiv.org

https://arxiv.org/abs/2605.06445

2
1. fxp007 24 May 2026
  
  in Public
  
  Capable configurations lose 30 points on average in assertion pass rates from baseline to fully specified tasks, while some weaker configurations approach zero.
  
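  Most people would probably assume that more capable LLM configurations still hold up relatively well under strict constraints, but the study shows that even the best configurations drop by 30 percentage points on average, which challenges our assumptions about how adaptable LLMs are.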
  
  non-consensus performance-decline llm-robustness
2. fxp007 24 May 2026
  
  in Public
  
  Our findings reveal a phenomenon of constraint decay: as structural requirements accumulate, agent performance exhibits a substantial decline.
  
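  Most people assume that LLM performance stays stable or declines slowly as more constraints are added, but the authors identify a "constraint decay" phenomenon: as structural requirements accumulate, agent performance declines substantially. This is a counterintuitive finding.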
  
  counterintuitive constraint-decay llm-performance

Tags

llm-performance

constraint-decay

counterintuitive

non-consensus

llm-robustness

performance-decline

Annotators

fxp007

URL

arxiv.org/abs/2605.06445
Apr 2026
huggingface.co

https://huggingface.co/papers/2604.04514

1
1. fxp007 24 Apr 2026
  
  in Public
  
  V3.3 achieves 70.4% in Mode A (zero-LLM), with +23.8pp on multi-hop and +12.7pp on adversarial. V3.2 achieved 74.8% Mode A and 87.7% Mode C; the 4.4pp gap reflects a deliberate architectural trade-off.
  
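  Zero-LLM mode scores only 17.3 percentage points below the LLM-backed mode (70.4% for V3.3 Mode A versus 87.7% for V3.2 Mode C), which is striking. It suggests that biologically inspired memory architectures may be more capable than expected, retaining most of their performance without a large language model, and it challenges the mainstream view that powerful AI must rely on large models.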
  
  zero-llm performance trade-off

Tags

performance

trade-off

zero-llm

Annotators

fxp007

URL

huggingface.co/papers/2604.04514
