Hypothesis

7 Matching Annotations

Apr 2026
epoch.ai

https://epoch.ai/blog/have-ai-capabilities-accelerated

1
1. fxp007 25 Apr 2026
  
  in Public
  
  We use four AI capability metrics: ECI (Epoch Capabilities Index), METR 50% Time Horizon, Combined Math Index, and WeirdML V2 Index.
  
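  The study uses four different AI capability metrics, which increases the reliability of the results. Each metric measures AI capability along a different dimension: overall capability (ECI), time efficiency (METR), mathematical ability (Combined Math), and performance in specific environments (WeirdML). The multi-metric approach reduces the risk of bias from any single metric.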
  
  data-point metrics evaluation-framework

Tags

metrics

evaluation-framework

data-point

Annotators

fxp007

URL

epoch.ai/blog/have-ai-capabilities-accelerated
www.microsoft.com

https://www.microsoft.com/en-us/research/blog/adele-predicting-and-explaining-ai-performance-across-tasks/

1
1. fxp007 16 Apr 2026
  
  in Public
  
  ADeLe scores tasks across 18 core abilities, such as attention, reasoning, domain knowledge, and assigns each task a value from 0 to 5 based on how much it requires each ability.
  
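  Surprisingly, the ADeLe framework evaluates tasks against 18 core abilities, including attention, reasoning, and domain knowledge, and assigns each task a score from 0 to 5 on each ability. This multi-dimensional evaluation approach reveals details that traditional AI evaluation overlooks, letting researchers understand more precisely the complex relationship between task difficulty and model capability.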
  
  surprising evaluation-framework multi-dimensional
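
  The per-ability scoring described in this annotation can be pictured as a small data structure. The sketch below is only an illustration, not ADeLe's actual implementation: the class name TaskProfile, the method hardest_abilities, and the sample task are hypothetical, and only three of the 18 abilities (the ones named in the quote) are shown. The 0 to 5 range is the only detail taken from the source.

```python
from dataclasses import dataclass, field

# Demand scale stated in the annotation: each ability is rated 0 to 5.
MIN_LEVEL, MAX_LEVEL = 0, 5


@dataclass
class TaskProfile:
    """Hypothetical container: demand level per ability for one task."""
    name: str
    demands: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject any level outside the 0 to 5 scale.
        for ability, level in self.demands.items():
            if not MIN_LEVEL <= level <= MAX_LEVEL:
                raise ValueError(
                    f"{ability}: level {level} outside {MIN_LEVEL}-{MAX_LEVEL}"
                )

    def hardest_abilities(self) -> list[str]:
        """Return the abilities with the highest demand level, sorted by name."""
        if not self.demands:
            return []
        top = max(self.demands.values())
        return sorted(a for a, lvl in self.demands.items() if lvl == top)


# Made-up example task using ability names mentioned in the quote.
task = TaskProfile(
    "example-task",
    {"attention": 2, "reasoning": 4, "domain knowledge": 4},
)
print(task.hardest_abilities())  # ['domain knowledge', 'reasoning']
```

  Keeping one level per ability, rather than a single difficulty number, is what lets a profile like this show which abilities make a task hard.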

Tags

multi-dimensional

evaluation-framework

surprising

Annotators

fxp007

URL

microsoft.com/en-us/research/blog/adele-predicting-and-explaining-ai-performance-across-tasks/
arxiv.org

https://arxiv.org/abs/2604.03016

1
1. fxp007 08 Apr 2026
  
  in Public
  
  Each task includes a unified evaluation framework supporting sandboxed code and APIs, alongside a human reference trajectory annotated with stepwise checkpoints along dual-axis: S-axis and V-axis.
  
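  Most people assume AI evaluation can be done with simple automated tests. The authors, however, argue that it requires elaborate dual-axis (S-axis and V-axis) human reference trajectories and sandboxed environment support, which suggests that evaluating AI agent capabilities is far more complex than the industry generally recognizes. This view challenges the reductionist tendency in AI evaluation and stresses that human involvement in evaluation is irreplaceable.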
  
  counterintuitive evaluation-framework human-in-the-loop

Tags

counterintuitive

human-in-the-loop

evaluation-framework

Annotators

fxp007

URL

arxiv.org/abs/2604.03016
Jan 2023
www.igi-global.com

Developing and Aligning a Knowledge Management Strategy: Towards a Taxonomy and a Framework

1
1. mlenc 20 Jan 2023
  
  in Public
  
  knowledge translation kt framework evaluation

Tags

framework

kt

knowledge translation

evaluation

Annotators

mlenc

URL

igi-global.com/chapter/developing-aligning-knowledge-management-strategy/54478
Sep 2021
Local file

PII: 0277-5395(83)90035-3

1
1. dsouthgate 12 Sep 2021
  
  in Public
  
  The first criterion of adequacy in this approach is that the active voice of the subject should be heard
  
  Is the interpretation adequate? Criteria for answering the question of adequacy are outlined: 1) the account must not objectify the subject; 2) the theoretical underpinning must allow for interpretation of the social dynamic between observer and subject; 3) the theoretical reworking must allow for the revelation of underlying social structures.
  
  Evaluation Framework criteria
Tags

Framework

criteria

Evaluation

Annotators

dsouthgate
May 2021
www.ncbi.nlm.nih.gov

The role of supportive supervision on immunization program outcome - a randomized field trial from Georgia

1
1. SIYANYE 20 May 2021
  
  in BehSci
  
  Djibuti, M., Gotsadze, G., Zoidze, A., Mataradze, G., Esmail, L. C., & Kohler, J. C. (2009). The role of supportive supervision on immunization program outcome—A randomized field trial from Georgia. BMC International Health and Human Rights, 9(Suppl 1), S11. https://doi.org/10.1186/1472-698X-9-S1-S11
  
  lang:en is:article supportive supervision immunization barrier intervention evaluation framework organization

Tags

organization

lang:en

intervention

immunization

supportive supervision

evaluation

is:article

framework

barrier

Annotators

SIYANYE

URL

ncbi.nlm.nih.gov/pmc/articles/PMC3226230/
Jul 2020
www.sciencedirect.com

Argument Quality in Real World Argumentation

1
1. ErikStuchly 07 Jul 2020
  
  in BehSci
  
  Argument Quality in Real World Argumentation. (2020). Trends in Cognitive Sciences, 24(5), 363–374. https://doi.org/10.1016/j.tics.2020.01.004
  
  is:article lang:en argument quality argumentation real world psychology philosophy classical logic Bayesian framework standard evaluation guidance probability

Tags

classical logic

lang:en

guidance

standard

real world

evaluation

probability

is:article

psychology

Bayesian framework

argumentation

argument quality

philosophy

Annotators

ErikStuchly

URL

sciencedirect.com/science/article/pii/S1364661320300206
