Hypothesis

4 Matching Annotations

Last 7 days
techcrunch.com

https://techcrunch.com/2026/06/13/kpmg-pulls-report-on-ai-usage-due-to-apparent-hallucinations/

1
1. fxp007 13 Jun 2026
  
  in Public
  
  KPMG pulls report on AI usage due to apparent hallucinations
  
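  The mainstream view holds that large professional consulting firms such as KPMG should have rigorous fact-checking processes that ensure the accuracy of the reports they publish. This headline, however, suggests that even top-tier professional institutions can be misled by AI 'hallucinations'. That challenges trust in the quality-control capabilities of professional institutions and indicates that AI errors may be more widespread and more deceptive than we imagine.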
  
  non-consensus professional-standards ai-misinformation

Tags

non-consensus

ai-misinformation

professional-standards

Annotators

fxp007

URL

techcrunch.com/2026/06/13/kpmg-pulls-report-on-ai-usage-due-to-apparent-hallucinations/
Jun 2026
techcrunch.com

https://techcrunch.com/2026/06/09/can-tech-companies-learn-to-love-cheaper-models/

1
1. fxp007 09 Jun 2026
  
  in Public
  
  Quality comes first, and in legal it always will... However, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently.
  
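  Most people assume that in professional fields such as law, quality can only be guaranteed by using the most powerful, most advanced AI model. The author, however, cites the view of Harvey's founder that the definition of quality is shifting, from using the most powerful model to using the model that gets the right answer most efficiently. This view challenges the industry's conventional 'quality equals scale' assumption.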
  
  non-consensus ai-quality professional-services

Tags

non-consensus

ai-quality

professional-services

Annotators

fxp007

URL

techcrunch.com/2026/06/09/can-tech-companies-learn-to-love-cheaper-models/
Apr 2026
artificialanalysis.ai

APEX-Agents-AA Benchmark Leaderboard | Artificial Analysis

1
1. fxp007 10 Apr 2026
  
  in Public
  
  GPT-5.4 (xhigh) scores the highest on APEX-Agents-AA Pass@1 with a score of 33.3%, followed by Claude Opus 4.6 (Adaptive Reasoning, Max Effort) with a score of 33.0%, and Gemini 3.1 Pro Preview with a score of 32.0%
  
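  A startling number: even the world's strongest AI agents succeed only about a third of the time on professional tasks from investment banking, consulting, and law firms. More surprising still, the top three are nearly tied (GPT-5.4 at 33.3%, Claude Opus 4.6 at 33.0%, Gemini 3.1 Pro at 32.0%): the gap between the three top labs on this professional-services agent benchmark has narrowed to the level of statistical noise. On this dimension, the question of 'whose AI is stronger' no longer has a clear answer.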
  
  33-percent benchmark three-way-tie professional-AI surprising

Tags

professional-AI

33-percent

three-way-tie

surprising

benchmark

Annotators

fxp007

URL

artificialanalysis.ai/evaluations/apex-agents-aa
Sep 2025
rutgers.instructure.com

Negotiating Identity in the Age of ChatGPT.pdf: 2025FA - ENGLISH COMPOSITION 21:355:101:27

1
1. Sheila_2020 12 Sep 2025
  
  in Public
  
  In this paragraph, rather than looking at plagiarism or related issues, the study examines how AI influences whether people think of themselves as real researchers.
  
  What is largely missing is a sustained inquiry into how generative AI tools reshape researchers’ self-perception, particularly with regard to their professional identity and epistemic legitimacy.

Tags

What is largely missing is a sustained inquiry into how generative AI tools reshape researchers’ self-perception, particularly with regard to their professional identity and epistemic legitimacy.

Annotators

Sheila_2020

URL

rutgers.instructure.com/courses/366098/files/51532838
