AI-powered analysis uncovers data at a scale and depth that legacy frameworks were not designed to accommodate.
Surprisingly: the sheer volume and depth of data surfaced by AI-powered security analysis has rendered traditional security frameworks ineffective. The defensive architectures built over the past few decades were never designed to handle information at this scale, which suggests the cybersecurity industry may need to be rebuilt from the ground up rather than merely patched and upgraded.
The cost of understanding what happens in a video has dropped by a factor of roughly 40, while the quality of that understanding has improved dramatically.
Most people assume AI video analysis is still early-stage and expensive, but the author points out that its cost has already fallen roughly 40-fold while quality has improved. This counterintuitive observation suggests video analysis may have crossed the threshold of practicality, poised to spawn entirely new categories of applications and challenging conventional assumptions about AI video processing.
Exposure alone is a completely meaningless tool for predicting displacement.
Most people assume that measuring how exposed a job's tasks are to AI can predict which jobs will be displaced, but the author argues this single metric is essentially meaningless because it ignores key factors such as price elasticity and shifts in demand. This challenges the dominant methodology in current research on AI's labor-market impact.
Interviews were video and audio recorded. We transcribed the audio using OpenAI's Whisper automatic speech recognition system and anonymized the transcripts before analysis. We analyzed the interview data using thematic analysis [1]. First, two members of the research team independently coded data from four randomly chosen participants (25% of the collected data) to generate low-level codes. Inter-coder reliability between the coders was 0.88 using Krippendorff's alpha [37]. The two coders then met to cross-check, resolve coding conflicts, and consolidate the codes into a codebook across two sessions. Using the codebook, the two coders each analyzed data from six randomly selected participants. The research team then met, discussed the analysis outcomes, and finalized themes over three sessions.
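The agreement statistic reported above can be sketched in a few lines. This is a minimal pure-Python Krippendorff's alpha for nominal labels from two coders with no missing data; the function name and data layout are illustrative, and a maintained package such as `krippendorff` is preferable in real analyses:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(coder1, coder2):
    """Krippendorff's alpha for nominal data, two coders, no missing values."""
    assert len(coder1) == len(coder2)
    # Coincidence matrix: each coded unit contributes both ordered pairs.
    coincidences = Counter()
    for a, b in zip(coder1, coder2):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1
    n = sum(coincidences.values())
    # Marginal totals per category.
    marginals = Counter()
    for (a, _), count in coincidences.items():
        marginals[a] += count
    # Observed disagreement: off-diagonal mass of the coincidence matrix.
    d_observed = sum(c for (a, b), c in coincidences.items() if a != b) / n
    # Expected disagreement under chance pairing of values.
    d_expected = sum(
        marginals[a] * marginals[b] for a, b in permutations(marginals, 2)
    ) / (n * (n - 1))
    if d_expected == 0:  # only one category ever used
        return 1.0
    return 1.0 - d_observed / d_expected
```

With two coders who agree on three of four units (one coder labels the last unit `b`, the other `c`), this yields an alpha of about 0.63; perfect agreement yields 1.0.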
sentence describing how analysis was performed on data collected by the authors of this paper
We conducted a qualitative analysis of user study transcripts and survey responses using a Grounded Theory approach [8]. First, the lead researcher collected a list of participants' behaviors, approaches, reflections on their experience, and feedback about the interface. The researcher then systematically coded this data, revisiting the data multiple times and refining the codes to ensure consistency and coherence. Through this process, high-level themes were identified and organized using affinity diagramming. Once the thematic structure was finalized, the researcher gathered supporting evidence for each theme and synthesized the findings, which were reviewed by the research team to ensure agreement on the results.
Activity log data, which revealed how participants actually used the interface, echoed the above findings. According to the log data, participants spent most of their reading time (66.31%) with vertical alignment on the second element in structure pairs, followed by alignment on the first element (29.19%), and left-justified alignment (5.13%). Highlighting usage showed a similar preference: 91.13% of time with all chunks highlighted, 8.25% with partial highlighting, and minimal time (0.63%) without highlights.
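Percentages like these come from aggregating per-mode durations in the activity log. A minimal sketch, assuming a hypothetical event format of (mode, seconds) pairs rather than the paper's actual log schema:

```python
from collections import defaultdict

def time_share(events):
    """Aggregate (mode, seconds) log events into each mode's share of total time."""
    totals = defaultdict(float)
    for mode, seconds in events:
        totals[mode] += seconds
    grand_total = sum(totals.values())
    # Percentage of total recorded time spent in each mode.
    return {mode: 100.0 * t / grand_total for mode, t in totals.items()}
```

For example, events totaling 60 s on the second element, 30 s on the first, and 10 s left-justified yield shares of 60%, 30%, and 10%.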
In this section, we present findings on how AbstractExplorer supports comparative close reading at scale by integrating quantitative survey responses and log data with qualitative analysis of transcripts and open-ended responses. The qualitative analysis process is described in detail in Appendix H.
Throughout the two tasks, we also collected detailed interaction logs including counts of user-defined aspects created, duration of highlighting usage, and time allocation across the three possible alignment options.
Both the gaze data and the semi-structured interviews revealed that lower-NFC participants were more willing to be guided by the three features and consciously took advantage of them.
Using a two-tailed Mann-Whitney U Test, we found that participants who reported their lowest perceived cognitive load when all three features were enabled had significantly lower NFC than participants who reported their lowest cognitive load level when skimming with no features enabled—in the baseline interface (p=0.03).
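The test statistic behind such a comparison can be sketched in pure Python. This computes only the Mann-Whitney U (with midranks for ties); in practice, scipy.stats.mannwhitneyu would also supply the two-tailed p-value:

```python
def mann_whitney_u(x, y):
    """Two-sample Mann-Whitney U statistic, using midranks for ties."""
    combined = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[combined[k][1]] = midrank
        i = j + 1
    n1 = len(x)
    r1 = sum(ranks[:n1])  # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * len(y) - u1
    return min(u1, u2)
```

Completely separated samples give U = 0; fully interleaved samples give a U near n1*n2/2.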
The raw NASA-TLX score is the sum of the responses to all six NASA-TLX questions, after reverse-scoring the appropriate items.
To compute a participant's NFC score, we averaged their response to the six questions, each ranging from 1 to 7, after reversing the appropriate questions.
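The reverse-scoring and averaging step can be sketched as follows. Which items are reverse-scored is an assumption here (it is fixed by the NFC instrument), and the helper names are illustrative:

```python
def reverse_score(response, scale_max, scale_min=1):
    """Flip a Likert response, e.g. 2 on a 1-7 scale becomes 6."""
    return scale_max + scale_min - response

def nfc_score(responses, reversed_items):
    """Mean of six 1-7 NFC items after reversing the negatively worded ones.

    `reversed_items` holds the zero-based indices of reverse-scored items.
    """
    adjusted = [
        reverse_score(r, 7) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)
```

The raw NASA-TLX score mentioned earlier follows the same pattern, except the adjusted responses are summed rather than averaged.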
For simplicity of analysis, we denote participants with NFC scores above the overall median NFC of 5.42 (IQR = 0.583) as higher NFC, and the rest as lower NFC.
To contrast participants' gaze patterns in each condition, we used a Tobii Pro Spark eye-tracker placed below the desktop monitor used by all subjects; Tobii Pro Lab software recorded each participant's gaze over time in each condition.
We collected 80 sentences from our abstracts dataset labeled by our system as "Methodology/Contribution." Participants viewed the same 80 sentences in each condition—often with a different subset of sentences initially visible due to ordering changes—but only had two minutes to look at them in each condition.
After obtaining an expanded set of high-level chunk labels, we assign them to each of the sentence chunks by using LLMs in a multiclass classification few-shot learning task, with the initial labels and assignment as examples (see prompt used in Appendix D.3).
Then, we segment sentences within each aspect into grammar-preserving chunks (see prompt used in Appendix D.2). This results in grammatically coherent chunks that are the basis of structure patterns. After identifying chunk boundaries, we again prompt an LLM to generate labels for the chunks in a human-in-the-loop approach: starting from an initial set of labels for chunk roles, whenever a new label is generated, a researcher examines it and merges it with existing labels where appropriate, controlling the total number of labels.
We process this data in a three-stage pipeline (Figure 6). In the first stage, Sentence Segmentation and Categorization, abstracts are split into individual sentences using the NLTK package, and each sentence is classified into one of the five pre-defined aspects as listed in Section 4.1.1. Classification is performed by prompting an LLM (see prompt used in Appendix D.1) with the sentence and its full abstract.
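The first pipeline stage could be sketched as below. This is a hedged sketch under stated assumptions, not the authors' implementation: a naive regex splitter stands in for NLTK's sent_tokenize, the five aspect names are placeholders (the paper's actual aspects are defined in its Section 4.1.1), and the LLM call is stubbed out:

```python
import re

# Placeholder aspect names; the paper defines its own five aspects.
ASPECTS = ["Background", "Objective", "Method", "Result", "Conclusion"]

def split_sentences(abstract):
    """Naive sentence splitter standing in for nltk.tokenize.sent_tokenize."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]

def classify_sentence(sentence, abstract, llm=None):
    """Stub for the LLM classification prompt (Appendix D.1 in the paper).

    A real implementation would send the sentence plus its full abstract
    to an LLM and parse one of the five aspect labels from the reply.
    """
    if llm is not None:
        return llm(sentence, abstract)
    return ASPECTS[0]  # trivial fallback so the sketch runs without an LLM

def segment_and_categorize(abstract, llm=None):
    """Stage 1: split an abstract into sentences and label each with an aspect."""
    return [(s, classify_sentence(s, abstract, llm)) for s in split_sentences(abstract)]
```

Passing a callable as `llm` lets the same skeleton wrap any model client; the stub keeps the sketch self-contained.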
After the interviews, we analyzed the data using the process described in Appendix B.
To analyze annotation efficiency, we first conducted a Kruskal-Wallis rank sum test [39] to determine whether there were statistically significant differences in annotation time across the three conditions; because our data violated the homogeneity-of-variances assumption, non-parametric methods were more appropriate.
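The omnibus test above is rank-based; a minimal sketch of the H statistic, without the tie correction (in practice scipy.stats.kruskal applies the correction and returns the p-value used for the significance decision):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic for k independent samples (no tie correction)."""
    # Pool all observations, tagging each with its group index.
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    rank_sums = [0.0] * len(groups)
    for rank, (_, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    n_total = len(pooled)
    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    return (12.0 / (n_total * (n_total + 1))) * sum(
        rs * rs / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
```

Under the null hypothesis, H is approximately chi-squared distributed with k-1 degrees of freedom, which is how the p-value is obtained.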
AI for Efficiency - Using AI to Get Faster at Analysis Tasks
AI Tools for Each Phase of Analysis
“Analysts need to be able to dissect exactly how the AI reached a particular conclusion or recommendation,” says Chief Business Officer Eric Costantini. “Neo4j enables us to enforce robust information security by applying access controls at the subgraph level.”