There were 1 billion commits in 2025. Now, it's 275 million per week, on pace for 14 billion this year if growth remains linear (spoiler: it won't.)
This figure reveals the explosive growth in demand for software development and suggests AI is accelerating rather than replacing it, a counterintuitive point: people usually assume AI will reduce demand for developers, when in fact it may be creating more work.
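A quick check of the pace claim (arithmetic added here, not from the source):

```latex
275\ \text{M commits/week} \times 52\ \text{weeks} \approx 14.3\ \text{B commits/year}
```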
benchmarks sourced from publicly available material carry contamination risk, where training-data exposure can silently inflate scores.
Most people treat public datasets as the gold standard for AI evaluation, offering an objective and fair test bed. But the author warns that benchmarks built from publicly available material carry contamination risk: exposure through training data can silently inflate scores. This challenges standard practice in AI evaluation and suggests the field needs stricter data isolation or a shift toward private datasets.
Meta founder and CEO Mark Zuckerberg described superintelligence in a blog post last year
The article mentions that Meta's AI strategy includes developing "superintelligence" but gives no concrete investment figures, R&D timeline, or expected outcomes. Without quantitative grounding, the scale, time frame, and potential business value of the strategy cannot be assessed; this kind of technology vision needs more concrete data to support any feasibility judgment.
Wedbush Securities analyst Dan Ives said in a report on Thursday.
The article notes that analysts predict more layoffs may follow, but provides no specific figures or proportions. Without quantitative grounding, the reliability of the prediction cannot be assessed; industry analysis of this kind usually needs more specific support, such as projected headcount, timelines, or financial impact.
The layoffs will start on May 20, the company confirmed.
This is a clear milestone, about a month after the article's publication date (April 23, 2026). It indicates Meta has completed its decision process and drawn up a concrete implementation plan, reflecting urgency. This kind of advance notice is fairly common in tech-industry layoffs, giving employees some time to prepare.
Meta plans to lay off roughly 8,000 employees, or 10% of its workforce
A significant but not extreme ratio: a 10% cut reflects a major strategic adjustment in Meta's AI transition. Compared with other tech layoffs (typically 5-20%), this sits in the middle-to-high range, suggesting Meta is actively restructuring to fund AI investment. The figure comes from an official company statement, so its credibility is high.
Drug manufacturers pay pharmacy benefit managers rebates above 50% of list price for formulary access.
Drug makers pay pharmacy benefit managers rebates above 50% of list price, far higher than the 17% return OpenAI has promised. This shows that paying for channel access is common in B2B distribution, but the going rate varies widely by industry: pharma's channel costs are clearly higher than AI software's.
Google Cloud launched a parallel $750m fund to pay McKinsey, Accenture, and Deloitte to train engineers and co-fund client AI projects.
Google Cloud's $750M fund is about 7.5% the size of OpenAI's DeployCo ($10B), but Google Cloud pays the consultancies directly rather than guaranteeing a rate of return. This reflects different distribution strategies: OpenAI buys enterprise channels through PE firms, while Google Cloud penetrates the market through consulting firms.
Structure: $500M OpenAI equity plus $4B from TPG, Bain, Advent, Brookfield, and Goanna form a $10B LLC.
DeployCo's structure shows OpenAI contributing $500M (5% of the total) and the PE firms $4B (40%), toward a $10B LLC in total. This capital structure suggests that although OpenAI holds super-voting rights, it is a minority funder, relying mainly on the PE firms' channel networks to push its products.
OpenAI pledged $1.5B to a joint venture called DeployCo, guaranteeing private-equity partners a 17% annual return floor over five years.
OpenAI's promised 17% annual return floor is well above the industry average (13-16%), signaling that OpenAI is willing to pay a premium to secure enterprise-market penetration for its AI software. The guarantee acts as a risk buffer for the PE partners and reflects OpenAI's strong expansion intent, but it also means OpenAI must deliver correspondingly higher business growth to honor the commitment.
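For scale, compounding the guaranteed floor over the five-year window (arithmetic added here, not from the source):

```latex
(1 + 0.17)^{5} \approx 2.19
```

So the 17% floor amounts to guaranteeing the PE partners roughly a 2.2x gross multiple over the five years.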
Amazon is investing $5 billion in Anthropic today, with up to an additional $20 billion in the future
Amazon's investment in Anthropic ($5B now, with up to $20B more, potentially $25B in total) shows a cloud giant's strategic positioning in AI: direct investment in an AI company to secure priority claims on AI infrastructure. It ranks among the largest strategic AI investments of recent years.
run-rate revenue has now surpassed $30 billion, up from approximately $9 billion at the end of 2025
Run-rate revenue jumping from about $9B at the end of 2025 to over $30B is growth of more than 230%, reflecting the explosive growth of the AI-as-a-service market and far exceeding most tech companies' historical performance. The figure still warrants caution given the company's size: it may include prepayments or long-term contract revenue recognition, and growth at this rate brings pressure to match it with infrastructure investment.
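The growth arithmetic behind that figure (added here as a check):

```latex
\frac{30 - 9}{9} \approx 2.33 \;\Rightarrow\; \text{roughly } 233\%
```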
committing more than $100 billion over the next ten years to AWS technologies
A commitment of over $100B to AWS across ten years, roughly $10B per year, is an enormous long-term bet that exceeds most tech companies' annual revenue. It signals Anthropic's conviction about AI's future, the capital intensity of AI infrastructure, and the central role of cloud providers in the AI ecosystem; it is one of the largest single technology commitments in the history of cloud contracts.
over one million Trainium2 chips to train and serve Claude
Over one million Trainium2 chips is a staggering hardware deployment. It shows the depth of the Anthropic-Amazon partnership and the sheer compute required to train and serve large language models. Compared with Nvidia-GPU-centric buildouts, betting on Trainium also represents a cloud provider's differentiation strategy in AI hardware.
over 100,000 customers now run Claude on Amazon Bedrock
100,000 customers running Claude on Bedrock is a sizable enterprise base. It shows real enterprise adoption, though still far short of OpenAI's hundreds of millions of consumer users; for a company focused on enterprise AI, it is a meaningful milestone and a sign its market-penetration strategy is working.
up to 5 gigawatts (GW) of capacity for training and deploying Claude
5 GW of capacity is enormous, on the order of a small country's electricity consumption. It shows Anthropic building unprecedented infrastructure for training and deploying Claude, reflects the huge compute appetite of large language models, and is comparable to competitors like OpenAI in scale: a sign the compute race is escalating.
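To ground the "small country" comparison (a conversion added here, not from the source), 5 GW run continuously is:

```latex
5\ \text{GW} \times 8760\ \text{h/yr} \approx 43.8\ \text{TWh/yr}
```

which is indeed on the order of a smaller European country's annual electricity consumption.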
For Anthropic, more usage across diverse tasks means more data, which produces a smarter model—just as more queries improved Google search.
Most people assume the competition between AI companies turns on model architecture or parameter count, but the author argues the real advantage is breadth of usage data across diverse tasks: a data flywheel like the one more queries gave Google search. This challenges the field's technological determinism and highlights the strategic value of data network effects.
Parameters are estimated by unweighted least squares. Time t is measured in years since the first observation in each dataset.
Parameters are estimated by unweighted least squares, with time measured in years from each dataset's first observation. This is standard statistical practice, but the lack of weighting may understate recent data points, which typically represent more advanced models; the choice of time unit also affects how intuitively the growth rates read.
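For reference, the fit being described is ordinary (unweighted) least squares; in symbols (standard formulation, not quoted from the paper):

```latex
(\hat{a}, \hat{b}) = \arg\min_{a,\,b} \sum_{i=1}^{n} \bigl(y_i - (a + b\,t_i)\bigr)^2
```

where $y_i$ is the capability score and $t_i$ is measured in years since the dataset's first observation.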
We pre-selected the 6-month horizon as our primary metric, balancing genuine forecasting distance against the limited date range of our data.
The 6-month forecast horizon is a key choice: long enough to be a genuine forecasting distance, short enough to fit the limited date range of the data. It may be too short to capture long-run trends but is well suited to detecting recent acceleration, a pragmatic trade-off given limited data.
The minimum training cutoffs are: ECI (June 2024), METR Time Horizon (January 2024), Combined Math (September 2024), and WeirdML V2 (January 2025).
These cutoffs show the datasets differ in length, spanning January 2024 to January 2025. Shorter pre-reasoning-model histories (WeirdML V2 has only about a year) may limit the power to detect acceleration, which would help explain why that metric shows none; the differing spans also reflect the different histories of each capability metric.
Our fourth metric, an index constructed from WeirdML V2 results, showed no sign of acceleration. A single global linear trend fit the data best.
One of the four metrics (WeirdML V2) shows no acceleration, in sharp contrast to the other three. The difference may stem from WeirdML V2's resource-constrained setting (models get only five code submissions and no external tools), which may better reflect real-world constraints; it suggests AI progress is not accelerating uniformly across domains.
We use four AI capability metrics: ECI (Epoch Capabilities Index), METR 50% Time Horizon, Combined Math Index, and WeirdML V2 Index.
Using four different AI capability metrics strengthens the results. Each measures a different dimension: broad capability (ECI), task time horizon (METR), mathematics (Combined Math), and performance in a constrained setting (WeirdML). A multi-metric approach reduces the risk of single-metric bias.
Reasoning models show both a one-off jump in performance and a roughly 2-3x faster trend compared to non-reasoning models.
A 2-3x difference in trend is striking: reasoning models show a clear performance gap over non-reasoning models, suggesting an architectural shift can deliver a leap rather than incremental improvement, and supporting the hypothesis that reasoning ability is a key driver of AI progress. The range is wide, though: the authors cannot pin down a precise growth rate because several nonlinear fits explain the data about equally well.
Three of the four metrics (ECI, log METR 50% time horizon, and a math-focused index we constructed from several math benchmarks) show strong evidence that progress has sped up relative to a global linear trend fit to data from 2023 onward.
This data point shows 75% of the AI capability metrics accelerating, a high proportion. The article ties the acceleration's start to 2023, coinciding with the arrival of reasoning models. This is notable because it suggests AI progress may be undergoing a qualitative shift, not just quantitative accumulation.
The three metrics where we find acceleration are concentrated in programming and mathematics. These are areas that labs have explicitly targeted for improvement
This observation reveals the domain limits of the acceleration: it is confined to programming and mathematics, areas labs have explicitly targeted and where correctness is easy to verify automatically, making them natural targets for reinforcement learning. It suggests AI progress may be selective rather than across-the-board, an important caveat for assessing overall progress.
The best-performing model across these three metrics was a pair of independent linear trends: one for reasoning models and one for non-reasoning models.
The best-fitting model across the three accelerating metrics was a pair of independent linear trends, one for reasoning models and one for non-reasoning models. That this separation wins on all three metrics is strong statistical evidence that the two model families follow genuinely different trajectories.
Three of four metrics show strong evidence of acceleration, driven by reasoning models.
A key data point: 75% of the metrics show acceleration driven by reasoning models, which suggests the phenomenon is not accidental. But it rests on four specific metrics concentrated in math and programming, and the fourth (WeirdML V2) shows no acceleration; the proportion should be read with care and validated against more metrics before generalizing.
We select the median-difficulty question from the set with maximum model coverage and standardize it to 0.
In building the math index, the researchers select the median-difficulty question from the set with maximum model coverage and standardize it to 0. This is the key statistical step that puts benchmarks of different difficulty and scoring on a common scale, making model performance directly comparable.
We work with the natural logarithm of the time horizon, which puts it on an approximately linear scale.
The METR time horizon is transformed with the natural logarithm to put it on an approximately linear scale. That implies the raw data grow roughly exponentially; the transform is common when analyzing AI progress rates because it handles data spanning many orders of magnitude.
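The algebra behind the transform (standard, not quoted from the paper): if the time horizon grows exponentially at rate $g$,

```latex
h(t) = h_0\, e^{g t} \;\Longrightarrow\; \ln h(t) = \ln h_0 + g\,t
```

so a linear fit on the log scale recovers the growth rate as its slope.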
Unfortunately, the attacker got further access through their enumeration.
Most people assume environment variables that aren't sensitive are hard to exploit, but the author notes the attacker gained further access by enumerating them. This challenges the common notion that "non-sensitive data isn't worth protecting": even seemingly harmless data can become a link in an attack chain.
Vercel stores all customer environment variables fully encrypted at rest. We have numerous defense-in-depth mechanisms to protect core systems and customer data.
Most people assume cloud providers automatically encrypt all data, but the author points out that Vercel actually lets environment variables be marked "non-sensitive", meaning those variables are not encrypted by default, contrary to the widespread belief that cloud data is encrypted automatically.
SWE-chat is a living dataset; our collection pipeline automatically and continually discovers and processes sessions from public repositories
Most people treat AI research datasets as static one-off collections, but the authors propose a "living dataset" that is continually updated to reflect real usage. This challenges the reliance on static benchmarks in AI evaluation and argues for dynamic, continuous data collection.
frontier AI models are not too big because the technology is complex but too big because the training data is garbage
This challenges the mainstream explanation for growing model scale, shifting the issue from technical complexity to data quality, a counterintuitive framing: model size is a necessary response to low-quality data, not an inevitable result of technological progress.
The modern data stack has undergone a decade+ transition from disparate data sources to consolidated data and cleaned definitions (which is good), but even then the consolidation is never perfect and a lot of messiness is introduced.
This observation reveals the paradox of the modern data stack: despite a decade-plus of consolidation and cleaned definitions, perfect consolidation is impossible and messiness persists. It challenges the assumption that consolidation solves everything and underscores the need for ongoing stewardship.
To overcome this blocker, a team member hard codes the exact revenue and timeframe definitions. The data agent continues chugging along but quickly runs into challenge #2 – where are the right data sources? Which ones are the right sources of truth?
This concrete case vividly shows the data agent's real-world bind: even after the revenue and timeframe definitions are hard-coded, the questions of which data sources are real and which are the right sources of truth remain. It reveals the complexity of enterprise data governance and the limits of simple technical fixes.
Over the past year, the market has realized that data and analytics agents are essentially useless without the right context – they aren't able to tease apart vague questions, decipher business definitions, and reason across disparate data effectively.
This captures the core dilemma of today's data and analytics agents: without the right context they cannot tease apart vague questions, decipher business definitions, or reason across disparate data. It challenges the assumption that raw model capability alone solves data reasoning, and stresses the importance of business-semantic understanding.
Covers 47 kinds of data: agents, skills, hooks, MCP configuration, session history, custom rules…
Claude's data types exceed expectations: user data is far more complex than simple conversations, containing large amounts of configuration and state, which raises both the technical difficulty and the value of migration.
Years of accumulated conversations, custom Agents, project memory, MCP configurations, Skill libraries: a single trust-and-safety action could cut you off from all of it at once.
User data risk is underestimated: the value of a Claude user's assets far exceeds expectations, yet there is no official backup mechanism, so data safety depends entirely on a single platform's stability.
They provide access to more than 50 public multi-omics databases, literature sources, and biology tools, and offer a flexible starting point for common repeatable workflows.
Integrating access to more than 50 multi-omics databases represents a breakthrough in AI-driven scientific data integration. Such broad access could dissolve the information silos of traditional research, but it also raises important questions about data quality and representativeness.
Some privacy related extensions may cause issues on x.com.
This hints at a conflict between privacy tools and mainstream platforms, possibly including a platform actively breaking them: users install privacy extensions to keep their data from being collected, while the platform may treat those tools as obstacles to its data collection and analysis. It points to a continuing tug-of-war between privacy protection and platform functionality, and raises questions about how X balances its business model against user rights.
There were 1 billion commits in 2025. Now, it's 275 million per week, on pace for 14 billion this year if growth remains linear
This figure reveals the exponential growth of software development and suggests AI coding tools face an unprecedented surge in demand, which will reshape the economics and talent structure of software engineering.
We present a comprehensive adoption snapshot of the leading open language models and who is building them, focusing on the ~1.5K mainline open models
The report comprehensively analyzes roughly 1,500 mainline open models. Data collection at this scale offers an unprecedented macro view of the open-source AI ecosystem, and this kind of systematic measurement could become an important benchmark for tracking AI's trajectory.
Sage sends URLs and package hashes to Gen Digital reputation APIs. File content, commands, and source code stay local.
This privacy statement reveals Sage's data-handling strategy: minimize what leaves the machine. Balancing security and privacy this way is insightful, showing the developers understand users' fear of data leakage while recognizing that some cloud-side analysis is necessary for effective threat detection.
Academic publishers, documentary archives, game studios, and companies sitting on years of enterprise data have all been courted for the seeds of intelligence needed to train the next generation of models.
The expanding market for AI training data is re-pricing whole industries: academic publishers, documentary archives, game studios, and enterprises sitting on years of data are all being courted as "seeds of intelligence". This cross-industry data convergence is creating new business opportunities and market dynamics.
Mercor, which provides data to AI labs for training, became one of the fastest-growing companies in history before losing four terabytes of data to hackers last week.
Mercor's meteoric rise and its breach (four terabytes lost to hackers) make a stark contrast that underscores how central data security is to AI training. The incident may prompt the industry to revisit data security and privacy, pushing AI companies toward stricter data-management standards.
A small model trained on fewer than 2,000 examples from real lawyers, bankers, and consultants recently beat all but the best frontier models on corporate legal work, at a fraction of the price.
This finding challenges the "scale and compute beat everything" paradigm: a small model trained on high-quality specialist data outperformed all but the best frontier models in a specific domain, hinting at a shift from "bigger is better" toward "more specialized, more efficient".
Reddit, Shutterstock, and News Corp are making hundreds of millions a year licensing their high-quality data to companies training AI, and those contracts are growing about 20 percent annually, according to their quarterly filings.
This shows the enormous economic value of the training-data market: high-quality data has become a strategic asset for AI companies. Traditional content companies are turning into AI "input companies", a shift that changes their business models and redefines data's central place in the AI ecosystem.
Our Chip Ownership data does not capture all global chip ownership, and has weaker coverage prior to 2023.
The coverage limits mean blind spots in our picture of global compute distribution, especially before 2023 and in under-documented regions. This incompleteness could lead to over-reading compute-concentration trends and underweighting the possibly larger role of other players.
As slop takes over the Internet, labs may struggle to obtain high-quality corpuses for training models.
This highlights a looming training-data quality crisis: as slop takes over the Internet, AI systems face "garbage in, garbage out". The author's "low-background steel" metaphor neatly suggests pre-2023 uncontaminated data as a remedy, while hinting at how serious digital-age knowledge pollution could be for AI reliability and bias.
Based on our analysis, **29% of the Fortune 500 and ~19% of the Global 2000** are live, paying customers of a leading AI startup.
This shows enterprise AI adoption far above public perception, upending the usual adoption pattern: nearly a third of the Fortune 500 already run live AI deployments, a speed of penetration that breaks the rule that enterprise technology takes years to reach mass adoption.
Support teams are high volume and high turnover, and thus need to train new reps in a fast and standardized way. To do so, they have clearly articulated standard operating procedures (SOPs) that guide the work of each rep. These SOPs create clear rules and guidelines that AI agents can model themselves off of.
The secret of AI's success in customer support: to manage high turnover among human reps, the industry was forced to write extremely clear SOPs, and these happen to be perfect material for training AI agents. It is an accidental twist of history: companies documented all their processes because of a human problem (high attrition), and then AI turned those documents into its own training manual. The most thoroughly documented low-value work turns out to be the easiest for AI to replace.
Closed harnesses behind proprietary APIs force yielding control of agent memory to third parties.
Surprisingly, closed harnesses behind proprietary APIs force users to yield control of agent memory to third parties. Users may unknowingly lose control of their own data and personal information, with serious privacy and security implications.
Within a few months, they have more than a dozen production enterprise deployments & are processing over a billion events per hour.
Surprisingly, within a few months Artemis was processing over a billion security events per hour across more than a dozen production enterprise deployments, a scale that reflects the staggering frequency and complexity of the threats modern enterprises face.
Maine advances first statewide moratorium blocking data centers requiring over 20 megawatts
Surprisingly, Maine would become the first U.S. state to advance a statewide moratorium on large data centers, targeting facilities over 20 megawatts: a striking outlier amid today's rapid buildout.
the most interesting detail here is how SkillClaw clusters cross-user trajectories into referenced skills and then uses the evolver to translate those patterns into concrete updates.
Surprisingly, SkillClaw clusters cross-user trajectories into referenced skills, then uses the evolver to translate those patterns into concrete updates. It is a clever way to handle heterogeneous user experience: it bridges signal differences between users and extracts valuable patterns from seemingly unrelated behavior, collective intelligence in practice.
We test for a trend over time by fitting a weighted linear model to the log-odds of usage. Under this specification, Claude is the only AI service in the survey to show a statistically significant upward trend over this period
Surprisingly, using a weighted linear model on the log-odds of usage, the team found Claude to be the only AI service in the survey with a statistically significant upward trend: a case where careful statistics reveal a real trend behind superficially small changes.
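Written out, a trend model on log-odds presumably takes this standard logistic form (the exact weighting scheme is the authors'):

```latex
\ln\!\frac{p_t}{1 - p_t} = \alpha + \beta\, t
```

with the reported result being that Claude is the only service whose estimated $\beta$ is significantly greater than zero.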
The ChatGPT for Excel add-in operates separately from your ChatGPT chat history. Conversations and data in Excel aren't shared with your ChatGPT chats, and activity doesn't sync between experiences at this time.
Surprisingly, ChatGPT in Excel is fully isolated from regular chat history, with no data sync between the two systems. Users can work on sensitive data in Excel without that information appearing in their normal chat records, an extra layer of privacy.
By default, data shared with ChatGPT isn't used to improve our models for ChatGPT Business, ChatGPT Enterprise, ChatGPT Edu, and ChatGPT for Teachers.
Surprisingly, business-tier users' Excel data is not used to train models by default, a marked difference from how consumer data is handled. The distinction reflects OpenAI's special protection for commercial customers, likely intended to build enterprise confidence in adopting AI tools.
we collaborated with over 1,000 physicians to curate training data that enables more factual and comprehensive responses.
Surprisingly, to improve Muse Spark's health reasoning, Meta collaborated with over 1,000 physicians to curate training data. Expert involvement at this scale is rare in model development; it signals how seriously Meta takes medical accuracy and reflects a broader trend toward specialist-curated training.
The model reportedly scored 93.9% on SWE-bench Verified and 77.8% on SWE-bench Pro, but its strongest signal came from real-world results, including uncovering a 27-year-old flaw in OpenBSD, a 16-year-old vulnerability in FFmpeg, and autonomously chaining Linux kernel exploits without human input.
These striking vulnerability-discovery capabilities suggest AI has moved beyond traditional security tooling, autonomously surfacing flaws that went unnoticed for decades. The ability to chain Linux kernel exploits without human input in particular shows revolutionary potential that could transform security research and vulnerability remediation.
We need, like, a Manhattan Project to collect this
An economist calling for a "Manhattan Project" to collect price-elasticity data across industries underscores how starved AI-economics research is of basic infrastructure. Without systematic micro-data across the economy, predictions about AI and employment are guesswork, and policymaking has nothing to stand on.
We need, like, a Manhattan Project to collect this... Fields that are not exposed now will become exposed in the future, so you just want to track these statistics across the entire economy.
Most people think the response to AI's labor impact should focus on the industries most threatened now, but the author argues for Manhattan Project-scale collection of price-elasticity data across all industries, including those not yet exposed. This forward-looking stance challenges conventional crisis-response thinking.
A learning system can continuously incorporate real-world data in a way that numerical solvers fundamentally cannot, capturing and compounding the knowledge that is currently trapped out there in the real world.
This reveals another advantage of AI-driven design: closing the loop between simulation and reality. Traditional solvers cannot exhaust real-world factors like manufacturing tolerances, while a learning system keeps absorbing measured data, forming a data flywheel that gets smarter with use. Turning the tacit knowledge scattered in the real world into model capability is a qualitative leap traditional tools cannot make.
inappropriately change or overwrite JSON files compared to Markdown files
A sharp piece of engineering experience: Markdown is too free-form for an LLM and easily gets tampered with or overwritten by hallucination, while JSON imposes strict schema constraints. Choosing the right data format is itself an implicit prompt guardrail.
Recording by time isn't entirely reasonable; records should be organized by task.
This challenges the habitual thinking of timeline-based records. A timeline looks objective but is fragmented and adds cognitive load. Organizing memory around tasks mimics the human brain's associative memory, modeling scattered actions as ordered causal chains and greatly improving recall and usefulness.
Use during the beta test period is free.
Completely free during beta: a surprising strategy for a product claiming to replace weeks of a CSO team's work. The logic: Sakana needs real enterprise-grade research tasks as training data and case studies, and only enterprise users can supply them. "Free in exchange for real-scenario data" is a classic AI cold-start strategy, but deploying it at such a high-end B2B positioning is an honest signal about the product's current state: not yet good enough for enterprises to pay for version one, but good enough to be worth trying for free.
American hyperscalers are driving a data center buildout that's larger than the Manhattan Project and Apollo Program at their peaks.
Comparing the scale of the U.S. AI datacenter buildout to the Manhattan Project and Apollo Program at their peaks is both startling and revealing: the competition has escalated from a technology race to industrial mobilization. The Manhattan Project was a total wartime mobilization of national will; Apollo was the prestige spending of the Cold War. Today's AI compute race already exceeds both of history's largest engineering programs in absolute scale, and it is nowhere near its ceiling.
MiniMax may have been able to get 100 billion tokens of data from interactions with Claude.
100 billion tokens of Claude interaction data: a jaw-dropping estimate. It means MiniMax's users may unknowingly have acted as collectors of distillation data derived from Claude. From Anthropic's perspective this is misappropriation of commercial data; from a competitive perspective it shows that an open API strategy is a double-edged sword: the more open it is, the easier it is to siphon in reverse.
A three-stage progressive training strategy -- large-scale pre-training, hard sample fine-tuning, and GRPO alignment -- sequentially exploits these data at different quality tiers.
Most people assume a single training recipe should be applied to all data, but the authors propose a three-stage progressive strategy that treats data differently by quality tier: large-scale pre-training, hard-sample fine-tuning, and GRPO alignment. This data-quality-aware approach challenges one-size-fits-all training and represents data-centric AI thinking.
SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than architecture itself.
Most people expect models of different architectures to fail in different ways, but the authors find SOTA models of all architectures and parameter scales fail highly consistently on the same hard samples. This implies the performance bottleneck stems from shared deficiencies in training data rather than architecture, a finding that challenges the conventional case for model diversity.
Without any architectural modification, MinerU2.5-Pro achieves 95.69 on OmniDocBench v1.6, improving over the same-architecture baseline by 2.71 points and surpassing all existing methods including models with over 200× more parameters.
Most people assume a bigger architecture necessarily means better performance, but purely through data engineering and training-strategy optimization, with the 1.2B-parameter architecture unchanged, the authors surpass existing models with over 200x more parameters, challenging the industry's "bigger is better" consensus and proving the importance of data quality.
Current document parsing methods compete primarily on model architecture innovation, while systematic engineering of training data remains underexplored.
Most people attribute document-parsing gains to architectural innovation and scale, but the authors argue that systematic engineering of training data is the real bottleneck: SOTA models of different architectures fail consistently on the same hard samples, pointing to data quality rather than architecture.
introducing a commercial text and data mining exception for AI training would expand the AI sector in the country.
Most people assume loosening data-mining restrictions spurs AI innovation and growth, but the author argues a commercial text-and-data-mining exception would not actually expand the country's AI sector, contrary to the tech industry's "more data equals better AI" creed and the dominant narrative of free-flowing data.
most existing large language model agent systems face severe limitations in data-intensive settings, including context saturation, cascading error propagation, and high end-to-end latency
The mainstream view holds that LLM agent systems excel at complex data tasks, but the authors point to severe limitations in data-intensive settings, including context saturation, cascading error propagation, and high end-to-end latency, challenging the assumption of general LLM-agent effectiveness.
We introduce Iterative Reward Calibration, a methodology for designing per-turn rewards using empirical discriminative analysis of rollout data
Most people design rewards from domain expertise and predefined rules, but the authors propose iteratively calibrating per-turn rewards via empirical discriminative analysis of rollout data. This challenges traditional reward-engineering methodology, moving reward design from expert-driven to data-driven.
If we knew that every image uploaded was a beautiful model shot, segmentation would be far easier, but because of the nature of user-uploaded content, we need the best possible segmentation.
Most people would assume professional, high-quality photos are the ideal input for AI image processing, and the author implies that "perfect" model shots would indeed be far easier to segment than user uploads. This challenges assumptions about "ideal training data": the imperfection of real-world data is the harder technical problem.
Urgent treatment for neoplasm consists of (1) cautious use of intravenous diuretics and (2) mediastinal irradiation, starting within 24 hours, with a treatment plan designed to give a high daily dose of radiation but a short total course of therapy to rapidly shrink the local tumor. Intensive radiation therapy combined with chemotherapy will palliate the process in up to 90% of patients. In patients with a subacute presentation, radiation therapy alone usually suffices. Chemotherapy is added if lymphoma or small-cell carcinoma is diagnosed.
endovascular stenting emerging as first-line therapy for rapid symptom relief, while definitive treatment targets the underlying cause
Interviews were video and audio recorded. We transcribed the audio using OpenAI's Whisper automatic speech recognition system and anonymized the transcript before analysis. We analyzed the interview data using thematic analysis [1]. First, two members of the research team independently coded four (25% of collected data) randomly chosen participant data to generate low-level codes. The inter-coder reliability between the coders was 0.88 using Krippendorff's alpha [37]. The two coders then met together to cross-check, resolve coding conflicts, and consolidate the codes into a codebook across two sessions. Using the codebook, the two coders analyzed six randomly selected participant data each. The research team then met, discussed the analysis outcomes, and finalized themes over three sessions.
sentence describing how analysis was performed on data collected by the authors of this paper
We conducted a qualitative analysis of user study transcripts and survey responses using a Grounded Theory approach [8]. First, the lead researcher collected a list of participants' behaviors, approaches, reflections on their experience, and feedback about the interface. The researcher then systematically coded this data, revisiting the data multiple times and refining the codes to ensure consistency and coherence. Through this process, high-level themes were identified and organized using affinity diagramming. Once the thematic structure was finalized, the researcher gathered supporting evidence for each theme and synthesized the findings, which were reviewed by the research team to ensure agreement on the results.
sentence describing how analysis was performed on data collected by the authors of this paper
Activity log data, which revealed how participants actually used the interface, echoed the above findings. According to the log data, participants spent most of their reading time (66.31%) with vertical alignment on the second element in structure pairs, followed by alignment on the first element (29.19%), and left-justified alignment (5.13%). Highlighting usage showed a similar preference: 91.13% of time with all chunks highlighted, 8.25% with partial highlighting, and minimal time (0.63%) without highlights.
sentence describing how analysis was performed on data collected by the authors of this paper
In this section, we present findings on how AbstractExplorer supports comparative close reading at scale by integrating quantitative survey responses and log data with qualitative analysis of transcripts and open-ended responses. The qualitative analysis process is described in detail in Appendix H.
sentence describing how analysis was performed on data collected by the authors of this paper
Throughout the two tasks, we also collected detailed interaction logs including counts of user-defined aspects created, duration of highlighting usage, and time allocation across the three possible alignment options.
sentence describing how analysis was performed on data collected by the authors of this paper
Both gaze data and the semi-structured interviews revealed that lower NFC participants were more willing to be guided by the three features and took advantage of them consciously.
sentence describing how analysis was performed on data collected by the authors of this paper
Using a two-tailed Mann-Whitney U Test, we found that participants who reported their lowest perceived cognitive load when all three features were enabled had significantly lower NFC than participants who reported their lowest cognitive load level when skimming with no features enabled—in the baseline interface (p=0.03).
sentence describing how analysis was performed on data collected by the authors of this paper
The raw NASA-TLX score is the sum of all 6 NASA-TLX questions after reversing the appropriate questions.
sentence describing how analysis was performed on data collected by the authors of this paper
To compute a participant's NFC score, we averaged their response to the six questions, each ranging from 1 to 7, after reversing the appropriate questions.
sentence describing how analysis was performed on data collected by the authors of this paper
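For reference, reverse-scoring on a 1-7 scale and averaging has a standard form (my formulation, not quoted from the paper):

```latex
x_j^{\text{rev}} = 8 - x_j, \qquad \mathrm{NFC} = \frac{1}{6} \sum_{j=1}^{6} \tilde{x}_j
```

where $\tilde{x}_j$ is the response to question $j$ after reversing where appropriate.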
For simplicity of analysis, we denote participants with NFC scores above the overall participants' median NFC of 5.42 (IQR = 0.583) as higher NFC, and lower NFC otherwise.
sentence describing how analysis was performed on data collected by the authors of this paper
To contrast participants' gaze patterns in each condition, we used a Tobii Pro Spark eye-tracker placed below the desktop monitor used by all subjects; Tobii Pro Lab software recorded each participant's gaze over time in each condition.
sentence describing how analysis was performed on data collected by the authors of this paper
We collected 80 sentences from our abstracts dataset labeled by our system as "Methodology/Contribution." Participants viewed the same 80 sentences in each condition—often with a different subset of sentences initially visible due to ordering changes—but only had two minutes to look at them in each condition.
sentence describing how analysis was performed on data collected by the authors of this paper
After obtaining an expanded set of high-level chunk labels, we assign them to each of the sentence chunks by using LLMs in a multiclass classification few-shot learning task, with the initial labels and assignment as examples (see prompt used in Appendix D.3).
sentence describing how analysis was performed on data collected by the authors of this paper
Then, we segment sentences within each aspect into grammar-preserving chunks (see prompt used in Appendix D.2). This results in grammatically coherent chunks that are the basis of structure patterns. After identifying chunk boundaries, we again prompt an LLM to generate labels for chunks in a human-in-the-loop approach: starting from an initial set of labels for chunk roles, when a new label is generated, a researcher from the research team examines the new label and merges it with existing labels if appropriate, controlling for the total number of labels.
sentence describing how analysis was performed on data collected by the authors of this paper
We process this data in a three-stage pipeline (Figure 6). In the first stage, Sentence Segmentation and Categorization, abstracts are split into individual sentences using the NLTK package, and each sentence is classified into one of the five pre-defined aspects as listed in Section 4.1.1. Classification is performed by prompting an LLM (see prompt used in Appendix D.1) with the sentence and its full abstract.
sentence describing how analysis was performed on data collected by the authors of this paper
After the interviews, we analyzed the data using the process described in Appendix B
sentence describing how analysis was performed on data collected by the authors of this paper
To analyze the annotation efficiency, we first conducted a Kruskal-Wallis rank sum test [39] to determine if there were statistically significant differences in annotation time across the three conditions, because our data violated the homogeneity of variances assumption, making non-parametric methods more appropriate.
return any single sentence that describes data analysis done on data collected by the authors when running human subjects experiments.
Figure 3: Decrease in Perceived Fiscal Benefits of Microcredentials by Year
Perception vs Reality disconnect. Why? My guess is huge gaps in understanding of the Fundamentals.
Notably, traditional mindsets and legacy systems are seen as far greater barriers in 2025 (61%) than in 2021 (5%), highlighting a growing tension between innovation and institutional resistance to change.
In 2021, the question would have been interpreted differently. 61% is probably still lower than it should be.
Comprehensive Learner Record Standard Transcript Guide
CLR Playbook
The Observatory of Economic Complexity (OEC): https://oec.world/en
See: Wikidata:Data round-tripping, https://www.wikidata.org/w/index.php?title=Wikidata:Data_round-tripping&oldid=2440906511
If you value your data I suggest not trusting any filesystem or media, consider them all equally fallible.
Media files are not directly downloaded in overall syncing to save bandwidth. Instead, when that file is requested, it is streamed to your device from the backup node or your devices on the network. For example, if you have a 4K Video, it will be streamed from the backup node or P2P devices to your device. So when you open an object with an image, it downloads. When you press play on video & audio, it begins to download. After that, this file will be stored in the application cache.
Media files may not be locally available and require an internet connection to be streamed/downloaded on demand; they are generally excluded from syncing to save bandwidth. Doesn't this also mean that media files aren't backed up, in the sense that people will treat sync as backups?
Benioff had recently told Business Insider that he's drafting the company's annual strategic document with data foundations—not AI models—as the top priority, explicitly citing concerns about "hallucinations" without proper data context.
The annual strategic document now puts data foundations in focus, not AI models. Well, duh. How do you even arrive at the notion that you can AI-all-the-things? It implies an uncritical belief in vendors' promises, or magical thinking. How do you get to be CEO if you fall for that? Vibe-leading, in other words: the wizard behind the curtain.
Our World in Data does have some legitimate research, because that's what think tanks do: they launder illegitimate research with legitimate research. Their primary tactic is to set the scope of what they are commenting on or researching so that it puts forward the kind of results they want, results that align with their ideology.
for - Our World in Data - discredited website - mix legitimate with illegitimate research to advance a biased ideology
Micro-Credentials and Digital Badges: An Exploration of Definitions and Implications in Higher Education and Workforce
for - open source - world population data - Worldpop
Systolic blood pressure.
Gender and grade level had no effect on systolic BP recovery. The only major-related difference was that landscape/environmental majors showed a significantly larger SBP reduction than non-environment majors when viewing desert landscapes, but not for any other landscape type.
3.1 Physiological response of viewing different landscape types
This study shows that visual exposure to natural environments, especially forests and water, produces measurable physiological relaxation:
• nature images lower systolic BP
• forest images lower diastolic BP
• water images lower HR
It suggests that different types of natural scenes have different calming effects, and that the body overall responds physiologically to nature in ways that promote relaxation and reduce stress.
How important for being looked up to or having high status in your school is...
missing data
How often do you feel...
missing data
How much competition for grades is there?
weird amount of missing data again
During an average school week, about how many times...
Item05a-b have a weird amount of missing data
Never / Less than once a week / 1-2 times a week / 3-5 times a week / 6-9 times a week / 10-19 times a week / 20 or more
Maybe redo this graph so that the color legend isn't so large and the questions don't take up so much space.
I feel I am a person of worth, on an equal plane with others
Item01b has a weird amount of missing data; go back and check the data management.
MIPEX 2025 Summary: Integration Policies in France

The analysis of France's integration policies in the Migrant Integration Policy Index (MIPEX) 2025 reveals a mixed picture. With an overall score of 56 out of 100, France sits halfway, applying policies that offer opportunities but also significant obstacles to integration. This score, unchanged since 2019, masks divergent trends: notable progress in education is offset by setbacks in access to healthcare and permanent residence. The French approach is classified as "Temporary Integration", a model that grants basic rights to non-EU citizens but denies them the long-term security needed to settle durably and participate fully in civic life.

France's strengths lie in its solid anti-discrimination legal framework and in recent improvements in access to higher education. However, these advances are undermined by restrictive policies on permanent residence and family reunification, and by a naturalisation process seen as discretionary and politicised. The "Immigration & Intégration" law of January 2024 and its subsequent implementing decrees mark a turn toward a more selective and demanding approach, tightening language and civic requirements. To improve its model, France is advised to adopt a more coherent approach, aligning its policies with a goal of long-term integration and treating immigrants as future citizens rather than temporary residents.

With a score of 56 out of 100, France's integration policies are judged "halfway to promote societal integration". This places France in the "Temporary Integration" category. In the MIPEX typology, this model is characterised by:
• Granting basic rights and some equal-opportunity measures.
• Denying the long-term security essential to settle permanently, invest in integration, and participate fully as citizens.
• Perpetuating a perception of immigrants as partially equal but fundamentally foreigners (outsiders).
This approach contrasts with the MIPEX "Top Ten" countries, which treat immigrants as equals, neighbours, and potential citizens, investing in integration as a mutual process that benefits society as a whole.

France's overall score has been stable since 2019, but this stability hides contradictory changes across policy areas.

Positive changes:
• Access to higher education: targeted programmes have been introduced to improve migrants' access to higher education.
• Integration into the teaching profession: initiatives support the integration of migrants into teaching careers.
• Specific projects:
◦ AIMES+ (since 2023): aims to improve the quality of French-language courses for immigrant students.
◦ L'Université en Exil (UXIL): offers an academic pathway for students and researchers in exile.

Negative changes:
• Permanent residence: the conditions for renewing permanent resident status have been tightened, notably by reducing the periods of absence allowed outside French territory.
• Access to healthcare (since 2020): asylum seekers and non-EU immigrants face increased obstacles, with additional conditions and longer waiting periods for health coverage. A key legal change in 2019 introduced a three-month waiting period and a minimum-residence condition for eligibility for Protection Universelle Maladie (PUMa).
• The "Immigration & Intégration" law (January 2024): this law, not yet reflected in the MIPEX score, centralised and strengthened language, civic, and employment requirements. It introduces limits on the renewal of temporary residence permits and stricter language and values tests for residence and citizenship. The decrees and circulars of mid-2024 and early 2025 activated this framework, increasing administrative pressure and integration obligations.
Policy area (MIPEX classification) and summary of findings:

**Labour Market Mobility** (Halfway favourable): Permanent residents and families have access to the labour market but are excluded from more regulated professions than in any other country. Newcomers have access to general employment services but often not to recognition of their qualifications or to study grants.

**Family Reunification** (Halfway favourable): The requirements (income, housing) are strict and the process can be long and discretionary. Once reunited, however, families enjoy equal socio-economic rights and integration support, with an increase in language course hours (up to 400h, and 600h for learners who cannot read or write).

**Education** (Halfway favourable): France has strengthened its support, notably through targeted programmes since 2015 (AIMES+, UXIL). All pupils, whatever their status, have the same right to education. The weak point remains the failure to value diversity in citizenship education.

**Health** (Slightly favourable): The health system is inclusive but responds only weakly to the specific needs of migrant patients. Since 2020, barriers to access have grown for asylum seekers and non-EU immigrants (stricter conditions, longer waiting periods).

**Political Participation** (Halfway favourable): Foreign residents are poorly informed and rarely consulted by the authorities. France is one of the few major destination countries without local voting rights for foreign residents. Increased consultation of refugee groups at the national level has been noted since 2018.

**Permanent Residence** (Halfway favourable): Access to the secure 10-year status is conditioned on language, integration, and sometimes economic requirements that are among the most restrictive. Although the status itself is protective, it is very hard to obtain and to renew (especially since 2024).

**Access to Nationality** (Slightly favourable): The path is similar to other Western countries (5 years of residence, dual nationality possible). However, the process is increasingly politicised, discretionary, and discouraging for some candidates. Strict requirements (financial stability, B1 language level, a subjective assimilation interview) are significant barriers.

**Anti-discrimination** (Slightly favourable): This is France's greatest strength on integration. The legislation is solid and the rights-defence body (Défenseur des Droits) is effective at informing the public and helping victims. These policies appear to have had a positive long-term impact on public attitudes in Europe.
The French integration model is marked by a fundamental incoherence: its recognised strengths in anti-discrimination and its progress in education are undermined by a restrictive, precarious approach to the pillars of long-term integration (residence, family, and nationality). The 2024 law, new prefectoral instructions on naturalisation (May 2025), and a 2024 proposal challenging birthright citizenship (droit du sol) signal a shift in discourse toward more exclusionary integration policies.

To strengthen its model, France should:
1. Adopt a coherent approach: align its restrictive residence and family-reunification policies with its more inclusive measures in education and anti-discrimination.
2. Secure integration pathways: reduce discretion and excessive requirements in the procedures for permanent residence and nationality, to provide the stability needed for successful integration.
3. Treat immigrants as future citizens: implement a vision of integration as a two-way process that builds mutual trust and benefits society as a whole.

As 130 independent scientific studies using MIPEX data demonstrate, how governments treat immigrants is a decisive factor that influences not only public acceptance but also immigrants' sense of belonging, participation, and even health in their new country.
I will not link to the photo here. Allow me my parental illusions of protection. I reprint it here on social media only in doctored form. I didn’t know what to use to cover her eyes.
Responsible parenting
During each call, Stewart said, Amazon officials have not been helpful. "They wanted to do background checks on all my firefighters; I wouldn't let them," he said. "And we've struggled to gain access to emergencies. They'll stop us at the gate, and our medic units have been delayed. They're denying us access to patients."
"How AI Datacenters Eat the World" from High Yield on YouTube. 30-Aug-2025
HighYield x SemiAnalysis deep-dive into AI Datacenters, Gigawatt Megaclusters and the Hyperscaler race to AGI. How AI Datacenters Eat the World.
Recently, OpenAI shared a number. In a blog post, CEO Sam Altman said that the average query uses about 0.34 watt-hours of energy.
From the 10-Jun-2025 blog post:
People are often curious about how much energy a ChatGPT query uses; the average query uses about 0.34 watt-hours, about what an oven would use in a little over one second, or a high-efficiency lightbulb would use in a couple of minutes. It also uses about 0.000085 gallons of water; roughly one fifteenth of a teaspoon.
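The conversion behind the oven comparison (arithmetic added here, assuming an average oven draw of about 1 kW):

```latex
0.34\ \text{Wh} \times 3600\ \tfrac{\text{J}}{\text{Wh}} \approx 1220\ \text{J}, \qquad
\frac{1220\ \text{J}}{1000\ \text{W}} \approx 1.2\ \text{s}
```

which matches the "little over one second" framing.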
When you open this in two browsers and refresh a few times, one browser after the other, you’ll see the count go up and up (when looking at the page source), proving that the state is shared between both browsers (well, not really, it’s shared on the server, and used by both users). This will have serious consequences if you go this route: if user A is logged in and you’d write the user object to the shared state, and user B is not logged in, they’d still see a flash of user A’s username appear in the navigation bar, until the shared state is overwritten by the undefined user object.
export const state: State = $state({ user: undefined });
The problem is, this creates global (server-wide) state, when it should be "user-local" global state.
But sadly this introduces shared state on the server (when we use SSR), and this is a big problem since we’re now leaking data between different users.
One pattern that I love to use in my SvelteKit projects is returning writable stores from the layout’s load function. This makes it possible to fetch data from the server (for example the user object for the logged in user), and then you make this object available as a writable reactive store throughout the whole application. So when the user updates their username or avatar, you do the PUT request to the server and you get the updated user object back from the server as the response, you can simply update the $user writable store value and every place in your app where you show the user object gets updated immediately.
risk of accidentally exposing one user’s data to another
As with the previous example, this puts one user’s information in a place that is shared by all users.
But what if you want to update this user instance? For example on your website you have a form where the user can change their name, username, or avatar. When the form is submitted this gets stored on the server, but the site still shows the old user information, for example it still shows the old avatar of the user in the top menu. The user variable isn’t writable, so how do you overwrite this?
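To make the answer concrete, here is a minimal sketch of the writable-store-from-load pattern described above. The `/api/me` endpoint and the `User` shape are illustrative assumptions, not taken from the post:

```ts
// +layout.ts (universal load; stores can be returned from universal loads)
import { writable, type Writable } from 'svelte/store';
import type { LayoutLoad } from './$types';

// Hypothetical user shape for this sketch.
export interface User {
  username: string;
  avatar: string;
}

export const load: LayoutLoad = async ({ fetch }) => {
  // Hypothetical endpoint returning the logged-in user, or an error status.
  const res = await fetch('/api/me');
  const user: User | undefined = res.ok ? await res.json() : undefined;

  // The store is created inside load, i.e. per request during SSR, so it is
  // never shared between users the way module-level state is.
  const userStore: Writable<User | undefined> = writable(user);
  return { user: userStore };
};
```

Updating it after the profile form is saved is then a single `set()`:

```ts
// In a component that received `data` from the layout load above.
import type { Writable } from 'svelte/store';
import type { User } from '../routes/+layout'; // illustrative import path

async function saveProfile(user: Writable<User | undefined>, form: FormData) {
  const res = await fetch('/api/me', { method: 'PUT', body: form });
  if (res.ok) {
    // One set() and every `$user` subscription in the app updates at once.
    user.set(await res.json());
  }
}
```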
for - warm data - Nora Bateson - warm data
Stateless vs. Stateful Preprocessing: Most PyTorch transforms are stateless (e.g., RandomHorizontalFlip) or configured with fixed parameters (e.g., Normalize with pre-defined mean/std). If you need to compute statistics from your data (like the mean and standard deviation for normalization), you typically do this once offline and then hardcode these values into the Normalize transform. This contrasts with Keras's Normalization layer, which has an adapt() method to compute these statistics online from a batch of data.
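Spelling out the offline computation described here (standard formulas, not from the quoted text): the mean and standard deviation are full-pass statistics over the $N$ training examples, computed once and then frozen into the Normalize transform:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}, \qquad
x' = \frac{x - \mu}{\sigma}
```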
Additional perspective on preprocessing
Preprocessing challenges

The following are the primary challenges of implementing data preprocessing:

Training-serving skew. Training-serving skew refers to a difference between effectiveness (predictive performance) during training and during serving. This skew can be caused by a discrepancy between how you handle data in the training and the serving pipelines. For example, if your model is trained on a logarithmically transformed feature, but it's presented with the raw feature during serving, the prediction output might not be accurate. If the transformations become part of the model itself, it can be straightforward to handle instance-level transformations, as described earlier in Option C: TensorFlow. In that case, the model serving interface (the serving_fn function) expects raw data, while the model internally transforms this data before computing the output. The transformations are the same as those that were applied on the raw training and prediction data points.

Full-pass transformations. You can't implement full-pass transformations such as scaling and normalization transformations in your TensorFlow model. In full-pass transformations, some statistics (for example, max and min values to scale numeric features) must be computed on the training data beforehand, as described in Option B: Dataflow. The values then have to be stored somewhere to be used during model serving for prediction to transform the new raw data points as instance-level transformations, which avoids training-serving skew. You can use the TensorFlow Transform (tf.Transform) library to directly embed the statistics in your TensorFlow model. This approach is explained later in How tf.Transform works.

Preparing the data up front for better training efficiency. Implementing instance-level transformations as part of the model can degrade the efficiency of the training process. This degradation occurs because the same transformations are repeatedly applied to the same training data on each epoch. Imagine that you have raw training data with 1,000 features, and you apply a mix of instance-level transformations to generate 10,000 features. If you implement these transformations as part of your model, and if you then feed the model the raw training data, these 10,000 operations are applied N times on each instance, where N is the number of epochs. In addition, if you're using accelerators (GPUs or TPUs), they sit idle while the CPU performs those transformations, which isn't an efficient use of your costly accelerators. Ideally, the training data is transformed before training, using the technique described under Option B: Dataflow, where the 10,000 transformation operations are applied only once on each training instance. The transformed training data is then presented to the model. No further transformations are applied, and the accelerators are busy all of the time. In addition, using Dataflow helps you to preprocess large amounts of data at scale, using a fully managed service.

Preparing the training data up front can improve training efficiency. However, implementing the transformation logic outside of the model (the approaches described in Option A: BigQuery or Option B: Dataflow) doesn't resolve the issue of training-serving skew. Unless you store the engineered feature in the feature store to be used for both training and prediction, the transformation logic must be implemented somewhere to be applied on new data points coming for prediction, because the model interface expects transformed data.
The TensorFlow Transform (tf.Transform) library can help you to address this issue, as described in the following section.
Challenges with data preprocessing
You preprocess the raw training data using the transformation implemented in the tf.Transform Apache Beam APIs, and run it at scale on Dataflow. The preprocessing occurs in the following phases:

Analyze phase: During the analyze phase, the required statistics (like means, variances, and quantiles) for stateful transformations are computed on the training data with full-pass operations. This phase produces a set of transformation artifacts, including the transform_fn graph. The transform_fn graph is a TensorFlow graph that has the transformation logic as instance-level operations. It includes the statistics computed in the analyze phase as constants.

Transform phase: During the transform phase, the transform_fn graph is applied to the raw training data, where the computed statistics are used to process the data records (for example, to scale numerical columns) in an instance-level fashion.
Good dichotomy for data preprocessing
Stackable credentials are also critical to the “Some College, No Credential” (SCNC) market, which reached a total of 36.8 million under the age of 65 in the U.S., up 2.9% from the previous year. Recent research from UPCEA and StraighterLine found that 76% of SCNC adults said being able to earn alternative or microcredentials that could stack toward a degree would increase or greatly increase their interest in completing their degree
In other words: 36.8M people have some college, and 76% say the ability to earn formal credentials that stack toward degrees would increase their interest in completing their degree. That's 28 MILLION adults who already did post-secondary once and could be re-engaged. The dreaded enrollment cliff is 3M, yet 10x that number of people who already self-selected into college once get none of the same attention. It's a massive opportunity.
Data Frame
A DataFrame is a data table structured in rows and columns that makes it easy to organize, manipulate, and analyze information in programming and data science.

The table isn't working for me; I think I need to define some variable or I'm forgetting a step.
fighter1 data
Here, when this command is run in Glamorous, it also produces a null-response error.
nebraska case study of data sharing for court-involved youth
Only 36 companies were responsible for more than half of the greenhouse gases emitted worldwide in 2023. That is the finding of an analysis of the data in the Carbon Majors Database. Most of the 169 companies included in this database increased their emissions in 2023, at the time the hottest year in world history.
The major polluters also include #Adnoc, whose shares in the Austrian #OMV are syndicated with those of the Austrian state.
Earlier versions of the Carbon Majors report produced by InfluenceMap have played an important role in lawsuits against fossil-fuel companies. https://www.theguardian.com/environment/2025/mar/05/half-of-worlds-co2-emissions-come-from-36-fossil-fuel-firms-study-shows
Carbon Majors 2023 Data Update: https://carbonmajors.org/briefing/The-Carbon-Majors-Database-2023-Update-31397
open sourcing all of this as part of TensorFlow so that anyone can use these tools to explore their data.
for - tensorflow - data visualization of words - question - tensorflow - for SRG tool?
for - data visualization - words in high dimensional space - Google tensorflow - open source data visualization - of words
words are treated as high-dimensional data points.
for - words - high dimensionality data points
for - chalmers university - digital twin cities centre - from - youtube - urban data visualization using mixed reality - https://hyp.is/ptvO5BexEfC063-4BZXD-A/www.youtube.com/watch?v=tN2_TJ1ZYhQ
for - mixed reality 3d graph data visualization - skyrails - gephi
Selected data from the report:

Age group 7-12:
Well over half of this age group, as many as 1.4 million children (58%), actively use social networks and messengers permitted from age 13. One in three children (760k, 32%) has regular access to TikTok, 24% (580k) to Facebook, and 12% (290k) to Instagram. Children commonly use messengers: 38% use Messenger (900k) and 31% WhatsApp (720k). They use TikTok most intensively: active users of the platform spend an average of 2 hours and 11 minutes a day in the app and in most cases open it a dozen to several dozen times a day. It can be estimated that more than 300k children spend over two hours a day on the platform.

Age group 7-14:
85% of them use the internet (2.7 million). Of these, 96% (2.6 million) connect via mobile devices. They most often use social and streaming platforms, spending over 2 hours a day on social networks and nearly 2 hours on streaming platforms. The most frequently chosen content categories are culture and entertainment, education, and erotica. 95% of internet users in this group consume entertainment (mainly games and music), a similar share visited educational content, and 51% erotic content, most often accessing erotic sites from mobile devices.
Many villagers believed that abandoning these rituals would anger their ancestors and cause harm to their families (WHO, 2015)
This is incorrect information: the cited section, "Factors that contributed to undetected spread", does not state this. Additionally, the information given is unrelated to the reference/citation provided.
Parallel sets

Parallel coordinate plots provide a way to display multidimensional data in 2D plots. They do this by representing the variables as a set of parallel axes, and showing each observation as a line in parallel coordinate space, rather than as a point in standard coordinate space. Extensions of this idea for categorical data led to "parallel sets plots", and some variations, a number of which use the Titanic data for examples. Bendix, Kosara, and Hauser (2005), "Parallel sets: Visual analysis of categorical data", and Kosara, Bendix, and Hauser (2006), "Parallel sets: Interactive exploration and visual analysis of categorical data", developed an interactive system to explore multivariate categorical data using parallel sets, in which the lines between categories of successive variables are of width proportional to the joint frequencies.
Due to the lack of visual clarity, I struggled to understand what the 2005 parallel sets were actually representing in this context (especially since external searching suggests these plots are usually formatted horizontally), to the point of losing track of how most of these charts show whether a given grouping lived or died in the sinking, which makes me question what benefit we get from them. I do appreciate the 2013 charts, not only for their accurate line widths but for color and shade distinctions clear enough in certain lines to make clear what feeds into what (although I do wish the "Survived" category were at the top or bottom rather than in the middle).
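For what it's worth, the core computation behind a parallel sets plot is just a table of joint frequencies: each band between two categorical axes has width proportional to how often that pair of categories co-occurs. A minimal sketch with toy Titanic-like data (invented counts, not the real dataset):

```python
import pandas as pd

# Toy data standing in for the Titanic passenger table.
df = pd.DataFrame({
    "Class":    ["1st", "1st", "3rd", "3rd", "3rd", "Crew"],
    "Survived": ["Yes", "No",  "No",  "No",  "Yes", "No"],
})

# Joint relative frequencies: each cell gives the width of the band
# connecting that Class category to that Survived category.
joint = pd.crosstab(df["Class"], df["Survived"], normalize=True)
print(joint)
```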
For example, according to what they said, investment in developing graphical user interfaces is usually postponed because of the high costs involved in designing and testing them. Something similar happens with translations and localizations, since they require people with situated knowledge. Additionally, many projects halt their activities once the initial institutional and financial push ceases, and so their features are frozen in time or lapse for lack of support.
It is interesting how Grafoscopio has avoided several of these pitfalls by making unusual choices: being developed in Pharo (which out of the box provides a graphical interface and ad hoc data-persistence models), organizing informal workshops such as the Data Weeks and Data Rodas, which create localized knowledge and make of diglossia a bridge rather than an abyss, and grounding itself in the economies of care and affection, recognizing them so as not to require much seed money. While it shares the fragilities of small and medium-sized projects, for instance the small number of developers, it is worth making visible these differentiated strategies for dealing with these common problems.
However, it has also become a frustrating space because of the phenomenon of free-riding: participants take momentary advantage of the club's spaces and knowledge but feel no minimal commitment to it, such as respect for the time of those who organize it or the need to give notice of their eventual absence. Those who have participated in the club over the long haul, however, have found that learning in community is far more powerful than solitary self-directed study.
We experienced something similar with the Data Weeks and Data Rodas in Grafoscopio, which led us to establish a set of principles that included things like practices of mutual care and acknowledging the floating character of most participants and the lasting commitment of very few (for this and other reasons, the permanent creation of living hypertextual memory in our pocket infrastructures is key).
to get out of the workshop logic: we spend our lives doing workshop after workshop after workshop, and to move to a much more concrete logic of generating developments and solutions that allow for sustainability. Institutionalization enables sustainability"48 «Entrevista a Jairo Melo».
Another possibility is the construction of living memory during and between the workshops, giving them a sense of continuity and progress, and allowing the workshop logic to be valued for building our own technologies instead of thinking of it only as appropriation of external technologies, very much in resonance with what is said in this comment.
While in Grafoscopio we still have problems with gradual peer-to-peer learning, the living memory and the highly contextual problems turn them into embodied problems that we take up in future workshops and in links between communities of practice and institutionalized spaces.
Interactive 5, below, presents a text generator that interweaves the words of several authors who, each in their own way, have reflected on humanism in Latin America: Manuel Quintín Lame, Domingo Sarmiento, Leopoldo Zea, Oswald de Andrade, and José Vasconcelos. Using a system of Markov chains, commonly applied in works of electronic literature, this generator remixes the different texts and creates an amalgam that jumps between the terms they use:
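A minimal sketch of the word-level Markov chain such a generator might use: from each word, the next word is drawn among the words observed to follow it in the combined corpus. The corpus below is a placeholder string, not the actual texts of the authors named above.

```python
import random
from collections import defaultdict

corpus = ("el humanismo en américa latina es un humanismo situado "
          "y el humanismo es diálogo")
words = corpus.split()

# Transition table: word -> list of words seen immediately after it.
transitions = defaultdict(list)
for current, nxt in zip(words, words[1:]):
    transitions[current].append(nxt)

def generate(start, length=12, seed=0):
    """Walk the chain from a start word, stopping at dead ends."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = transitions.get(out[-1])
        if not followers:  # no observed successor
            break
        out.append(rng.choice(followers))
    return " ".join(out)

print(generate("el"))
```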
Again, a gender bias in the published/selected material.
An interesting digital humanities experiment in the small.
situating and understanding texts in the human world built in these basic processes returns us to the humanistic discipline of hermeneutics, of which the digital humanities are a technological incarnation"78
And yet this fine purpose is orthogonal to the size of the databases. Other computational hermeneutics could take place, and in fact do take place, in the small.
The Trump administration is systematically removing references to the climate crisis and global heating from US government websites. The climate scientist Michael Mann says the worst must be expected now that the conspiracy-mongers have come to power. Experts assume that the new administration will systematically try to suppress information about the causes and consequences of the climate crisis. At the same time, government measures for climate adaptation and for reducing greenhouse gas emissions are being blocked. https://www.theguardian.com/us-news/2025/feb/04/trump-climate-change-federal-websites
In Der Standard, Martin Auber, drawing on current data, lays out why merely expanding capacity for generating renewable energy will not lead to decarbonization. Energy demand is growing considerably faster than the renewable energy that becomes available, and the AI boom is driving it up significantly further. https://www.derstandard.at/story/3000000255154/wann-kommt-die-energiewende-oder-kommt-sie-gar-nicht
As our lives become ever more connected, the personal data we give off in each of our activities is becoming a considerable industrial stake.
Let us set out to explore a world built around big data.
Moldable Development involves two distinct roles, each with its own set of skills. The facilitator (in blue on the map) is a technical role that is concerned with the technical part of building tools. But that alone is not enough. The stakeholder (in red) is at least as important. Tools are only meaningful when they relate to a question or hypothesis that is tied to value. That's the job of the stakeholder.
In the Grafoscopio community there are also roles with more technical expertise than others, even though we are all involved in solving the problem. In our case, there is not a split in which specific decisions, challenges, and problems are handled by the non-technical public (those in red on the Wardley map) while other, technical actions (in blue) concern only the developers; rather, there is a color gradient (purple, perhaps) that turns bluer the more technical the action is, even though non-technical people also take part in it, and redder the more administrative it is, even though developers also take part in administration. The meeting place of these two kinds of experience, and the blending of the colors, happens particularly in workshops such as the Data Weeks and the Data Rodas.
the most critical issues to harness innovation within the AI ecosystem
Bodily diversity in Colombia spans a wide range of experiences, marked by multicultural richness and the interplay of Indigenous, Afro-descendant, peasant, and urban communities. This diversity is also intertwined with unequal access to technology, health, and education, especially in rural areas.
The use of Artificial Intelligence to address social problems, as has been done in Africa, can inspire initiatives in Colombia. For example:
Artificial Intelligence for early diagnosis of diseases such as breast cancer or tuberculosis, adapted to rural Colombian contexts where medical services are limited.
Artificial Intelligence models to identify pests and diseases in crops important to rural communities, such as coffee, plantain, or maize.
Considering bodily diversities when designing solutions so that they are accessible to all people, regardless of physical abilities or social context.
Translation in Colombia can play a fundamental role in creating and using localized data to train Artificial Intelligence. Much as Luganda was included in the Common Voice project in Africa, initiatives can be developed to collect and translate data in Colombian Indigenous languages such as Wayuunaiki, Nasa Yuwe, or Emberá.
Broadening the representation of Indigenous languages in Artificial Intelligence applications such as virtual assistants or speech recognition systems.
Helping preserve and revitalize these languages by integrating them into modern technologies.
Generating diverse linguistic datasets that foster the development of inclusive, contextualized, and ethically responsible Artificial Intelligence.
Artificial Intelligence for social good as described in Africa can be adapted to the Colombian context, leveraging the "data-to-impact pipeline" to solve real problems.
Problem identification must be participatory, integrating the affected communities.
Solutions to improve the logistics of food distribution in remote regions.
Artificial Intelligence to identify and mitigate environmental risks in areas affected by illegal mining or deforestation.
It is crucial to develop localized, representative datasets to avoid biases in Artificial Intelligence models.
Agricultural databases that reflect the particularities of Colombian ecosystems.
Health data adapted to the country's genetic and cultural diversities.
AI design must be grounded in an understanding of the local and cultural context.
Adapting models to the specific needs of Indigenous and Afro-descendant communities.
Integrating traditional knowledge into technological solutions, recognizing collective knowledge and ancestral practices.
Education in the ethics of Artificial Intelligence is essential to train professionals aware of the social and cultural impacts of their creations. In addition, clear guidelines must be established for implementing ethical principles in technology development, fostering inclusive, non-extractive practices.
We are essentially digitizing trees, animals, and plants and rivers, and boundaries, defining those using satellite imagery.
In Colombia, corporealities are deeply tied to cultural, territorial, and spiritual identity. For many Indigenous, Afro-descendant, and peasant communities, the body is not only physical but also a bridge to the land and to nature.
These communities understand territory as a vital element of their collective existence, in contrast to Western views that separate the individual from the natural environment.
The digitization of territories, as proposed in the use of Artificial Intelligence for conservation, poses important ethical challenges. Classifying and defining lands and natural resources through satellite imagery and algorithms can strip these communities of their symbolic and material connection to the territory, perpetuating historical inequalities and violating their cultural and bodily rights.
Translation in Colombia could play a key role in mediating between Indigenous perspectives and Western practices of conservation and territory digitization.
Translating not only languages but also cultural concepts, such as relationality with nature and collective knowledge, is essential to avoid misunderstandings and to ensure that communities' voices are heard.
For example, when conservation projects based on Artificial Intelligence are developed, translation can help ensure that the principles, uses, and risks of these technologies are understood from within Indigenous worldviews, rather than imposing terminologies and approaches that do not respect their practices and knowledge.
The implementation of Artificial Intelligence for conservation and land digitization in Colombia should focus on ensuring that:
Indigenous communities are included as principal actors in the design of technologies that affect their territories. This requires prior, free, and informed consultation processes, in line with international human rights standards.
Rather than imposing a digitization model based on separating land from person, Artificial Intelligence reflects how these communities perceive their spiritual, cultural, and economic connection with nature.
Artificial Intelligence recognizes and respects the collective knowledge of communities. This includes avoiding the appropriation of data that disregards the communal character of Indigenous identity and knowledge, promoting instead ethical principles such as those put forward in the position of Indigenous AI.
Tables of Possible Cohorts - MS DX Only with and without washout
Look at who is and is not switching.
“The revenue model behind these open platforms is to be found in the user data and the value that data can represent.”
a more complete learner profile
more complete: for whom? by what means? what implications?
“Collect it all” is a phrase used to encapsulate the mission of General Keith Alexander, director of the US National Security Agency
cf matters of disclosure, consent, and differing orientation to/with privacy: MIT Tech Review article on CMU Mites in TCS Hall
Slop – in the sense of the flood of information and the calibration of how information is filtered – is power.
cf Rob Kitchin on big data: capture everything anyway
object recognition
and cf romanticized (or oft-told) narratives of this in CAPTCHAs
for - progress trap - Tesla autopilot - YouTube - The hidden data that reveals why Teslas crash - WSJ - 2024 - Dec
A surprising finding by scientists: how does brisk walking affect metabolic health?
Participants were asked: "Is your walking speed faster than people of your gender and age?" Based on their answers, they were categorized as "fast walkers" or "slow walkers."
The study included:
Summary:
supposing I was a writer, say, for a newspaper or for a magazine. I could create content in one language, FreeSpeech, and the person who's consuming that content, the person who's reading that particular information could choose any engine, and they could read it in their own mother tongue, in their native language
for - freespeech can be used as an international language translator - data structure of thought - from TED Talk - YouTube - A word game to convey any language - Ajit Narayanan
when you want to use Google, you go into Google search, and you type in English, and it matches the English with the English. What if we could do this in FreeSpeech instead? I have a suspicion that if we did this, we'd find that algorithms like searching, like retrieval, all of these things, are much simpler and also more effective, because they don't process the data structure of speech. Instead they're processing the data structure of thought
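A minimal sketch of the contrast the talk is drawing, under the assumption that content is stored as structured role/concept pairs rather than surface text (the representation below is invented for illustration; it is not FreeSpeech's actual encoding). Matching then works on the structure, independent of the language the document was written in:

```python
# Documents stored as structured representations, not sentences.
documents = [
    {"agent": "boy", "action": "eat", "object": "apple"},
    {"agent": "girl", "action": "read", "object": "book"},
]

def structured_search(query, docs):
    """Return docs that agree with every role/concept pair in the query."""
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]

# The same query matches regardless of the surface language the
# source sentence was authored in.
print(structured_search({"action": "eat"}, documents))
```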
for - indyweb dev - question - alternative to AI Large Language Models? - Is indyweb functionality the same as Freespeech functionality? - data structure of thought - from TED Talk - YouTube - A word game to convey any language - Ajit Narayanan
the earliest we've been able to get to a case of tukdam is 26 hours after a practitioner has died, so we've missed the first full day, and there is some reason to believe that that first 24-hour period is going to be very, very important
for - trivia - measuring tukdam after death - 24 hour period immediately following death is important but to date, no data captured - Youtube - Tukdam talk - An Overview Of CHM’s Work On “Well-Being And Tukdam” - Prof. Richard J. Davidson
we have all of these huge applications that are gathering all this data, and it's out there and theoretically it is our data, sort of, but in reality they control it and you can't actually link the data to each other; you only link to accessing the data through their application
for - quote - silos - internet limitations - location addressed server architecture limitations - silos - cannot link data from each silo - Juan Benet - IPFS
TRSP Desirable Characteristics Indigenous Peoples have the right to develop cultural governance protocols for Indigenous data and be active leaders in the stewardship of, and access to, Indigenous data especially in the context of Indigenous Knowledge
TRSP Desirable Characteristics Indigenous Peoples have the right to data that are relevant to their world views and empower self-determination and effective self-governance. Indigenous data must be made available and accessible to Indigenous nations and communities in order to support Indigenous governance.
you can feel that as you're walking around you can feel that data on your wrist
for - sensory substitution - like a new interoception - new exteroception - feel the data