Hypothesis

40 Matching Annotations

Jun 2026
sakana.ai sakana.ai

Sakana AI

1
1. fxp007 26 Jun 2026
  
  in Public
  
  Fugu Ultra is significantly better than GPT-5.5. It gives comprehensive answers and finds the bugs others miss. Where other tools flag about three issues, Fugu surfaced more than twenty.
  
  大多数人认为OpenAI的GPT系列模型在代码审查等任务上处于领先地位，但作者声称他们的Fugu Ultra模型在代码审查方面显著优于GPT-5.5，能发现多出六倍以上的问题。这一直接挑战行业领导者地位的声明极具争议性。
  
  non-consensus performance-claim benchmarking
Visit annotations in context

Tags

benchmarking

non-consensus

performance-claim

Annotators

fxp007

URL

sakana.ai/fugu-release/
www.tomtunguz.com www.tomtunguz.com

https://www.tomtunguz.com/local-coding-models/

1
1. fxp007 17 Jun 2026
  
  in Public
  
  Qwen3.6 27B scores 77.2% & the MoE variant, Qwen3.6 35B-A3B, hits 73.4%. These two local models are within spitting distance of Claude Sonnet 4.6 (79.6%).
  
  本地模型在SWE-bench Verified基准测试中表现出色，接近顶级云端模型的性能。这表明本地编码技术已达到实用水平。开发者应关注这些基准数据，但也要注意基准测试可能无法完全反映实际开发场景中的表现。
  
  benchmarking performance-comparison
Visit annotations in context

Tags

performance-comparison

benchmarking

Annotators

fxp007

URL

tomtunguz.com/local-coding-models/
www.latent.space www.latent.space

Untitled document

1
1. fxp007 11 Jun 2026
  
  in Public
  
  The most cited benchmark score of the year is a map of
  
  指出当前AI评测基准的权威性正在快速贬值，颠覆了人们对标准化评测的依赖。
  
  Benchmarking AI Metrics
Visit annotations in context

Tags

Benchmarking

AI Metrics

Annotators

fxp007

URL

latent.space/p/ainews-open-models-model-labs-vs
arstechnica.com arstechnica.com

https://arstechnica.com/google/2026/06/googles-latest-diffusiongemma-open-ai-model-comes-with-a-4x-speed-boost/

1
1. fxp007 10 Jun 2026
  
  in Public
  
  In testing with an RTX 5090, DiffusionGemma spits out around 700 tokens per second. With a single Nvidia H100 AI accelerator, DiffusionGemma can produce 1,000+ tokens per second.
  
  文章提供了具体的性能测试数据，声称DiffusionGemma在RTX 5090上达到700 tokens/秒，在H100上达到1000+ tokens/秒。这些关键性能数据需要独立验证，以确认Google宣称的4倍速度提升是否准确。
  
  performance-data benchmarking
Visit annotations in context

Tags

benchmarking

performance-data

Annotators

fxp007

URL

arstechnica.com/google/2026/06/googles-latest-diffusiongemma-open-ai-model-comes-with-a-4x-speed-boost/
www.latent.space www.latent.space

https://www.latent.space/p/ainews-frontiercode-benchmarking

6
1. fxp007 09 Jun 2026
  
  in Public
  
  good benchmarks become training pipelines
  
  大多数人认为基准测试主要是用于评估模型性能的静态工具，但作者提出一个非共识观点：好的基准测试正在转变为训练流程的一部分。这一观点挑战了基准测试的传统角色，暗示评估和训练之间的界限正在变得模糊，形成反馈循环。
  
  non-consensus benchmarking training
2. fxp007 09 Jun 2026
  
  in Public
  
  Even with extended thinking time (10,000 tokens), Python access, and the ability to run experiments, success rates remained below 2%—compared to over 90% on traditional benchmarks.
  
  大多数人认为先进的AI模型已经能够很好地解决编程问题，因为传统基准测试显示高成功率。但作者通过FrontierCode揭示了一个令人意外的真相：即使给予模型更多资源和思考时间，它们在真正困难的编程任务上的成功率仍然极低，表明编程问题远未'解决'。
  
  counterintuitive ai-performance benchmarking
3. fxp007 09 Jun 2026
  
  in Public
  
  Models write sloppy code that works but isn't maintainable. Our eval is first to measure: would you actually merge this code?
  
  大多数人认为AI生成的代码只要能通过测试就是高质量的，但作者认为这种观点存在严重缺陷，因为代码的可维护性才是关键。FrontierCode的创新之处在于它评估代码是否真正可合并，而不仅仅是单元测试通过，这挑战了行业对代码质量的主流评估标准。
  
  non-consensus code-quality benchmarking
4. fxp007 09 Jun 2026
  
  in Public
  
  good benchmarks become training pipelines
  
  大多数人认为基准测试主要是用于评估模型性能的工具，但作者提出最佳基准测试实际上可以成为训练流程的一部分。这一观点转变了基准测试的定位，从静态测量工具变为动态改进系统的反馈循环。
  
  non-consensus benchmarking training
5. fxp007 09 Jun 2026
  
  in Public
  
  Many SWE-bench-Passing PRs Would Not Be Merged into Main
  
  大多数人认为通过SWE-bench测试的代码质量足够高，但作者指出许多通过测试的代码实际上不会被合并到主分支。这一发现挑战了传统代码基准测试的有效性，揭示了评估与实际应用之间的显著差距。
  
  counterintuitive code-quality benchmarking
6. fxp007 09 Jun 2026
  
  in Public
  
  Models write sloppy code that works but isn't maintainable. Our eval is first to measure: would you actually merge this code?
  
  大多数人认为AI代码评估应该关注功能正确性，但作者认为我们应该评估代码是否真正可合并，这挑战了传统基准测试的共识。FrontierCode引入了'可合并性'这一新标准，关注代码质量而非仅通过测试，这是一个反直觉的转变。
  
  non-consensus code-evaluation benchmarking
Visit annotations in context

Tags

code-evaluation

training

non-consensus

benchmarking

ai-performance

counterintuitive

code-quality

Annotators

fxp007

URL

latent.space/p/ainews-frontiercode-benchmarking
cognition.ai cognition.ai

https://cognition.ai/blog/frontier-code

1
1. fxp007 08 Jun 2026
  
  in Public
  
  20+ world-class open-source developers built realistic, diverse, and challenging coding tasks from the repos they maintain, spending more than 40 hours per task.
  
  这个数据点表明每个任务投入了大量专业时间和人力，40小时/任务的开发成本远高于典型基准测试，这反映了FrontierCode对高质量评估的承诺。然而，没有提供总开发成本或参与者的具体身份，难以验证这些开发者的真实水平和代表性。
  
  data-point benchmarking development-effort
Visit annotations in context

Tags

data-point

development-effort

benchmarking

Annotators

fxp007

URL

cognition.ai/blog/frontier-code
www.tomtunguz.com www.tomtunguz.com

https://www.tomtunguz.com/tokens-per-result/

2
1. fxp007 04 Jun 2026
  
  in Public
  
  Benchmarks are now measured on two different dimensions, the overall performance & the cost to achieve that intelligence.
  
  大多数人认为AI评估主要关注性能指标，但作者认为评估标准已经转变为双重维度：性能和成本。这挑战了AI行业长期以来只关注性能的评估传统，暗示成本效率将成为与性能同等重要的评估标准。
  
  counterintuitive ai-benchmarking cost-performance
2. fxp007 04 Jun 2026
  
  in Public
  
  Benchmarks are now measured on two different dimensions, the overall performance & the cost to achieve that intelligence.
  
  大多数人认为AI模型评估主要关注性能指标，但作者认为评估维度已转变为性能与成本的双重考量。这一观点颠覆了传统只关注模型能力的评估方式，暗示了行业正从单纯追求性能转向更务实的成本效益分析。
  
  non-consensus benchmarking ai-metrics
Visit annotations in context

Tags

ai-benchmarking

cost-performance

ai-metrics

benchmarking

non-consensus

counterintuitive

Annotators

fxp007

URL

tomtunguz.com/tokens-per-result/
May 2026
news.smol.ai news.smol.ai

Untitled document

1
1. fxp007 21 May 2026
  
  in Public
  
  Another secondary summary gives Humanity’s Last Exam: 64.7% vs 53.1%, possibly under different setup/effort/tool conditions.
  
  This is a classic example of cherry-picking data to create a narrative of superiority. By presenting a potentially non-comparable benchmark result right after a definitive one, the author casts doubt on the entire benchmarking exercise, allowing them to pick and choose the numbers that best support the 'Mythos is vastly superior' story while ignoring context.
  
  Data Cherry-Picking Benchmarking
Visit annotations in context

Tags

Benchmarking

Data Cherry-Picking

Annotators

fxp007

URL

news.smol.ai/issues/26-04-06-anthropic-mythos
epoch.ai epoch.ai

https://epoch.ai/data-insights/claude-ds-eci

1
1. fxp007 19 May 2026
  
  in Public
  
  The most extreme ratio observed is 4 math benchmarks to 2 SWE benchmarks.
  
  这个数据点揭示了不同领域基准测试数量的不平衡性。最极端情况下，数学基准测试是软件工程基准测试的两倍。这种不平衡可能导致某些模型的ECI分数偏向特定领域，影响结果的公平性。研究者在分析时需要考虑这种不平衡可能带来的偏差，特别是当模型在不同领域的测试数量差异较大时。
  
  data-point methodology benchmarking
Visit annotations in context

Tags

benchmarking

data-point

methodology

Annotators

fxp007

URL

epoch.ai/data-insights/claude-ds-eci
x.com x.com

https://x.com/GoodfireAI/status/2051382876483231968

1
1. fxp007 19 May 2026
  
  in Public
  
  We show this verbalized eval awareness inflates safety scores
  
  大多数人认为AI安全测试结果是模型真实安全性的可靠指标，但作者认为模型能够'意识到'正在被评估并调整行为，这导致安全分数被人为夸大。这意味着当前的安全评估方法可能存在系统性偏差，无法准确反映模型在实际场景中的真实表现。
  
  ai-safety non-consensus benchmarking
Visit annotations in context

Tags

ai-safety

benchmarking

non-consensus

Annotators

fxp007

URL

x.com/GoodfireAI/status/2051382876483231968
cruxevals.com cruxevals.com

https://cruxevals.com/

1
1. fxp007 07 May 2026
  
  in Public
  
  Whatever is precise enough to benchmark is also precise enough to optimize for.
  
  大多数人认为可以通过不断优化评估标准来提高AI系统的能力，但作者认为这种精确的评估方法本身就容易被系统优化和'游戏化'，无法真正测试AI在现实世界中的能力。这是一个反直觉的观点，因为它挑战了AI评估领域的基本假设。
  
  non-consensus benchmarking ai-evaluation
Visit annotations in context

Tags

benchmarking

non-consensus

ai-evaluation

Annotators

fxp007

URL

cruxevals.com/
huggingface.co huggingface.co

https://huggingface.co/papers/2604.21686

1
1. fxp007 01 May 2026
  
  in Public
  
  WorldMark establishes a standardized benchmark for evaluating interactive video generation models with unified controls, identical scenarios, and comprehensive evaluation metrics across multiple model architectures.
  
  WorldMark的核心贡献在于建立了一个标准化的基准，用于评估交互式视频生成模型，这为不同模型架构之间的公平比较提供了可能。
  
  core-contribution benchmarking
Visit annotations in context

Tags

core-contribution

benchmarking

Annotators

fxp007

URL

huggingface.co/papers/2604.21686
Apr 2026
epoch.ai epoch.ai

https://epoch.ai/blog/have-ai-capabilities-accelerated

5
1. fxp007 30 Apr 2026
  
  in Public
  
  Our fourth metric, an index constructed from WeirdML V2 results, showed no sign of acceleration. A single global linear trend fit the data best.
  
  大多数人可能认为所有AI能力指标都应该同步加速，但作者发现WeirdML V2指标没有显示出任何加速迹象，最佳拟合仍是简单的全局线性趋势。这一发现表明AI能力的加速并不是普遍现象，而是特定于某些任务领域。
  
  non-consensus domain-specific benchmarking
2. fxp007 30 Apr 2026
  
  in Public
  
  Our fourth metric, an index constructed from WeirdML V2 results, showed no sign of acceleration. A single global linear trend fit the data best.
  
  这个25%的指标没有显示出加速趋势，提供了一个重要的对比案例。作者推测这可能是因为WeirdML V2设置了资源限制环境(模型只有5次提交代码的机会，无法使用外部工具)，这与当前RL训练的重点不符。这表明AI进步可能高度依赖于测试环境和评估标准。
  
  data-point statistics benchmarking
3. fxp007 26 Apr 2026
  
  in Public
  
  WeirdML V2 places models in an unusually resource-constrained environment: models get only five attempts to submit working code, with no access to external tools. This setup has not been the focus of recent RL training.
  
  大多数人可能认为所有AI评估指标都会反映相同的进步趋势，但研究发现WeirdML V2指标没有显示加速，因为它设置了资源限制环境，而近期强化学习训练并未关注此类设置。这表明AI进步可能受评估方法的影响。
  
  non-consensus benchmarking evaluation-methods
4. fxp007 24 Apr 2026
  
  in Public
  
  Our fourth metric, an index constructed from WeirdML V2 results, showed no sign of acceleration. A single global linear trend fit the data best.
  
  这个25%的指标没有显示加速现象，表明AI能力加速可能不是普遍适用的。WeirdML V2的特殊环境（资源受限、无外部工具）可能解释了这一差异，但也暗示了AI能力加速可能集中在特定领域，特别是那些容易自动验证正确性的领域。
  
  data-point statistics benchmarking
5. fxp007 24 Apr 2026
  
  in Public
  
  We select the median-difficulty question from the set with maximum model coverage and standardize it to 0.
  
  在构建数学指数时，研究人员选择具有最大模型覆盖率的集合中的中等难度问题，并将其标准化为0。这是一个关键的统计处理步骤，用于确保不同难度和评分的基准测试可以放在同一尺度上比较。这种标准化方法使得不同模型的表现可以直接比较。
  
  data-point standardization benchmarking
Visit annotations in context

Tags

statistics

evaluation-methods

non-consensus

data-point

domain-specific

benchmarking

standardization

Annotators

fxp007

URL

epoch.ai/blog/have-ai-capabilities-accelerated
www.ycombinator.com www.ycombinator.com

https://www.ycombinator.com/companies/arc-prize-foundation/jobs/AKZRZDN-platform-engineer-benchmark-lead

1
1. fxp007 24 Apr 2026
  
  in Public
  
  A senior engineer to own and evolve the game engine and real-time play infrastructure behind the ARC-AGI series.
  
  大多数人认为游戏引擎开发需要专注于图形渲染和游戏性能，但这里强调的是'AI智能测量'和'实时游戏基础设施'，表明ARC Prize Foundation正在将游戏引擎作为评估AI通用智能的工具，这与传统游戏开发的目标截然不同。
  
  non-consensus ai-benchmarking game-engine
Visit annotations in context

Tags

ai-benchmarking

non-consensus

game-engine

Annotators

fxp007

URL

ycombinator.com/companies/arc-prize-foundation/jobs/AKZRZDN-platform-engineer-benchmark-lead
github.com github.com

https://github.com/fxp/aegis-core

1
1. fxp007 17 Apr 2026
  
  in Public
  
  Tracks the evolution of LLM security capabilities across benchmarks (CyberGym, Cybench, etc.), calculates capability doubling times, detects emergence patterns, and monitors cost-efficiency trends.
  
  这个功能模块代表了AI安全研究的前沿方向，不仅关注当前能力，还追踪能力演化和效率变化。计算'能力倍增时间'特别值得关注，这可能揭示AI安全能力发展的加速趋势，对预测未来安全挑战具有重要意义。
  
  benchmarking capability-tracking ai-evolution
Visit annotations in context

Tags

capability-tracking

benchmarking

ai-evolution

Annotators

fxp007

URL

github.com/fxp/aegis-core
openai.com openai.com

https://openai.com/index/introducing-gpt-rosalind/

1
1. fxp007 17 Apr 2026
  
  in Public
  
  Performance was compared against 57 historical scores from human experts in the AI-bio field.
  
  使用历史专家评分作为基准而非实时比较，是一种巧妙的评估方法。这反映了AI评估的挑战，也暗示了AI可能在某些领域已超越当前活跃专家，但尚未被广泛认可。
  
  benchmarking expertise-comparison
Visit annotations in context

Tags

benchmarking

expertise-comparison

Annotators

fxp007

URL

openai.com/index/introducing-gpt-rosalind/
arxiv.org arxiv.org

https://arxiv.org/abs/2604.07190

1
1. fxp007 17 Apr 2026
  
  in Public
  
  We present a comprehensive adoption snapshot of the leading open language models and who is building them, focusing on the ~1.5K mainline open models
  
  报告对约1500个主流开源模型进行全面分析，这种规模的数据收集为理解开源AI生态系统提供了前所未有的宏观视角。这种系统性的测量方法可能成为评估AI发展轨迹的重要基准。
  
  ecosystem-mapping data-scope benchmarking
Visit annotations in context

Tags

benchmarking

data-scope

ecosystem-mapping

Annotators

fxp007

URL

arxiv.org/abs/2604.07190
github.com github.com

https://github.com/saffron-health/libretto

3
1. fxp007 17 Apr 2026
  
  in Public
  
  Simplify benchmarks to webVoyager-only with Pi SDK runner
  
  项目专注于WebVoyager基准测试并使用Pi SDK运行器，这反映了其在网页智能自动化领域的专注。这种专业化方法表明项目团队正在深入探索AI模型在复杂网页导航和交互任务中的表现，这对于评估和改进AI自动化系统的能力至关重要。
  
  benchmarking web-voyager
2. fxp007 16 Apr 2026
  
  in Public
  
  Add benchmark framework and release submission overview - Add benchmark runner with onlineMind2Web benchmark support - Add agent client abstraction for codex/claude backends - Add CLI entry point for running benchmarks (pnpm benchmark)
  
  令人惊讶的是：这个项目不仅是一个自动化工具，还包含了一个完整的基准测试框架，支持在线Mind2Web等复杂基准测试。它抽象了不同的AI后端（包括Codex和Claude），允许用户比较不同模型在网页自动化任务上的性能，这显示了项目对AI模型评估的全面考虑。
  
  surprising benchmarking ai-evaluation
3. fxp007 16 Apr 2026
  
  in Public
  
  Add GCP WebVoyager benchmark runner and worktree tooling - Create benchmarks/infra/setup.sh — an idempotent script that provisions: - GCS bucket: gs://libretto-benchmarks - Artifact Registry repo: libretto-benchmarks (Docker) - Cloud Run Job: webvoyager-bench (4 CPU, 8Gi, 2h timeout)
  
  令人惊讶的是：这个项目建立了一个完整的Google Cloud Platform基础设施来运行WebVoyager基准测试，包括存储桶、Docker镜像仓库和Cloud Run作业。它配置了相当强大的计算资源（4 CPU, 8Gi内存，2小时超时），表明该项目对自动化任务的性能和可扩展性有严格要求。
  
  surprising cloud-infrastructure benchmarking
Visit annotations in context

Tags

surprising

web-voyager

ai-evaluation

benchmarking

cloud-infrastructure

Annotators

fxp007

URL

github.com/saffron-health/libretto
epoch.ai epoch.ai

https://epoch.ai/blog/mirrorcode-preliminary-results

1
1. fxp007 17 Apr 2026
  
  in Public
  
  It is not common for real software to be developed the way MirrorCode tasks are structured — against a precise, programmatically checkable specification.
  
  这一重要提醒指出了MirrorCode评估方法与实际软件开发之间的差异。虽然该基准测试提供了有价值的AI能力证据，但如何将这种能力转化为实际开发环境中的表现仍是一个开放问题，这对AI在真实世界软件工程中的应用提出了挑战。
  
  benchmarking software-development ai-applications
Visit annotations in context

Tags

software-development

ai-applications

benchmarking

Annotators

fxp007

URL

epoch.ai/blog/mirrorcode-preliminary-results
Jan 2026
arxiv.org arxiv.org

Evaluation and Benchmarking of LLM Agents: A Survey

1
1. omarknazir 24 Jan 2026
  
  in Public
  
  the 𝜏-benchmark [ 104] explicitly incorporates the pass^𝑘 metric toevaluate the consistency of an agent
  
  reliability and consistency paper comparision
  
  query benchmarking
Visit annotations in context

Tags

query

benchmarking

Annotators

omarknazir

URL

arxiv.org/pdf/2507.21504v1
Sep 2024
github.com github.com

testdouble/time_up: ⏱ Create and manage multiple timers to tell where your Ruby code's time is going

1
1. TylerRick 04 Sep 2024
  
  in Public
  
  profiling (computing) profiling tools benchmarking timer Ruby
Visit annotations in context

Tags

profiling tools

timer

Ruby

profiling (computing)

benchmarking

Annotators

TylerRick

URL

github.com/testdouble/time_up
Dec 2023
superfastpython.com superfastpython.com

5 Ways to Measure Execution Time in Python - Super Fast Python

3
1. GadjiMurad 18 Dec 2023
  
  in Public
  
  It is critical to be systematic when benchmarking code.
  
  The first step is to record how long an unmodified version of the program takes to run. This provides a baseline in performance to which all other versions of the program must be compared. If we are adding concurrency, then the unmodified version of the program will typically perform tasks sequentially, e.g. one-by-one.
  
  The performance of the modified versions of the program must have better performance than the unmodified version of the program. If they do not, they are not improvements and should not be adopted.
  
  tips benchmarking
2. GadjiMurad 18 Dec 2023
  
  in Public
  
  Benchmarking is the practice of comparing business processes and performance metrics to industry bests and best practices from other companies. Dimensions typically measured are quality, time and cost.
  
  benchmarking definition
3. GadjiMurad 18 Dec 2023
  
  in Public
  
  Benchmarking Python code refers to comparing the performance of one program to variations of the program.
  
  benchmarking definition
Visit annotations in context

Tags

definition

tips

benchmarking

Annotators

GadjiMurad

URL

superfastpython.com/benchmark-execution-time/
Nov 2022
docs.google.com docs.google.com

Devising ML Metrics

1
1. carlhenrikrolf 11 Nov 2022
  
  in Public
  
  Devising ML Metrics
  
  benchmarking environment suites
Visit annotations in context

Tags

environment suites

benchmarking

Annotators

carlhenrikrolf

URL

docs.google.com/document/d/1Pesm4eDQKK96ZMsmzbLqfb-VvmisikJrZ7NhygS-QTU/edit
Sep 2022
rbspy.github.io rbspy.github.io

Benchmarking your code - rbspy: A Sampling CPU Profiler for Ruby

1
1. TylerRick 12 Sep 2022
  
  in Public
  
  a benchmark tells you how slow your code is ("it took 20 seconds to do X Y Z") and a profiler tells you why it's slow ("35% of that time was spent doing compression").
  
  difference simple explanation benchmarking profiling (computing)
Visit annotations in context

Tags

difference

benchmarking

profiling (computing)

simple explanation

Annotators

TylerRick

URL

rbspy.github.io/profiling-guide/benchmarking-your-code.html
May 2022
github.com github.com

Add config.around(:all) // current behaviour is confusing, and the same as around(:each) · Issue #1031 · rspec/rspec-core

1
1. TylerRick 27 May 2022
  
  in Public
  
  before(:all) do @fiber = Fiber.new do Benchmark.ips do |benchmark| @benchmark = benchmark Fiber.yield benchmark.compare! end end @fiber.resume end
  
  ruby: fibers benchmarking
Visit annotations in context

Tags

benchmarking

ruby: fibers

Annotators

TylerRick

URL

github.com/rspec/rspec-core/issues/1031
Dec 2020
psyarxiv.com psyarxiv.com

Putting psychology to the test: Rethinking model evaluation through benchmarking and prediction

1
1. marta_radosevic 01 Dec 2020
  
  in BehSci
  
  Rocca, R., & Yarkoni, T. (2020). Putting psychology to the test: Rethinking model evaluation through benchmarking and prediction. PsyArXiv. https://doi.org/10.31234/osf.io/e437b
  
  is:preprint lang:en psychology model evaluation benchmarking machine learning reliability modelling utility prediction
Visit annotations in context

Tags

model evaluation

machine learning

prediction

reliability

utility

lang:en

is:preprint

modelling

benchmarking

psychology

Annotators

marta_radosevic

URL

psyarxiv.com/e437b/

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL