Hypothesis

2 Matching Annotations

Jun 2026
www.latent.space

https://www.latent.space/p/ainews-frontiercode-benchmarking

1
1. fxp007 09 Jun 2026
  
  in Public
  
  Models write sloppy code that works but isn't maintainable. Our eval is first to measure: would you actually merge this code?
  
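  Most people assume AI code evaluation should focus on functional correctness, but the authors argue that we should evaluate whether code is actually mergeable, which challenges the consensus behind traditional benchmarks. FrontierCode introduces "mergeability" as a new criterion, focusing on code quality rather than merely passing tests, which is a counterintuitive shift.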
  
  non-consensus code-evaluation benchmarking

Tags

benchmarking

non-consensus

code-evaluation

Annotators

fxp007

URL

latent.space/p/ainews-frontiercode-benchmarking
Jul 2020
science.sciencemag.org

Call for transparency of COVID-19 models

1
1. edampf 24 Jul 2020
  
  in BehSci
  
  Barton, C. M., Alberti, M., Ames, D., Atkinson, J.-A., Bales, J., Burke, E., Chen, M., Diallo, S. Y., Earn, D. J. D., Fath, B., Feng, Z., Gibbons, C., Hammond, R., Heffernan, J., Houser, H., Hovmand, P. S., Kopainsky, B., Mabry, P. L., Mair, C., … Tucker, G. (2020). Call for transparency of COVID-19 models. Science, 368(6490), 482.2-483. https://doi.org/10.1126/science.abb8637
  
  is:article letter COVID-19 lang:en transparency modeling knowledge sharing data sharing science research prediction response government policy science decision making healthcare economy code sharing replication evaluation rapid response publication

Tags

is:article

lang:en

data sharing

healthcare

response

COVID-19

prediction

policy science

decision making

replication

knowledge sharing

transparency

science

evaluation

publication

letter

research

government

modeling

code sharing

economy

rapid response

Annotators

edampf

URL

science.sciencemag.org/content/368/6490/482.2.full
