23 Matching Annotations
  1. Last 7 days
    1. Academic publishers, documentary archives, game studios, and companies sitting on years of enterprise data have all been courted for the seeds of intelligence needed to train the next generation of models.

      The expansion of the AI training-data market is reshaping how value is located across traditional industries: from academic publishing to game studios, seemingly unrelated data sources can all become 'seeds of intelligence' for AI training. This cross-industry data convergence is creating new business opportunities and market dynamics.

    1. As slop takes over the Internet, labs may struggle to obtain high-quality corpuses for training models.

      This observation points to a crisis in AI training-data quality. As Internet content degrades, AI systems face a 'garbage in, garbage out' risk. The author's 'low-background steel' metaphor neatly captures the proposed fix of training on uncontaminated pre-2023 data, and hints at how serious knowledge pollution in the digital age has become, with far-reaching implications for the reliability and bias of AI systems.

    2. A surprising number of people are now employed as model trainers, feeding their human expertise to automated systems.

      This observation reveals a thought-provoking paradox in AI development: human experts are training the AI systems that will replace their own jobs. This 'self-replacing' labor pattern may be unprecedented; it not only reshapes employment structures but raises deep questions about knowledge transmission and how professional value is defined. The trend could accelerate the loss of expertise in some fields while creating new power dynamics.

    1. Models get punished for bad advice but face zero penalty for staying silent. So refusing becomes the safest strategy, even when silence is deadly.

      Surprising: the training regime gives AI models an asymmetric penalty structure, where giving bad advice is punished but staying silent carries no cost. The result is that a model would rather withhold potentially life-saving information than risk an answer, even when silence itself can be deadly.
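
      The asymmetry described above can be sketched as a toy expected-reward calculation. The reward values here are illustrative assumptions, not any lab's actual training objective; the point is only that once confidence drops below a break-even threshold, refusing strictly dominates answering.

```python
# Toy reward scheme for the asymmetry described in the note above.
# Assumed (hypothetical) values: correct answer +1, wrong answer -4,
# refusal always 0 -- i.e. no penalty at all for staying silent.

def expected_reward_answer(p_correct: float,
                           r_correct: float = 1.0,
                           r_wrong: float = -4.0) -> float:
    """Expected reward if the model answers with confidence p_correct."""
    return p_correct * r_correct + (1.0 - p_correct) * r_wrong

def best_policy(p_correct: float) -> str:
    """Refusal scores a flat 0, so answer only when answering beats 0."""
    return "answer" if expected_reward_answer(p_correct) > 0 else "refuse"

for p in (0.95, 0.80, 0.60):
    print(f"p={p:.2f}  E[answer]={expected_reward_answer(p):+.2f}  -> {best_policy(p)}")
```

      With these assumed payoffs the break-even confidence is 0.8: below that, silence always scores higher, which is exactly the refusal incentive the note describes.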

    1. Meta says its rebuilt pretraining stack can reach equivalent capability with >10× less compute than Llama 4 Maverick

      Surprisingly, Meta claims its rebuilt pretraining stack can reach capability equivalent to Llama 4 Maverick with less than a tenth of the compute. An efficiency gain of that size suggests model training may be undergoing a paradigm shift, from simply scaling compute toward optimizing algorithms and architecture, with far-reaching consequences for cost structures and competition across the AI industry.

  2. Apr 2026
    1. If we knew that every image uploaded was a beautiful model shot, segmentation would be far easier, but because of the nature of user-uploaded content, we need the best possible segmentation.

      Most people would assume high-quality professional photos are the ideal input for AI image processing, but the author implies that 'perfect' model shots are actually the easy case compared with real user-uploaded content. This challenges the usual assumptions about 'ideal training data': the 'imperfection' of real-world data is the harder technical problem.

  3. Jan 2026
  4. Dec 2025
  5. Nov 2025
    1. for - search prompt 2 - can an adult who has learned language experience pre-linguistic reality like an infant who hasn't learned language yet? - https://www.google.com/search?q=can+an+adult+who+has+learned+language+experience+pre-linguistic+reality+like+an+infant+who+hasn%27t+learned+language+yet%3F&sca_esv=869baca48da28adf&biw=1920&bih=911&sxsrf=AE3TifNnrlFbCZIFEvi7kVbRcf_q1qVnNw%3A1762660496627&ei=kBAQafKGJry_hbIP753R4QE&ved=0ahUKEwjyjouGluSQAxW8X0EAHe9ONBwQ4dUDCBA&uact=5&oq=can+an+adult+who+has+learned+language+experience+pre-linguistic+reality+like+an+infant+who+hasn%27t+learned+language+yet%3F&gs_lp=Egxnd3Mtd2l6LXNlcnAid2NhbiBhbiBhZHVsdCB3aG8gaGFzIGxlYXJuZWQgbGFuZ3VhZ2UgZXhwZXJpZW5jZSBwcmUtbGluZ3Vpc3RpYyByZWFsaXR5IGxpa2UgYW4gaW5mYW50IHdobyBoYXNuJ3QgbGVhcm5lZCBsYW5ndWFnZSB5ZXQ_SKL1AlAAWIziAnAPeAGQAQCYAaEEoAHyoAKqAQwyLTE0LjczLjE0LjO4AQPIAQD4AQGYAlSgApnFAcICBBAjGCfCAgsQABiABBiRAhiKBcICDRAAGIAEGLEDGEMYigXCAgsQLhiABBixAxiDAcICDhAuGIAEGLEDGNEDGMcBwgIEEAAYA8ICBRAuGIAEwgIKECMYgAQYJxiKBcICChAAGIAEGEMYigXCAg4QLhiABBixAxiDARiKBcICExAuGIAEGLEDGNEDGEMYxwEYigXCAggQABiABBixA8ICCBAuGIAEGLEDwgIFEAAYgATCAgsQLhiABBixAxiKBcICCxAAGIAEGLEDGIoFwgIGEAAYFhgewgILEAAYgAQYsQMYgwHCAgsQABiABBiGAxiKBcICCBAAGKIEGIkFwgIIEAAYgAQYogTCAgUQABjvBcICBhAAGA0YHsICBRAhGKABwgIHECEYoAEYCsICBRAhGJ8FwgIEECEYFcICBBAhGAqYAwCSBwwxMy4wLjguNTIuMTGgB-K1A7IHCTItOC41Mi4xMbgHgcUBwgcHMzUuNDcuMsgHcQ&sclient=gws-wiz-serp - from - search prompt 1 - can we unlearn language? 
- https://hyp.is/Ywp_fr0cEfCqhMeAP0vCVw/www.google.com/search?sca_esv=869baca48da28adf&sxsrf=AE3TifMGTNfpTekWWBdYUA96_PTLS9T00A:1762658867809&q=can+we+unlearn+language?&source=lnms&fbs=AIIjpHxU7SXXniUZfeShr2fp4giZ1Y6MJ25_tmWITc7uy4KIegmO5mMVANqcM7XWkBOa06dn2D9OWgTLQfUrJnETgD74qUQptjqPDfDBCgB_1tdfH756Z_Nlqlxc3Q5-U62E4zbEgz3Bv4TeLBDlGAR4oTnCgPSGyUcrDpa-WGo5oBqtSD7gSHPGUp_5zEroXiCGNNDET4dcNOyctuaGGv2d44kI9rmR9w&sa=X&ved=2ahUKEwj4_LP9j-SQAxVYXUEAHVT8FfMQ0pQJegQIDhAB&biw=1920&bih=911&dpr=1 - to - search prompt 2 (AI) - can an adult who has learned language re-experience pre-linguistic phenomena like an infant with no language training? - https://hyp.is/m0c7ZL0jEfC8EH_WK3prmA/www.google.com/search?q=can+an+adult+who+has+learned+language+re-experience+pre-linguistic+phenomena+like+an+infant+with+no+language+training?&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRiPAjIHCAIQIRiPAtIBCTQzNzg4ajBqN6gCALACAA&sourceid=chrome&ie=UTF-8&udm=50&ved=2ahUKEwjfrLqDm-SQAxWDZEEAHcxqJgkQ0NsOegQIAxAB&aep=10&ntc=1&mstk=AUtExfAG148GJu71_mSaBylQit3n4ElPnveGZNA48Lew3Cb_ksFUHUNmWfpC0RPR_YUGIdx34kaOmxS2Q-TjbflWDCi_AIdYJwXVWHn-PA6PZM5edEC6hmXJ8IVcMBAdBdsEGfwVMpoV_3y0aeW0rSNjOVKjxopBqXs3P1wI9-H6NXpFXGRfJ_QIY1qWOMeZy4apWuAzAUVusGq7ao0TctjiYF3gyxqZzhsG5ZtmTsXLxKjo0qoPwqb4D-0K-uW-xjkyJj0Bi45UPFKl-Iyabi3lHKg4udEo-3N4doJozVNoXSrymPSQbr2tdWcxw93FzdAhMU9QZPnl89Ty1w&csuir=1&mtid=WBYQaYfuHYKphbIPzYmKiAs

  6. Nov 2024
    1. when this technology meets it, that our interiors are not completely taken over, because this technology is so potent. It would be very easy to lose our souls, to be so conditioned, so quickly, by the dopamine, by whatever is going to happen when this stuff rolls

      Very important. This is why we are meeting AI as it evolves. We are training it in our language and with our QUALIA.

  7. Aug 2024
    1. For example, our standard English language model is trained with something like maybe 100 gigabytes or so of text; that gives it a strength as if you had trained it with the Google corpus. The other thing is, of course, a small corpus like that is computed in two or three hours on a laptop. By the way, I didn't mention that our fingerprints are actually Boolean, so when we train, as I said, we are not using floating points.

      for - comparison - cortical io vs normal AI - training dataset size and time
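
      The Boolean-fingerprint remark above can be sketched in a few lines. The bit patterns below are made-up toy values, not cortical.io's actual semantic fingerprints (real ones are far larger and derived from text); the point is that similarity reduces to counting shared set bits, with no floating-point arithmetic at all.

```python
# Sketch of "Boolean fingerprints": each term is a sparse binary
# vector, and similarity is simply the number of shared set bits.
# The 16-bit masks below are invented for illustration only.

def overlap(a: int, b: int) -> int:
    """Similarity of two Boolean fingerprints = count of common 1-bits."""
    return bin(a & b).count("1")

# toy fingerprints (hypothetical bit patterns)
dog = 0b1011_0010_0001_1000
cat = 0b1011_0000_0001_0100
car = 0b0100_1000_1000_0001

print(overlap(dog, cat))  # related terms share many bits
print(overlap(dog, car))  # unrelated terms share few or none
```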

  8. Jun 2024
    1. Suppose that GPT-4 training took 3 months; in 2027, a leading AI lab will be able to train a GPT-4-level model in a minute.

      for - stat - AI evolution - prediction 2027 - training time - ~5 OOM decrease

      stat - AI evolution - prediction 2027 - training time - ~5 OOM decrease - today it takes 3 months to train GPT-4 - in 2027, it will take 1 minute - that is, 131,400 minutes vs 1 minute, or roughly 5 OOM (log10(131,400) ≈ 5.1)
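
      A quick sanity check of the minutes figure above (assuming 30.4-day months):

```python
# Order-of-magnitude check: 3 months in minutes vs a 1-minute run.
import math

months_as_minutes = 3 * 30.4 * 24 * 60   # ~3 months of minutes
oom_gap = math.log10(months_as_minutes)  # vs a 1-minute training run

print(round(months_as_minutes))  # about 131,000 minutes
print(round(oom_gap, 1))         # about 5.1 orders of magnitude
```

      So the 3-months-to-1-minute jump works out to roughly 5 orders of magnitude.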

  9. May 2024
    1. What could possibly go wrong? Dear Stack Overflow denizens, thanks for helping train OpenAI's billion-dollar LLMs. Seems that many have been drinking the AI Kool-Aid or mixing psychedelics into their happy tea. So much for being part of a "community"; seems that was just happy talk for "being exploited to generate LLM training data..." The corrupting influence of the profit motive is never far away.
    2. If you ask ChatGPT to cite it will provide random citations. That's different from actually training a model to cite (e.g. use supervised finetuning on citations with human raters checking whether sources match, which would also allow you to verify how accurately a model cites). This is something OpenAI could do, it just doesn't.
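
      The supervised-finetuning setup sketched in this note needs some check that a citation actually supports the claim. Below is a minimal stand-in for that check: exact substring matching instead of a human rater or an entailment model, with invented example strings.

```python
# Minimal citation check: does the cited quote actually occur in the
# source text? A real pipeline would use human raters or an entailment
# model; verbatim matching is just the simplest possible stand-in.

def citation_supported(quote: str, source_text: str) -> bool:
    """True if the cited quote appears (case/whitespace-insensitively)."""
    norm = lambda s: " ".join(s.lower().split())
    return norm(quote) in norm(source_text)

source = "The model was trained on 100 gigabytes of English text."
print(citation_supported("trained on 100 gigabytes", source))  # True
print(citation_supported("trained on 10 terabytes", source))   # False
```
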
  10. Sep 2023
  11. May 2023
  12. Jun 2018