8 Matching Annotations
  1. Last 7 days
  2. Aug 2025
    1. Skinner believed that association—learning, through trial and error, to link an action with a punishment or reward—was the building block of every behavior, not just in pigeons but in all living organisms, including human beings. His “behaviorist” theories fell out of favor with psychologists and animal researchers in the 1960s but were taken up by computer scientists who eventually provided the foundation for many of the artificial-intelligence tools from leading firms like Google and OpenAI.

      Animal behavior studies as a foundation for reinforcement learning (toy sketch below).
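
      An illustrative sketch (my own, not from the article): an epsilon-greedy bandit that learns, by trial and error, which action yields reward — the association idea behaviorism described and that reinforcement learning formalizes. All names and numbers are hypothetical.

      ```python
      import random

      # Epsilon-greedy bandit: learn by trial and error which action pays off.
      def run_bandit(true_reward_probs, steps=1000, epsilon=0.1, seed=0):
          rng = random.Random(seed)
          n_actions = len(true_reward_probs)
          value_estimates = [0.0] * n_actions  # learned action -> reward associations
          counts = [0] * n_actions

          for _ in range(steps):
              # Explore occasionally; otherwise exploit the best-known action.
              if rng.random() < epsilon:
                  action = rng.randrange(n_actions)
              else:
                  action = max(range(n_actions), key=lambda a: value_estimates[a])

              # The environment delivers a reward (1) or nothing (0).
              reward = 1.0 if rng.random() < true_reward_probs[action] else 0.0

              # Incremental average: strengthen or weaken the association.
              counts[action] += 1
              value_estimates[action] += (reward - value_estimates[action]) / counts[action]

          return value_estimates

      if __name__ == "__main__":
          # Hypothetical "pecking keys" with different payoff rates.
          print(run_bandit([0.2, 0.5, 0.8]))
      ```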

  3. Jul 2025
    1. AI data centers could use up to 12% of all U.S. electricity by 2028. But how much power does it take to create one video and what really happens after you hit “enter” on that AI prompt? WSJ’s Joanna Stern visited “Data Center Valley” in Virginia to trace the journey and then grills up some steaks to show just how much energy it all takes.

  4. Jan 2025
    1. Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.

      Distillation

      Using the outputs of a "teacher model" to train a "student model" (minimal sketch below).
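
      A minimal sketch of that loop, with stub functions standing in for the real teacher API and the student fine-tuning step (nothing here is a real vendor SDK): collect prompts, record the teacher's outputs, then fine-tune the student on the recorded pairs.

      ```python
      import json

      def query_teacher(prompt: str) -> str:
          """Stand-in for an API (or chat-client) call to the teacher model."""
          return f"[teacher answer to: {prompt}]"

      def build_distillation_set(prompts, path="distill_data.jsonl"):
          """Record (prompt, teacher output) pairs -- the student's training targets."""
          with open(path, "w", encoding="utf-8") as f:
              for p in prompts:
                  f.write(json.dumps({"prompt": p, "completion": query_teacher(p)}) + "\n")
          return path

      def finetune_student(dataset_path):
          """Stand-in for supervised fine-tuning of the smaller student model."""
          with open(dataset_path, encoding="utf-8") as f:
              pairs = [json.loads(line) for line in f]
          print(f"fine-tuning student on {len(pairs)} teacher-labeled examples")

      if __name__ == "__main__":
          prompts = ["Explain multi-head attention.", "Summarize mixture of experts."]
          finetune_student(build_distillation_set(prompts))
      ```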

    2. DeepSeekMLA was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required: you both need to load the model into memory and also load the entire context window. Context windows are particularly expensive in terms of memory, as every token requires both a key and corresponding value; DeepSeekMLA, or multi-head latent attention, makes it possible to compress the key-value store, dramatically decreasing memory usage during inference.

      Multi-head Latent Attention

      Compress the key-value store of tokens, which decreases memory usage during inference; see the sketch below.
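
      An illustrative NumPy sketch of the compression idea only (real multi-head latent attention is more involved, and all dimensions here are assumed): cache one small latent vector per token instead of full per-head keys and values, then reconstruct K and V from the cache at attention time.

      ```python
      import numpy as np

      d_model, n_heads, d_head, d_latent = 512, 8, 64, 64  # assumed sizes
      rng = np.random.default_rng(0)

      W_down = rng.standard_normal((d_model, d_latent)) * 0.02           # compress hidden state
      W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand latent to keys
      W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand latent to values

      def cache_tokens(hidden_states):
          """Store only a d_latent vector per token, not full per-head K and V."""
          return hidden_states @ W_down

      def keys_values_from_cache(latent_cache):
          """Rebuild per-head K and V from the compressed cache when attending."""
          seq = latent_cache.shape[0]
          k = (latent_cache @ W_up_k).reshape(seq, n_heads, d_head)
          v = (latent_cache @ W_up_v).reshape(seq, n_heads, d_head)
          return k, v

      # Cache 1000 tokens: 1000 * 64 floats instead of 1000 * 8 * 64 * 2 for full K+V.
      hidden = rng.standard_normal((1000, d_model))
      latents = cache_tokens(hidden)
      K, V = keys_values_from_cache(latents)
      print(latents.shape, K.shape, V.shape)  # (1000, 64) (1000, 8, 64) (1000, 8, 64)
      ```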

    3. The “MoE” in DeepSeekMoE refers to “mixture of experts”. Some models, like GPT-3.5, activate the entire model during both training and inference; it turns out, however, that not every part of the model is necessary for the topic at hand. MoE splits the model into multiple “experts” and only activates the ones that are necessary; GPT-4 was a MoE model that was believed to have 16 experts with approximately 110 billion parameters each. DeepSeekMoE, as implemented in V2, introduced important innovations on this concept, including differentiating between more finely-grained specialized experts, and shared experts with more generalized capabilities. Critically, DeepSeekMoE also introduced new approaches to load-balancing and routing during training; traditionally MoE increased communications overhead in training in exchange for efficient inference, but DeepSeek’s approach made training more efficient as well.

      Mixture-of-Experts

      Split an LLM into "expert" components with specialized knowledge, then activate only the experts needed to address a given prompt, as in the sketch below.
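
      A toy sketch of the routing idea (sizes and the top-k rule are illustrative; DeepSeekMoE's fine-grained and shared experts and its load balancing are not modeled): a router scores the experts for each token and only the top-k are run, so most parameters stay inactive.

      ```python
      import numpy as np

      d_model, n_experts, top_k = 16, 8, 2
      rng = np.random.default_rng(0)

      W_router = rng.standard_normal((d_model, n_experts)) * 0.1
      experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]

      def softmax(x):
          e = np.exp(x - x.max())
          return e / e.sum()

      def moe_layer(token):
          scores = softmax(token @ W_router)    # router: which experts suit this token
          chosen = np.argsort(scores)[-top_k:]  # activate only the top-k experts
          out = np.zeros_like(token)
          for idx in chosen:
              out += scores[idx] * (token @ experts[idx])  # weighted sum of active experts
          return out

      token = rng.standard_normal(d_model)
      print(moe_layer(token).shape)  # (16,) -- only 2 of 8 experts were computed
      ```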

  5. May 2024
  6. Jul 2023
    1. AI-generated content may also feed future generative models, creating a self-referential aesthetic flywheel that could perpetuate AI-driven cultural norms. This flywheel may in turn reinforce generative AI’s aesthetics, as well as the biases these models exhibit.

      AI bias becomes self-reinforcing

      Does this point to a need for more diversity in AI companies? Different aesthetic/training choices lead to opportunities for more diverse output. To say nothing of identifying and segregating AI-generated output from being used in the training data of subsequent models.