AI-queryable transcript
Once we get consent, link the Wellbeing workshop transcript as an example. OK link it here -- it's at https://uj-wellbeing-workshop.netlify.app/transcript.html
The workshop cruxes map to specific subquestions that feed into CM_01: CM_12: Will most CM be produced using hydrolysates (replacing growth factors) by 2036? CM_14: What will cell media cost per kg of CM output in 2036? CM_16: What cell density is achievable in a 20,000-liter bioreactor by 2036? CM_20: What share of companies will build their own bioreactors by 2036?
make this a folding box.
What will be the average production cost (per edible kg) of cultured chicken meat in 2031, 2036, and 2051, across all large-scale plants in the world?
focus on 2036 only here #implement
Corporate campaigns for cage-free commitments have clear, measurable impacts.
OK but an attributed tooltip quote would also be good #implement
GFI's recent work
tooltip reference and link with a direct quote
Pasitka
give tooltip links and direct quotes, with dates and paper names #implement -- important
See our EA Forum post for further discussion.
"July 2025 EA Forum post' -- make this a tooltip #implement
spare billions of animals. I
Tooltip here about how we note there may be a range of other important benefits and perhaps costs here, including environmental benefits, reduction of animal-to-human disease vectors, etc., but we are mainly focused on animal welfare for this conference.
We're organizing the discussion around four key questions:
These are four formulations, but many of the cruxes run across each of these. There's a lot of overlap here.
Cruxes: Things like the cost of the baseline media (linked to the cell density), the cost of the growth factors (will they be the expensive ones, or more like hydrolysate-based ones?), scale/tank/production size, cost of capital, and TEA/forecasting methodological choices.
(Consumer and government acceptance is also very likely a crux, but we're focusing on that a bit less).
We're now commissioning evaluations of key TEAs,
Not quite -- these will come out of the PQ project -- see the 'request to evaluators' here: https://coda.io/d/Unjournal-Public-Pages_ddIEzDONWdb/Cultured-meat-PQ-Request-to-TEA-evaluators_suyucPLB#_luoC2X-O
pessimistic
Add a link here: "See our EA Forum post for further discussion." (hyperlink https://forum.effectivealtruism.org/posts/7ha23d3qzXiCqYLDq/is-cultured-meat-commercially-viable-unjournal-s-first)
The core question is deceptively simple: What will cultivated meat cost to produce? If costs fall dramatically, CM could displace a substantial share of conventional meat production and spare billions of animals. If costs remain high, funding CM development may have been a poor use of limited animal welfare resources compared to proven interventions.
This is a bit too simple -- see other discussion pages. For the 'what to fund' question, we need to consider the marginal benefit of funding on the probability and magnitude of success in fostering CM (sooner) and displacing animal products and animal suffering. This is discussed in much more detail in the specific PQ definitions, motivations and resources.
But "CM is plausibly able to achieve near price parity" seems highly correlated or causally entangled with "funding CM development (and supporting it politically) is likely to have high AW impact per dollar". In particular, if it seems practically impossible for CM to ever get close to parity, then it seems unlikely that the CM project will be successful, and thus nearly guaranteed that additional funding will have little impact.
But we should note, or at least footnote, that this is more of a necessary than a sufficient condition. CM funding could have a low impact/$ for other reasons, e.g., if CM is likely to be successful soon irrespective of this funding.
achieves
if it achieves --> particularly if it comes close to
Will most cultured meat (by volume) be produced using hydrolysates as a replacement for expensive purified growth factors in 2036?
add discussion boxes here, so they can comment if desired
How many chickens will be slaughtered for meat globally in 2032 and 2052?
skip this one ... too indirect #implement
What is the expected-value (and probability distribution) of the impact on animal welfare from funding CM development? Consider marginal funding, very high funding levels, or impact relative to the best alternative interventions.
Give an (optional) slider for them to state what share of benefit, relative to the next best intervention, is achieved, along with 80% CIs
What will be the average production cost (per edible kg) of cultured chicken meat in 2031, 2036, and 2051, across all large-scale plants in the world?
Just do 2036 ... keep it simple here #implement
key
remove bold #implement ... use less bolding in general, especially avoid it mid-sentences
Which Potential Segments Interest You?
Add some other potential segments, perhaps
"business, government, and philanthropic environment and the cost of capital."
"How is cultured meat produced -- a cost-focused background overview"
"Constructing TEAs, uncertainty modeling, and forecasting" -- hands on modeling (post-workshop hack session, 2-5 hours)
comment
Still add your annotations and let me know what you think about hypothesis as a format for collaborative discussion.
key levers
The high cell density is in blue, but you also put "micros" in blue, which suggests the two are linked. I don't think that's right. I think high cell density will reduce the media cost, which is in green, and maybe other costs like bioreactor and operating expenses, so I'm a bit confused.
diagram
Diagram says "fixed OPEX". But wouldn't at least some of the labor costs depend on scale here?
Typical Cost Breakdown ($/kg chicken)
Diagram below does not really make sense. Is it a breakdown of the cost components or something having to do with levers that could make the costs go up or down substantially? This needs more clarity.
growth factors
Hyperlink is not working so well
typical
A tooltip should define what is meant by "typical" here. It probably depends on the outcome of many model simulations and the central probability mass, or something similar.
How Cultured Chicken is Made
At the top of this, or maybe on another page, it would be nice to have some sort of mosaic graph with different cost breakdowns for different scenarios, dividing the cost into components that add up to total cost per kilogram. Each of those mosaic elements could then link to a different section explaining it.
This reasoning underlies our model’s binary switch approach —
Link this part of the model, and also define what you mean by "binary switch" specifically in a tooltip.
insulin and transferrin
I don't think these were mentioned anywhere either. Are these growth factors? If they're not growth factors, why are you discussing them in this section?
albumin
What is albumin? You did not mention it anywhere else in this document. Is it a growth factor?
Solutions Being Developed
Which of the growth factors do these "solutions" pertain to?
Market data
Where did you find the market data?
Current Price Target Price
It doesn't really matter what the price is per gram of input. The question is, what is the likely price per kilogram of chicken meat output? Add a column for this.
The diagram shows how growth factors work: a growth factor protein (GF) binds to a receptor on the cell membrane, triggering a chain of signals inside the cell that ultimately tells the cell to “PROLIFERATE” (divide).
Not sure how this diagram is helpful at all. Maybe put this in a folding box for now.
Most companies scaling up have already adopted hydrolysate-based media.
Tooltip example here, please.
- Less media per kg of meat (biggest savings) - Fewer reactor transfers in seed train - Smaller bioreactors needed for same output - Lower labor per kg
The itemized list is not rendering correctly here.
most companies
Give listings, examples, and links/citations with tooltip quotes.
Key Growth Factors for Cultured Meat
Why are you telling me about all these different kinds of growth factors? Do they all need to be used? Are they alternatives to each other? Have you defined what the terms in the "function" column mean?
And how much of each will need to be used per kilogram of chicken meat produced (or whatever weight we are standardizing things to here), and with what cost implications? Try to always bring things to this standard unit of cost per kilogram of chicken meat.
Cost ($/L)
How does dollars per liter map into dollars per whatever unit of chicken meat we're using here? It's going to depend on the cellular density. I presume the cellular density is the same for these two types of media, or does one lead to much less dense cells?
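To make the unit question concrete, here is a minimal sketch of the conversion, assuming a single fill of the reactor volume and a final cell density expressed in grams of cell mass per liter. The function name and all input values are hypothetical, chosen for illustration rather than taken from the page or the model under review.

```python
def media_cost_per_kg(cost_per_litre: float, density_g_per_litre: float) -> float:
    """Media cost in $ per kg of cell mass, for one fill of the reactor volume.

    Assumes all media cost is spread over the cell mass present at harvest.
    """
    if density_g_per_litre <= 0:
        raise ValueError("density must be positive")
    litres_per_kg = 1000.0 / density_g_per_litre  # litres of media behind each kg
    return cost_per_litre * litres_per_kg


# Hypothetical inputs: same $/L medium, two different achieved densities.
low_density = media_cost_per_kg(cost_per_litre=1.0, density_g_per_litre=50.0)
high_density = media_cost_per_kg(cost_per_litre=1.0, density_g_per_litre=200.0)
print(low_density, high_density)
```

Under these assumptions the same $/L figure gives a fourfold difference in $/kg, which is why a per-liter price alone cannot be compared across media types unless the achieved density is stated alongside it.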
Hydrolysates: The Big Win for Amino Acids
To what extent is it clear that these can simply be used, and to what extent is this still an important uncertainty? If it's clear that they can be used, we should flag this so people don't think of it as still an important uncertainty. But we should look for more references here to be sure.
The cultured meat industry must use serum-free media.
Try not to state things in a very prescriptive way. We're meant to be providing background information, not ordering people around.
This is THE Pivotal Uncertainty (click to expand)
Again, this really just seems too strong a statement to make. We need a little bit more epistemic modesty and reasoning transparency.
serum-free media d
Another hovering "?" here
Global FBS supply: ~500,000 L/year
Where exactly does it come from though?
expand
Maybe rewrite the headline to actually say that the total media cost is predicted to be about 40 to 70% of production cost, or whatever the numbers tell us. I shouldn't need to expand it to get the headline result.
0-70% of total production cost (Humbird 2021, Risner et al. 2021).
Any way you could cite more recent references?
<5% if breakthrough technologies
Which ones? Give an explanation, maybe in a tooltip.
research scale
But why focus on research scale? Could you give what is projected at production scale?
This is the key uncertainty.
That seems a little bit too much of a conclusion. Can you state this a little bit in a more reasoning-transparent way
Ethical: Derived from fetal calves — defeats purpose of avoiding animal slaughter Limited supply: ~500,000 L/year globally (van der Valk et al. 2018)
Where does the supply come from? Are animals being killed here to produce it?
This might be a folding box. I'm not sure if it enters into the previous narrative.
Traditional cell culture uses fetal bovine serum (FBS) — a complex mixture that provides growth factors, hormones, and attachment proteins. Problems:
So which of the above is this used for? It seems like it covers several of the above things you're calling "media". That's a little bit confusing to have this overlap of some sort.
grade (Sigma-Aldrich pricing)
Give me some excerpts from that page and explain what it means. You just linked to a sort of commercial page. It's not very helpful or easy to navigate.
Traditional cell culture uses fetal bovine serum (FBS)
A question mark comes up when I hover over this, but I don't see any tooltip explaining what it is.
Also, why are you talking about bovine serum if we're thinking about chicken here? At least you should explain the analogy.
Moderate
Quantify this.
Vitamins Metabolic cofactors B-complex, etc. Minerals/salts Osmotic balance, enzyme function
Maybe group the cheapest things together in one row unless there's some sort of environmental or ethical issue with them.
FGF
Tool tips should explain what each of these things are.
costs
Not all costs, just this component of cost. Again, I want to know what share that makes up of the total to put this in perspective. It's only a minor share of total cost. It's not really a pivotal cost driver, is it?
Try to put these in terms of cost per unit of meat produced in a mature production process, and try to use the same units everywhere so we know how to compare each element and sub-element.
Cell culture media contains everything cells need to grow:
List these by order of estimated share of cost in a production-scale process. And give a rough estimate of those shares, and those should be on the same scale - expressed per unit of output, in the same units. Give a disclaimer, of course, that this is just based on one particular estimate, and you can link to the actual model.
Step 4: Media Composition
Wait, that's not precisely a "step". Isn't the media used within the production bioreactors? It's not sequential. So perhaps the word "step" is confusing unless you're talking about a modeling step or something.
To what extent are hydrolysates already being used? Let's get some more sources for each of these. The costs are really important here.
Hydrolysates vs. pure amino acids
What share of media costs are these in different models and reports? I thought this was possibly the largest?
Cell densities of 100-200 g/L have been demonstrated in perfusion systems (Clincke et al. 2013).
Rather old reference, aren't there more recent ones, perhaps with much higher cell densities?
Turnover = 1: Batch mode (same media throughout) Turnover = 5-10: Perfusion (replace media multiple times)
Sort of lost me here. I don't know what the term you're talking about is and why it's important. What does the word 'perfusion' actually mean?
media turnover parameter in our model
Link this part of the model. Backlinks might also be good from the model to this explanation (here and everywhere else).
sio
Batch versus perfusion? You haven't given enough narrative here. I don't know why you're telling me this. Are these different bioreactor types, and if so, how does it map into the categories you just gave above?
Simplified designs for food production
This is a little confusing to me because what do you mean designed for food production? What is the standard food production use of this if not for cultured meat?
Batch vs. Perfusion
These images below are also too small.
This is a pivotal cost driver.
Wait, give us a perspective on what share of the total cost this is approximately, with some ranges considering a potential producer operating at scale.
Source
These sources are all from 2021 or earlier - can't you find some updated cost figures?
Bioreactor Types
Link some pictures of these types of tanks, or perhaps a folding box showing these pictures. You'll have to do a web search to look these things up.
perfusion or batch feeding
What is media perfusion and what is batch feeding?
Production-scale bioreactors
What makes a bioreactor "production scale"?
the entire batch ($100K-$1M loss)
The hyperlink works, but I want a tooltip giving the exact quote that provides evidence for this claim - both the contamination and the size of the loss. ... I want this sort of documentation for all claims in general, try to use tooltips to avoid clutter.
ou need far fewer reactor transfers
What's the typical cost of the reactor transfer in an established, larger-scale production process? Would this still be a substantial share of total costs?
less total media.
But isn't the sort of media that achieves higher density more expensive? I'm actually uncertain about this.
often pharma-grade at $5-20/L
Source for the quote "often pharma-grade"? OK, you're relying heavily on Humbird here. Find some other sources. I've heard that most companies are now using food grade instead of pharma grade. Look into that and discuss it in tooltip footnotes.
fully loaded
What do you mean by "fully loaded" here?
Cost Impact
For each phase, I want you to give some indication of the share of costs, in terms of the total cost per unit of meat, that this could potentially encompass, both at a small scale and at a larger scale.
Seed Train: Progressive Scale-Up Vial 1 mL 10⁶ cells T-Flask 100 mL 10⁷ cells Spinner 1 L 10⁸ cells Small Reactor 10 L 10⁹ cells Medium Reactor 100 L 10¹⁰ cells Production 1,000+ L 10¹¹+ cells
The text is a bit crowded here, so the numbers overlap the words. Try to adjust to give it a little more space.
Step 1: Cell Banking What Happens
Give more continuous references, perhaps as tooltips, to where you are getting this information from about the process. Perhaps give citations with links and short quotes.
require regulatory approval.
Link to this regulatory approval thing. How difficult/costly is it to get that approval, or do we already have this for the important immortalized cell lines?
one-time setup cost that’s amortized over many production runs. A well-characterized cell bank can support years of production (GFI 2021).
Doesn't really explain how the costs work. Ultimately, the banked cells are used up, correct? Are you saying that cell banking is just a tiny share of the cost here, if you end up using the whole batch, is that right?
1. Biopsy
Image is still a bit too small.
2. Isolate Cells
What are we trying to isolate, and why? Maybe this is a tooltip? And what's enzymatic digestion?
Step 1: Cell Banking
You did not use the term "cell banking" in the flow chart above. It can be confusing when you change terms. We don't know what maps to what.
Complex differentiation protocols
What are differentiation protocols, and why is their complexity a challenge? Maybe put this in a tooltip
pluripotent
What does the word pluripotent mean? Tooltip, please.
Pasitka et al. 2022
Give the name of the paper and a tooltip, and also explain what aspects of these claims the source provides, perhaps with quick quotes.
Similar FGF-2/IGF-1 requirements to bovine (~10-100 ng/mL optimal)
Explain, perhaps in a tooltip, why the similarity is helpful here. I don't really know what these things mean (e.g., what does ng mean?)
~70 billion chickens slaughtered annually vs ~300 million cattle
Provide a tooltip/link to discussion from animal welfare advocates about this, perhaps on the EA forum.
Kim et al. 2024
Tooltip hover should show the name of the paper, et cetera.
Produc
Can you make an image without a bone in it that still looks like a piece of chicken meat? I don't think bones are happening any time soon in cultured meat.
Seed Train
What's a seed train?
Step 5: Growth Factors — The Pivotal Challenge
Seems too strong. Media costs exceed GF costs in many formulations I've seen.
This is THE pivotal uncertainty. If any of these approaches succeeds at scale, growth factors become negligible (<$1/kg chicken). If none succeed, growth factors could be >$100/kg — making cultured meat uneconomic at scale. See GFI’s analysis for detailed technical roadmaps.
this seems a bit too strong from my reading. Media costs exceed GF costs in many formulations
Decision Relevance: Understanding the nuances of poverty traps and 'trappedness' can inform development policies and interventions aimed at poverty alleviation. This paper could provide insights into where resources and policy changes would be most effective globally.
This feels a bit vague to me. Are there specific policies that would be affected?
DRAFT — This survey will be available after the workshop takes place. Questions?
Make this less prominent -- no 'header' until after the workshop
Can reverse cross-population comparisons.
remember -- we are not focused on cross-population comparisons for this workshop. It's more about 'which interventions yield greater welfare', which would generally involve difference-in-differences, ideally across comparable populations (but not always)
δ = discount factor for future years
Where did the discount and time factor come from? Where did these definitional equations come from? I didn't think most empirically estimated WELLBY measures considered multi-year collection or impact. And are they really discounting?
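For readers unsure what a discount factor would do in such a definition, here is a minimal sketch. It assumes the quoted equations sum yearly life-satisfaction gains (in points on the 0-10 scale) and weight year t by δ^t; that form is an assumption for illustration, not confirmed from the page under review, and the function name and inputs are hypothetical.

```python
def discounted_wellbys(ls_gains, delta=1.0, people=1):
    """Sum yearly life-satisfaction gains, weighting year t by delta**t.

    ls_gains: gain in 0-10 life-satisfaction points for each year, starting at t=0.
    delta:    assumed discount factor; delta=1.0 means no discounting.
    people:   number of people receiving the gain.
    """
    return people * sum(gain * delta ** t for t, gain in enumerate(ls_gains))


# Hypothetical effect: 0.5 points per year, lasting three years, for one person.
gains = [0.5, 0.5, 0.5]
undiscounted = discounted_wellbys(gains)
discounted = discounted_wellbys(gains, delta=0.9)
print(undiscounted, discounted)
```

With δ = 1 this reduces to the simple "points times years times people" count, so the open question is only whether applied WELLBY estimates use any δ below 1 at all.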
what most intervention comparisons need)
Cut this. I don't think it necessarily holds -- a lot of interventions impact mortality.
Add to footnote -- the 'incremental' WELLBYs may be captured by observing differences between comparable treated and untreated populations.
Accounting
it's not really 'accounting' -- these are conceptual
UK Government: Official guidance for policy appraisal
A link to this would be helpful. The "Green Book". (I wonder -- how impactful has this actually been on British policy?)
Neutral point estimation: What is the actual neutral point on the 0-10 scale for different populations? How stable is it across contexts?
I suspect we don't have any good measures of this? There's the Peasgood paper but I don't think that was in a LMIC and I'm not sure how much it has been vetted?
Annotate & Comment: Double-click any text to add a Hypothes.is annotation. No account needed to read; quick signup for a free account to post.
We'd especially like pre-session feedback on
Even if you accept WELLBY as the target unit, the measurement layer forces choices:
underexplained/too brief perhaps
layer
why do we use the word 'layer' -- is this the term in the literature?
close to linear
"Close to linear" is interesting -- a footnote could expand on this a bit
Predictive validity: SWB predicts consequential outcomes systematically
This was mentioned above, but does it do so in a scale-sensitive way?
As I suggested, it's not enough to have it be 'somewhat predictive'
Transformation Sensitivity Demo
This needs more context and explanation. I've forgotten what g of x is here, and what's the actual calculation? Also, this doesn't seem to be illustrating the point that it means to. As I move the slider, population B always seems to be higher, but also it seems like we're getting away from the discussion of the relative impact of different interventions. We don't want to just simply compare populations. If this does pertain to interventions, explain better.
Explain a bit more (as a footnote) what the 'transformation' means here and why/when it's used
Magnitude-sensitive cost-effectiveness: Even if signs are stable, cost-effectiveness ratios rely on magnitudes
Do they? Magnitudes of what? Explain. Give a 1-2 sentence example as a footnote
Incremental WELLBY Estimate
This is simple and perhaps obvious. It's good for illustrating the simple linear WELLBY concept, but that's already been explained above, so I'm not sure it's useful down here. OK, put this at the top, in a folding box -- it just helps make sure we're all on the same page about the definition of the WELLBY here.
Perhaps it would also be helpful to include some sort of adjusted WELLBY calculator interface that's a more sophisticated concept people might not appreciate, particularly embodying the approach in Benjamin and others.
What "non-identified" means A parameter is "identified" when data + assumptions pin down a unique value. Ordinal responses only tell us which interval a latent value falls into. Many different latent distributions and transformations can generate the same observed category counts, so rankings of means can change across equally admissible representations.
This explanation is not clear. It could be improved; it's a bit too literal. Why do ordinal responses only tell us which interval a latent value falls into?
This might also be worth folding
2. Definitions and key concepts
add a bit more 'narrative' here
directly via experimental comparison.
comparison of what -- be more precise
Monotonic transformations can reverse conclusions
An example here would be very helpful. ... Perhaps even an interactive display.
Monotonic transformations of what?
Bond and Lang (2019) argue that with ordinal response data, comparing "average happiness" between groups is generally not identified without strong assumptions—monotonic transformations can reverse results.[11]
This should be fleshed out in more detail and rigor, along with some responses to it, and probably belongs earlier on in the discussion.
....
What do you mean, comparing "average happiness between groups is not identified"? What is the thing that is not identified?
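A small numeric illustration may help here. This is a simplified sketch, not Bond and Lang's formal argument: it assumes three ordered response categories labeled 1 to 3, two hypothetical groups, and a strictly increasing relabeling of the categories (cubing). The ordering of every individual response is preserved, yet the ranking of group means flips.

```python
from statistics import mean

# Hypothetical ordinal responses on a 1-3 scale.
group_a = [2, 2, 2]
group_b = [1, 1, 3]


def cube(x):
    """A strictly increasing transformation on positive values."""
    return x ** 3


# Gap in mean "happiness" using the raw category labels.
raw_gap = mean(group_a) - mean(group_b)

# Same responses, same ordering, but categories relabeled as 1, 8, 27.
transformed_gap = mean([cube(x) for x in group_a]) - mean([cube(x) for x in group_b])

print(raw_gap > 0, transformed_gap < 0)
```

The thing that is "not identified" in this toy case is the sign of the difference in means: the data only fix the order of the categories, and different order-preserving spacings give different answers.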
Time structure and discounting [Sequence diagram. Participants: Baseline (t=0), Follow-up (t=1), Later (t>1). Note: Persistence, decay, response shift?]
This diagram is not fully explained. I don't see how it relates to the rest of the content either.
[Flowchart nodes, in order: Intervention; Study design; Measured outcomes (LS / DALY / depression); Translation layer (mapping, calibration); Common currency (WELLBY / DALY / $); Decision]
This flow chart is too small and it's underexplained. I don't understand what each of these is meant to mean and how they fit together.
Cheap calibration methods: Can vignettes, anchoring questions, or other calibration approaches work in low-resource settings without excessive respondent burden?
That seems fairly tractable for us to at least share our knowledge about in this conference. Cool.
true mapping
That's the second question combo which we'll be setting up an explainer on. Once we do, we should link that and also link that PQ here
But 'true mapping' needs a bit more definition. Maybe put it in scare quotes to note that (or link the tentative formulation in the PQ space)
Scale-use heterogeneity mapping: How do shifters vs. stretchers vary across LMIC populations, and can we predict which matters more in a given context?
Measuring this seems fairly high value to me if it can be done at a low cost.
These questions represent high-value areas for future research that could meaningfully improve the reliability of WELLBY-based comparisons:
I wouldn't state this so directly and clearly, and give attributions to people making the claims that these represent high value. We want this to be one of the outputs of the workshop, but I'm not sure that all of these are in fact high value. Some of them might be very much intractable.
Within-person designs where each person serves as their own control
But this can bring its own problematic effects if people feel prompted or motivated to report an improvement to please the experimenters, etc.
Treat WELLBY estimates as one input among several, not the final answer
That's the sort of milquetoast thing I want to avoid. People will always say, "Do compare multiple things, don't treat something as the gospel truth, etc." It's not a statement with a lot of meaning.
8. Practical Recommendations
I don't like having a section of core practical recommendations here. The recommendations are meant to come out of the workshop. We shouldn't be pre-establishing them. It's OK if you want to compare the recommendations coming out of the existing reports & literature, though.
Effect size (ΔLS points)
define this
Use mapping models (depression → LS), but carry mapping uncertainty explicitly
What is a 'mapping model'? Underexplained.
life satisfaction
AKA Wellby? But this is not practical for comparing interventions with different measurements
mental health weights can be contentious
give a bit more on how these do mental health weights
Multi-item SWB scales
citation and detail (footnote?) needed here
Reduces scale-use bias 30-50%
citation and explanation needed
DALYs and QALYs: Standardized But Narrower
How are these measured in the relevant settings and how does it differ from WELLBY? These are based on external measurements?
DALY / QALY
Separate QALY from DALY maybe; they are distinct in key ways
Years of Life Lost (YLL) + Years Lived with Disability (YLD
this seems like it must be incorrect/imprecise. Is a year with a disability actually measured here as being as bad as a year of life lost? This needs a better definition ... how is it measured
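On how the two components combine: in the standard burden-of-disease definition, a year lived with a condition is scaled by a disability weight between 0 (full health) and 1 (equivalent to death), so it counts as a full lost year only when the weight is 1. A minimal sketch using the simple cases × weight × duration form of YLD; the function name and all numbers are made up for illustration:

```javascript
// DALYs = YLL + YLD, where YLD is scaled by a disability weight in [0, 1].
// All inputs below are hypothetical illustration values, not real estimates.
function dalys({ deaths, yearsLostPerDeath, cases, disabilityWeight, yearsWithCondition }) {
  if (disabilityWeight < 0 || disabilityWeight > 1) {
    throw new RangeError("disabilityWeight must be between 0 and 1");
  }
  const yll = deaths * yearsLostPerDeath; // years of life lost to early death
  const yld = cases * disabilityWeight * yearsWithCondition; // weighted years lived with the condition
  return { yll, yld, total: yll + yld };
}

const burden = dalys({
  deaths: 2,
  yearsLostPerDeath: 30,
  cases: 100,
  disabilityWeight: 0.25,
  yearsWithCondition: 2,
});
console.log(burden);
```

With a weight of 0.25, four years lived with the condition count the same as one year of life lost.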
It does not automatically imply that within-study randomized treatment effects are meaningless. It implies you should be explicit about what assumptions let you treat reported changes as welfare units
this seems a bit babytalk/obvious
OECD (2024) concludes data remain meaningful for policy despite critiques
Give a link... and what is the basis for this? Meaningful is somewhat of a vague term. It doesn't get at the hard questions about what measures we should use for comparing specific interventions.
Survey response times can help solve identification (Liu & Netzer, AER 2023)
This is highly counter-intuitive to me. How do survey response times help?
latent distribution
Latent distribution of what?
Strong assumptions needed to treat "0.3 points" as "0.3 welfare units"
Didn't we already get at this above? What's new about this?
mean comparisons non-identified
What is meant by "non-identified" here?
A strong response to skepticism: even if the numbers seem arbitrary, do they behave like a measurement? Kaiser and Oswald show that single numeric feelings responses have strong predictive power—relationships to later "get-me-out-of-here" actions (changing neighborhoods, jobs, partners) tend to be replicable and close to linear in large longitudinal datasets.[10]
This kind of seems like a weak response unless I'm missing something. Even if they are not arbitrary, even if they have informational value, it doesn't tell me that they provide reliable information in comparing the benefit/cost across multiple interventions which all improve people's lives.
They do not solve cross-study comparability—but demonstrate that in at least one setting, SWB is responsive.
But this doesn't seem to have been the challenge as posed. I'm not sure this is the most relevant thing to lead with, or maybe it needs to be motivated better
Noise inflates uncertainty—credible estimates need adequate sample sizes and good designs
this seems a bit obvious?
Measurement error attenuates estimated effects (bias toward zero)—small real effects may be undervalued
How does that affect the relative comparison of interventions?
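One partial answer, as a sketch: with classical (mean-zero, independent) noise in the outcome measure, the raw difference in means stays unbiased, but the standardized effect size shrinks by the square root of the measure's reliability. Two interventions with the same true standardized effect can therefore look different if their outcome instruments differ in reliability. The function name and the reliability values below are hypothetical:

```javascript
// Observed standardized effect under classical measurement error in the outcome:
// d_observed = d_true * sqrt(reliability), with reliability in (0, 1].
function observedStandardizedEffect(trueEffect, reliability) {
  if (reliability <= 0 || reliability > 1) {
    throw new RangeError("reliability must be in (0, 1]");
  }
  return trueEffect * Math.sqrt(reliability);
}

// Hypothetical case: same true effect, two instruments of different reliability.
const trueEffect = 0.3;
const observedA = observedStandardizedEffect(trueEffect, 0.81);
const observedB = observedStandardizedEffect(trueEffect, 0.49);
const apparentRatio = observedA / observedB; // greater than 1 although the true ratio is 1

console.log({ observedA, observedB, apparentRatio });
```

Under these assumptions the distortion in a relative comparison depends only on the ratio of the two reliabilities, not on the size of the true effect.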
is meaningful
'is meaningful' requires clarification
LS misses physical suffering or other channels.
Why would that be the case? Who is making this claim?
What breaks: Duration weighting is wrong. Why it might fail: Adaptation effects—people return to baseline. Mitigation: Long-term follow-up data.
Again, this is too shorthand. I need an explanation, if necessary, in footnotes or a folding box, of what all this means.
ΔLS has ≈ same welfare meaning across people
'Meaning' should be clarified, perhaps with reference to the gold standards I suggest you add above. Should we state this in terms of an individual's willingness to make time trade-offs (e.g., would be willing to go from 7-->6 for one year in exchange for going from 3-->4 in another year), probability trade-offs (would take a coin flip over the above), or person trade-offs (a third party would be willing to move one person from 7 to 6 if it meant moving someone else from 3 to 4) ... [or vice versa in all cases]
ΔU(3→4) = ΔU(7→8)
Obviously this notation is extremely crude! I wonder if important nuance is lost here
E.g., is this 'within person' or 'across people'?
Validity
"Validity" is vague and needs a better definition. Perhaps something more informative in terms of the metric offering value would help. Naturally, no metric would be perfect, and even if a model's assumptions are violated in practice, they might be close enough to holding that the difference doesn't matter much.
We need a better definition of the 'gold standard' here. What would an 'accurate comparison' tell us? What is the appropriate measure of 'degree of inaccuracy'?
Test
how to test this? Define 'log transformation' more clearly here and what are the assumptions necessary for it to accurately reflect tradeoffs?
Summing is invalid;
What's meant by 'summing' and what does 'invalid' mean here?
Ceiling/floor effects: Even with identical reporting functions, bounded scales can cause mechanical differences in responsiveness at high or low baselines.
But this does not seem consistent. You are saying "when heterogeneity is most dangerous", but this doesn't look like heterogeneity.
Comparing across studies/countries: Different instruments, translations, norms, and populations. If the distribution of stretch factors $b_i$ differs, "1 point-year" is not the same welfare unit across the evidence base.
Can you justify this a bit more, both in equations and in an intuitive explanation of what the problem is?
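For intuition, a minimal numeric sketch of the problem, assuming each person's welfare change is their stretch factor times their reported change (the page's Δu = b × ΔLS form). The two studies, their stretch factors, and their effect sizes are invented for illustration:

```javascript
// Each person has a stretch factor b and a reported change deltaLS.
// Assumed model: welfare change = b * deltaLS.
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
const meanReportedChange = (people) => mean(people.map((p) => p.deltaLS));
const meanWelfareChange = (people) => mean(people.map((p) => p.b * p.deltaLS));

// Hypothetical study A: one scale point reflects one welfare unit (b = 1).
const studyA = Array.from({ length: 4 }, () => ({ b: 1, deltaLS: 0.5 }));
// Hypothetical study B: respondents compress the scale, so one point reflects two units (b = 2).
const studyB = Array.from({ length: 4 }, () => ({ b: 2, deltaLS: 0.3 }));

console.log("Reported change:", meanReportedChange(studyA), meanReportedChange(studyB));
console.log("Welfare change: ", meanWelfareChange(studyA), meanWelfareChange(studyB));
```

On reported points, A looks better than B; in the assumed welfare units, B is better. The ranking flips only because the average stretch factor differs between the two populations, which is the sense in which "1 point-year" is not a common unit.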
instrument
what's meant by 'instrument' here?
interpersonal noncomparability is less of a threat for estimating an average treatment effect
"less of a threat" is vague, needs clarification. And why? Give a citation and/or a proof and further explanation (perhaps in a footnote)
reporting functions
define 'reporting functions' (footnote or folding box, linked here)
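A possible sketch of what a 'reporting function' could look like, and of why the average-treatment-effect claim might hold. It assumes a linear model linking latent welfare $u_i$ to the reported score $LS_i$ through a person-specific shifter $a_i$ and stretch factor $b_i$; the linear form and the symbol $u_i$ are assumptions for illustration, not something the page establishes:

```latex
\begin{align*}
u_i &= a_i + b_i \, LS_i, \qquad b_i > 0
  && \text{(assumed linear link; the reporting function is its inverse)} \\
\Delta u_i &= b_i \, \Delta LS_i
  && \text{(the shifter } a_i \text{ drops out of any within-person change)} \\
\mathbb{E}[\Delta u_i] &= \mathbb{E}[b_i]\,\mathbb{E}[\Delta LS_i] + \operatorname{Cov}(b_i, \Delta LS_i)
  && \text{(average welfare effect in one population)}
\end{align*}
```

Under these assumptions, if stretch factors are uncorrelated with individual treatment effects, the average effect in reported points equals the average welfare effect up to the common scale factor $\mathbb{E}[b_i]$. Two arms randomized within the same population share that factor, so the sign and relative size of their effects are preserved. Across populations, $\mathbb{E}[b_i]$ can differ, and the same number of points need not mean the same welfare change.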
studies, countries, or populations with different distributions of "stretch factors.
Adapt this discussion to focus more on comparing different interventions (see the canonical example but also link real-world relevant comparisons and studies) ... where these interventions may take place in nearly-identical, similar or distinct contexts, affect similar or different outcomes (wealth, health, etc.)
$\Delta u_i = b_i \times \Delta LS_i$.
this needs more explanation. What does 'fail' mean here? What's being compared, and how do the estimates compare with the ground truth?
ai
notation not rendering well here
$U_A \approx U_B$
Maybe add a footnote explaining what sort of "utility" we are considering here, noting this is a bit of an oversimplification of welfare considerations.
Incremental
Is 'incremental wellby' a term used in any literature or practice? Give a link
A common overstatement is that
Who stated this? How is it 'common'? Maybe just change this to: "'Equal scores mean equal welfare' is stronger than most applications need."
comparisons
"Comparisons involving mortality" #implement
This second form requires a defined zero point (e.g., death = 0)
Might benefit from some further explanation. How could Level-based be used for comparing interventions -- that's not clear here. How many people are we summing over? How do 'dead people' enter into that? Some explanations can go in footnotes.
$\sum_i \sum_t \delta_t \, (LS_{it}(k) - LS_{it}(0))$
Is this really how it's depicted in the literature? It's a bit confusing at first, because it looks like one has to know two things for incremental WELLBYs and only one thing for the level-based measure. Furthermore, the incremental one seems to require knowledge of a counterfactual. However, one might be able to have an estimate of a difference without knowing the levels. Isn't there a better notation/explanation for this?
$\Delta\mathrm{WELLBY}(k) = \sum_i \sum_t \delta_t \, (LS_{it}(k) - LS_{it}(0))$
I'm missing the definition of the indices i and t, as well as the definition of the variable LS -- #adjust #implement
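One plausible reading of the notation, to be checked against the source page: i indexes people, t indexes periods (years), LS is the reported life-satisfaction score, (k) marks the outcome under intervention k, (0) marks the counterfactual without it, and δ is a discount weight for period t. A minimal sketch under that reading, with a constant annual discount rate as a further assumption and made-up data:

```javascript
// Incremental WELLBYs: sum over people i and years t of
// weight(t) * (LS under the intervention - LS under the counterfactual).
// The weight 1 / (1 + discountRate)^t is an assumed form of the discount term.
function incrementalWellbys(lsTreated, lsCounterfactual, discountRate = 0) {
  let total = 0;
  for (let i = 0; i < lsTreated.length; i++) {
    for (let t = 0; t < lsTreated[i].length; t++) {
      const weight = 1 / Math.pow(1 + discountRate, t);
      total += weight * (lsTreated[i][t] - lsCounterfactual[i][t]);
    }
  }
  return total;
}

// Hypothetical data: 2 people (rows) observed over 3 years (columns).
const lsTreated = [[6, 6, 5.5], [4, 4.5, 4]];
const lsCounterfactual = [[5, 5, 5], [4, 4, 4]];

const undiscounted = incrementalWellbys(lsTreated, lsCounterfactual);
const discounted = incrementalWellbys(lsTreated, lsCounterfactual, 0.035);
console.log({ undiscounted, discounted });
```

Only the per-person differences enter the sum, so an estimated treatment effect per person-year would be enough; the levels themselves are not needed for the incremental measure.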
Benjamin et al. 2023, UK Green Book Wellbeing Guidance, Bond & Lang 2019, Haushofer & Shapiro 2016/2018, Kaiser & Oswald 2022)
Are these Really all the sources? I thought we had more.
AI-Generated Content (March 2025): This page was created through iterative prompting of Claude Code (Opus 4.5) and GPT-5.2 Pro, feeding in workshop discussion content and focal papers for our Pivotal Questions initiative (Benjamin et al. 2023, UK Green Book Wellbeing Guidance, Bond & Lang 2019, Haushofer & Shapiro 2016/2018, Kaiser & Oswald 2022). While grounded in these sources, this content requires further human verification. Specific claims, citations, and numerical details should be checked against the original literature before relying on them.
Make this a folding box #implement
We may quote specific responses with attribution unless you request otherwise. If you prefer your responses remain anonymous,
Adjust this -- "If you prefer your response to remain anonymous, please use a pseudonym and try to use the same one consistently if you're providing multiple responses." If you are fine with internal recognition but don't want any public attribution, please let us know and share any other concerns in the field at the bottom.
How likely is it that the simple WELLBY measure (as defined above) is the best or near-best measure—yielding no less than 80% of the value of the best measure—for cross-intervention comparison in the focal context? (State your best calibrated probability.)
I'm considering adjusting this one to:
"Consider the 'value obtained when using the best feasible measure for cross-intervention comparison in contexts like the focal context'. What share of this value is obtained, in expectation, from using the simple linear WELLBY measure for all interventions? Please give your central belief and 90% credible interval."
-- with a slider that goes from zero to one, and two other sliders that allow you to specify the lower and upper bounds of the 90% CI.
emonstrates that small transformations can reverse published findings.
NotebookLM:
"they applied their methodology to nine prominent results from the happiness literature (including the Easterlin Paradox, the U-shape of happiness in age, the ranking of countries by happiness, and the effects of marriage and children) and showed that the standard conclusions in all nine areas could be reversed using monotonic (specifically lognormal) scale transformations. They argued that these reversing transformations were "plausible," claiming they were no more skewed than the U.S. wealth distribution."
However, later work questions the plausibility of this.
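To make the reversal idea concrete, here is a toy sketch. It uses invented responses and an exponential rescaling chosen purely for illustration; it is not the lognormal procedure from the paper and says nothing about whether such a rescaling is plausible:

```javascript
// Two hypothetical groups answering on a 0-10 scale.
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
const groupA = [5, 5, 5, 5];
const groupB = [2, 2, 2, 10];

// A strictly increasing transformation that stretches the top of the scale.
const stretchTop = (score) => Math.exp(0.5 * score);

const rawGap = mean(groupA) - mean(groupB);
const transformedGap = mean(groupA.map(stretchTop)) - mean(groupB.map(stretchTop));

console.log({ rawGap, transformedGap });
```

Group A has the higher mean on the raw scale, while group B has the higher mean after the order-preserving rescaling. Every individual's rank is unchanged, yet the comparison of means flips; the open question is how extreme a transformation has to be before this happens in real data.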
A systematic conceptual defense of treating wellbeing scales as cardinal and comparable. Argues that deviations from cardinality are small and not policy-relevant.
See https://www.econstor.eu/bitstream/10419/232657/1/dp13905.pdf
Start here (30-60 min total)
It will take more than 60 minutes, but a skim might be doable in 60 minutes.
Confirmed Participants (15) — click to expand
Add "(rsvps)" -- Also update this, I think we have over 20 by now.
Note: human means carry their own variance; correlations here are bounded by human inter-rater noise.
is this ggplotly? Shouldn't it be dynamic? I don't seem to be able to adjust it
Table 3.1: Token usage and estimated cost per model
These are probably not updated costs -- Update and implement
Alberto Prati may contribute via pre-recorded video.
Not 'video', possibly some written content, or we can extract issues from his evaluation to ask Benjamin et al.
leads to least regret?
The "least regret" is a formal term in information theory, I believe, or from Bayesian updating. Provide a footnote defining and referencing it. #Implement
Annotate & Comment:
We'd especially like pre-session feedback on
ng "average happiness"
But we're focused on interventions, not on average happiness between groups. - Be more specific to this. #Implement
The key critique
--> "A key critique"
biasing
'potentially biasing'
not a corner case.
what's meant by 'corner case' here?
MH
What's MH? I forgot. Has it been defined?
Most studies measure outcomes at baseline and one or two follow-ups;
Give a footnote with some examples here. What do the studies involving LMIC interventions do?
cancels
'cancels' is vague -- explain what is meant here in a footnote. #implement
The measurement-to-decision pipeline [diagram]
the diagram is too small, and was never explained!
Some influential critiques argue that different monotone transformations can reverse conclusions about "average happiness"
'influential' -- that's subjective. ///Link to an example
Is "incremental WELLBY" standard terminology? Some literatures talk about WELLBYs as point-years of life satisfaction (UK guidance) and many evaluation contexts are inherently incremental. But "incremental WELLBY" itself is not uniformly a standard term. In this page, we use it as a descriptive label for counterfactual impact calculation, not as established jargon.
too inside-info for a whole box. -- make this a footnote at most
WELLBY (unit of account): UK Green Book guidance defines a WELLBY as a one-point change in life satisfaction on a 0-10 scale, per person per year.[3]
[3] HM Treasury (2021/2024). Wellbeing Guidance for Appraisal: Supplementary Green Book Guidance.
Missing the standard framing of the LS question here
many workshop annotations objected to undefined symbols and unclear indexing.
too much info -- don't say this here, this is process stuff
The measurement-to-decision pipeline [diagram; node labels in order]: Intervention / Study design → Measured outcomes (LS / DALY / depression scale) → Translation layer (mapping, calibration, assumptions) → Common currency (WELLBY / DALY / $) → Decision / deliberation
this is too small and also underexplained
interventions
Especially in LMICs
This workshop's focal question is not "which countries are happier.
this doesn't need bold
Plant, M. (2025). "A Happy Possibility: Rational Behavior and the Cardinality Thesis." Working paper.
wait -- hallucination -- you renamed the title here!!
entire
remove 'entire'
f you compare to mortality-preventing interventions
Adjust this to "if you compare interventions that affect mortality (or, in some accounting, birth rates)"
$\Delta\mathrm{WELLBY}(k) = \sum_i \sum_t \delta_t \, (LS_{it}(k) - LS_{it}(0))$
LS is not defined here, nor is 'i'
Level-based WELLBYs (for mortality comparisons):
Or 'for interventions that change mortality rates' perhaps?
📊 View Aggregated Results See beliefs elicitation summaries and Metaculus question forecasts
I don't think I want to show this here because I don't want people to anchor in stating their beliefs. #todo #adjust #implement
html`<div style="background: #f8f9fa; padding: 1rem 1.25rem; border-left: 4px solid #3498db; margin-bottom: 1.5rem; font-size: 0.95em; line-height: 1.6;"> <strong>What these numbers represent:</strong> Simulated <strong>production cost per kilogram of cultured chicken</strong> (wet weight, unprocessed) in <strong>${targetYear}</strong>, based on ${stats.n.toLocaleString()} Monte Carlo simulations. This is the cost to produce meat in a bioreactor — not retail price, which would include processing, distribution, and margins. <br><br> <strong>Why it matters:</strong> If production costs reach <strong>~$10/kg</strong> (comparable to conventional chicken), cultured meat could compete at scale. If costs remain <strong>>$50/kg</strong>, the technology may remain niche. These thresholds inform whether animal welfare interventions should prioritize supporting this industry. </div>`
RuntimeError: targetYear is not defined
OJS Runtime Error (line 804, column 163) targetYear is not defined
How can we fix this 'runtime error'? I think it was working before. The "target year" should be the "projection year" in the sidebar model parameters. The default year was 2036. #implement
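The error says the cell refers to a name, `targetYear`, that no other cell defines any more, perhaps because the sidebar input was renamed. One hedged way to make the callout robust is to build it from explicit arguments with a default year, instead of relying on a free variable. The sketch below is plain JavaScript; `costCalloutHtml` and `nSimulations` are hypothetical names, and the text is shortened from the original callout:

```javascript
// Build the callout from explicit arguments so that a missing year
// falls back to the default (2036) instead of throwing a ReferenceError.
function costCalloutHtml({ targetYear = 2036, nSimulations }) {
  return (
    "<strong>What these numbers represent:</strong> Simulated " +
    "<strong>production cost per kilogram of cultured chicken</strong> in " +
    `<strong>${targetYear}</strong>, based on ${nSimulations} Monte Carlo simulations.`
  );
}

// Hypothetical calls: one relying on the default year, one passing a year explicitly.
const defaultCallout = costCalloutHtml({ nSimulations: 10000 });
const laterCallout = costCalloutHtml({ targetYear: 2051, nSimulations: 10000 });
console.log(defaultCallout);
console.log(laterCallout);
```

On the page, the cell would pass whatever the sidebar's projection-year input is actually called as `targetYear`. Checking that input's current name in the source is the first thing to verify.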