About Summary Pivotal Questions Live Sessions Resources ▾ Readings Linear WELLBY Analysis DALY-WELLBY Conversion Metaculus Question
Colored text on a black background is very hard to read here.
Derrick Risner
Would not attend live but offered to follow up asynchronously.
Session discussion interest — from participant forms
Fix this. It takes too long to generate.
PQ3B — Share of major funders using WELLBY in 5 years
how many use it now -- put this in context
How many WELLBYs equal 1 DALY?
Check and annotate: what does, e.g., 5 WELLBYs per DALY mean in context, and how does it compare with what people currently do?
problem under consideration. So I'd resist doing a simple exchange rate."
This seems like a valid objection, but I think we can still phrase the question so that you could give a meaningful answer: the value generated if you were forced to use a single conversion rate.
PQ1B — Recommended measure for funders
The discussion is likely to be more valuable than the response, particularly for this question.
Composite well-being measure
let's do better to differentiate this from calibrated well-being. It's not fully clear to someone glancing at this briefly.
PQ2 — DALY / WELLBY conversion factor
this is not a 'subquestion' ... label this page header differently
Wellbeing Workshop · Beliefs Analysis
Add backlink to the workshop and to the elicitation form here.
How many WELLBYs equal 1 DALY?
Make sure you can access the literal question from this interface to know exactly what the respondents are answering. If these things get very long, you can use tooltips.
Modeled disbursement by end-2026
Some visualizations (bar, donut, or mosaic graph) would be helpful here.
Interactive uncertainty model
I don't think this is a stochastic model? Perhaps an extension of this should give these (correlated?) distributions. ... Squiggle-type modeling
There should also be a display of the actual equation behind the model, and a folding box or linked page explaining it in more detail.
BOTEC model
Maybe put this model at the top, after a short preamble?
Prediction market
Do any prediction markets deal with the fundraising/donation/nonprofit aspects of this?
. Independent and politically diverse funding remains valuable even in a high-liquidity world.
why? is this just a truism?
Organizations should distinguish runway decisions from upside options. If a project is valuable only under a fast-funding scenario, that dependence should be explicit rather than hidden inside local rumor. Funders and field builders should prioritize grantmaker capacity, plural donor relationships, legal vehicles, and evaluation infrastructure. These are the bottlenecks that convert paper wealth into usable grants.
this advice seems on the overly generic side?
Grantmaker capacity multiplier
what is this -- things like this need definitions, in the text with a link and in tooltips
rch memoWill AI
Title font too big, taking up too much screen space.
Will AI Wealth Actually Flood AIS/EA Philanthropy Soon?
I'll try to respond / adapt to hypothes.is comments, especially if you flag @daaronr
public data
Unjournal data
Logic map:
There's some text overlap in the diagram.
e possible costs of negative or mixed findings.
what 'costs of negative findings'? This is a bit of a vague sentence.
Request evaluation when:
also state and interpret the equation in terms of 'benefits vs cost' -- easier to interpret I think
+,- is
don't use "+" and "-" as variables ... we can do better
The Unjournal evaluates research; it does not publish papers as journal articles, and evaluators do not issue accept/reject decisions. This matters because many author co
links don't need to be bolded
Logic map: where public evaluation changes the author's payoff
overlapping text in diagram below
or policy relevance.
remove 'or policy relevance' perhaps -- The Unjournal prioritizes research with global impact potential (although that's not what we mainly rate the research on)
or the likely criticism is about taste, importance, novelty, or fit rather than checkable claims
not sure I understand the logic behind the latter part
or the likely criticism is correct
how would someone know this?
Author-Facing Guide
This is a possible set of considerations -- I don't want to state it as definitive
reader s
maybe the 'gatekeeper' or 'editor' instead of the 'reader'?
public signal is Y in +,-,
Is it a truly 'public' signal, or only one the evaluator sees after reading the paper? Or should we have both?
r[qA - (1-q)L] + (1-r)[(1-q)A - qL] - k ≥ 0
notation needs improvement, and it should be explained more -- how derived, how to interpret it? Tooltips and expanding sections could help
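One way to read the inequality quoted above is as a comparison of expected benefit against expected cost. The sketch below is illustrative only and rests on assumptions not confirmed by the source: r is taken to be the author's probability that quality is high, q the probability that the evaluation signal matches true quality, A the gain from a positive signal, L the loss from a negative signal, and k the cost of requesting. The function names are hypothetical.

```python
def prob_positive_signal(r, q):
    """Probability of a positive signal, ASSUMING r = P(high quality)
    and q = P(signal matches true quality)."""
    return r * q + (1 - r) * (1 - q)


def expected_net_gain(r, q, A, L, k):
    """Left-hand side of r[qA - (1-q)L] + (1-r)[(1-q)A - qL] - k."""
    high_branch = q * A - (1 - q) * L      # term weighted by r
    low_branch = (1 - q) * A - q * L       # term weighted by (1 - r)
    return r * high_branch + (1 - r) * low_branch - k


def should_request(r, q, A, L, k):
    """Benefit-vs-cost form: P(+) * A >= P(-) * L + k."""
    p_pos = prob_positive_signal(r, q)
    return p_pos * A >= (1 - p_pos) * L + k


# Illustrative values only (not from the source).
print(expected_net_gain(r=0.7, q=0.8, A=1.0, L=1.0, k=0.1))
print(should_request(r=0.3, q=0.8, A=1.0, L=1.0, k=0.1))
```

Collecting terms shows the two forms are the same condition: the expected gain P(+)·A must cover the expected loss P(-)·L plus the request cost k.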
Requesting a noisy public test is not the same as disclosing an already-known verifiable fact.
I suspect another paper has dealt with this question ... 'when noisy signals help the seller' or some such
e reader should not update from p0 after observing the evaluation res
this needs clarification, I don't quite see why this is the case. Isn't it possible that the author's signal is positive so they submit, but the evaluator reading the paper gets a negative signal?
he relevant prior
relevant to whom?
he reader acts when posterior belief exceeds threshold c.
what is 'acts' and what is 'c' -- underexplained
binary quality Q in {H,L}, w
Is this 'binary' the relevant threshold? Where did it come from? Is it sort of generalizable? Consider whether it misses some important nuance.
tive, the reader
who is the 'reader'? I think we mean the later journal evaluator, or career gatekeeper
Main
base
Single
simple
empirically import
why?
Public anonymity statistic. The anonymity choice is empirically important. Running python unjournal_anonymity_stats.py on the public data bundle gives 65 anonymous/generic public evaluator identifiers among 113 deduplicated evaluator-paper pairs with quantitative ratings: 57.5%. In the subset matched to published-evaluation status, the share is 63/105: 60.0%, with 7 unmatched title rows. This supports saying "a bit over half" choose anonymous/generic public identifiers, but the denominator should be stated. The wider evaluator_paper_level.csv denominator is not clean for this claim because survey-only rows are assigned generic Evaluator N labels.
This should be a fold or footnote: give a quick statistic and footnote how it was captured.
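The two percentages in the quoted passage can be reproduced from the stated counts. This is a minimal arithmetic check, not the `unjournal_anonymity_stats.py` script itself, whose contents are not shown in the source; the helper name is hypothetical.

```python
def share_percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts as stated in the quoted passage.
all_pairs = share_percent(65, 113)      # deduplicated evaluator-paper pairs
matched_pairs = share_percent(63, 105)  # subset matched to published status

print(all_pairs, matched_pairs)  # 57.5 60.0
```

Reporting the statistic as "65 of 113 (57.5%)" keeps the denominator visible in the short version, with the method left to the footnote.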
are the right default
Too strong. It also takes the 'high downside' idea for granted too much, IMO.
When the answer is unclear, the practical move is not immediate publicity. It is a fit-and-timing conversation, coauthor consen
too much 'not this but that' AI speak. And 'publicity' is vague here
downstream readers
more like 'editors and peer reviewers'
bout conditional embargo.
too much use of bold within paragraphs and prose
After the author requests evaluation, update from p_D, not the raw prior p0: p+ = p_D q / [p_D q + (1 - p_D)(1 - q)]; p- = p_D (1 - q) / [p_D (1 - q) + (1 - p_D) q]
the latex/math is not rendering as well as before we moved this to Codex for editing. Can we recover the better formats and get the best of both worlds?
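The two update formulas in the quoted passage are Bayes' rule applied to a signal of symmetric accuracy. The sketch below assumes, without confirmation from the source, that p_D is the belief that quality is high given that the author requested evaluation, and that q is the probability the evaluation signal matches true quality. The function name is hypothetical.

```python
def posteriors_after_request(p_D, q):
    """Return (p_plus, p_minus): beliefs that quality is high after a
    positive or a negative evaluation signal, starting from p_D.

    ASSUMPTION: q is a symmetric accuracy, P(+ | high) = P(- | low) = q.
    """
    p_plus = p_D * q / (p_D * q + (1 - p_D) * (1 - q))
    p_minus = p_D * (1 - q) / (p_D * (1 - q) + (1 - p_D) * q)
    return p_plus, p_minus


# Illustrative values only (not from the source).
p_plus, p_minus = posteriors_after_request(p_D=0.5, q=0.8)
print(p_plus, p_minus)
```

With an uninformative signal (q = 0.5) both posteriors equal p_D, which is a quick sanity check on the notation.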
2. Is the main obstacle credibility, visibility, field fit, or network access?
this needs further explanation and clarification -- 'usual channels' should already encompass clarification
e relevant question is not whether public evaluation is always good; it is when a public signal improves expected outcomes relative to waiting, revising privately, or continuing th
this is the 'AI language of dichotomy' overused
— m
You're using em dashes too much.
a result that a credible public test strictly helps authors whose default standing sits below the bar — and we are precise about the downside it carries for those just above it;
The language of this is a bit unclear. Try to make it easier to understand.
(1) a
Don't use bold here.
r to a public-evaluation venue that pays expert evaluators
I'm not sure if the fact that evaluators are paid here is relevant to this question - rather than giving these details, you could just say "The Unjournal is the focal example"
Explicit crux: Which specific uncertainties (AGI timing, takeoff speed, power-seeking tendency, offense-defense balance, pause feasibility) most shift expert p(doom) estimates? Community solicitation for explicit AI-risk cruxes: uncertainties whose resolution would significantly shift p(doom), including AGI arrival year, takeoff speed, power-seekin
this is meta -- I don't want meta, or at least put that into an 'opt-in' list
ee our early automated prioritization prototype, which is outside legal research and currently focuses mainly on economics and related work.
We can swap in here the legal prioritization prototype -- https://uj-prioritization-prototype.netlify.app/legal/ -- please do this -- and note that we're looking for feedback and examples to help improve and train this. Note that we don't envision this prioritization to be mainly driven by AI models -- humans will be making the ultimate decisions -- but these tools can be very helpful in the process.
Comment directly on this page using the Hypothes.is sidebar (the < tab on the right edge). Or use the rating buttons on each paper card — human ratings are how we will calibrate these scores.
Give people the option to suggest/add content.
Show legal scoring rubric & themes
Show/hide
Legal scholarship spinoff
Proposed and in consideration.
Comment directly on this page using the Hypothes.is sidebar (the < tab on the right edge). Or use the rating buttons on each paper card — human ratings are how we will calibrate these scores.
Let us know if you have any questions about this.
Elliot Swartz on the gap between academic analysis and industry learning.
this is not Swartz! adjust!
How this was made. Drafted by GPT Pro from existing Unjournal research and discussion (the elasticity-validation survey, the Bray et al. evaluation materials, and the PBM substitution literature), then built and polished into this interactive report in Claude Code. It is currently being reviewed and adjusted by hand. Treat figures and attributions as provisional until that review is complete; the governing evaluation lives on PubPub.
Make this a folding box - and the header should say AI/human collaboration in some way
Another folding box should have the standard call out about how we want feedback, and you can use the hypothesis tool for that.
Note: This workshop is in early planning. The framing, evidence base, and participant list are still being developed.
Still considering how to frame this workshop, and it depends on interest and participation. One frame is directly targeting what we know about plant-based products, who consumes them, and what it suggests for potential substitution and animal welfare. However, that evidence seems to be rather thin, inconclusive, and perhaps premature. (See links to EA Forum posts, etc.) Furthermore, our evaluation of Bray et al. on experimental versus standard quantitative marketing/I.O. estimates of own-price elasticities suggests perhaps deep uncertainty and a lack of ability to be confident in these parameters, not to mention cross-price effects and substitution patterns. This potentially motivates a pivot towards focusing on these methodological questions, as well as framing it in terms of "what can we know and what research is worth pursuing."
Plant-Based Substitution Workshop · May 2026
Change the date here. The date is still undetermined, probably late summer 2026.
Thank you for participating in The Unjournal's Plant-Based Substitution Pivotal Questions workshop. Your feedback helps us measure the workshop's impact and improve future workshops.
Remove this page for now because it makes it seem like the workshop already happened.
A major methodological innovation. The framework is elegant and the estimation strategy is sound. The empirical component would especially benefit from more diverse and reliable samples, and from direct comparisons against existing scale-correction methods so readers can judge incremental value. Logic and communication could be tightened in places — rated lower here than the other dimensions.
This is not his full evaluation. He gave a very in-depth evaluation, and you've only taken one paragraph here.
The cost of calibration questions The central tension is practical, not theoretical. Prati flags that the evidence rests on a large number of calibration questions. It is unclear how well the correction performs with the realistic two or three CQs — and even two can be a heavy burden in large surveys. He suspects this is “one crucial reason anchoring vignettes have not been implemented at scale in 20 years.” Kaiser rates the work highly but pushes for more diverse, reliable samples and direct comparisons against existing scale-correction methods, so readers can judge the incremental value. His lower marks fall on logic & communication and on claims & evidence.
Scene 1 of 6
Why "scene"? Are we making a video? Does it even need numbering
Claims & evidence
these tooltips are not coming up after seconds of hover. -- I only see the "?" -- please fix
Two experts, eight criteria
We probably want a brief transition here between the issue of measuring individuals' well-being through self-reports and what The Unjournal is now doing in rating the paper, which, funnily enough, also uses scales that may have subjective components. Make the distinction clearer here.
For decades, economists hesitated to use subjective well-being data for one stubborn reason: people use survey scales differently.
This probably needs a little bit more context on why we're trying to measure people's well-being and happiness through self-reports.
Estimated from a few extra calibration questions — not a full vignette battery.
the diagram is not fully explained? what does each dot represent? Should we be giving 'names of people' (or IDs, or types of people) to make that clearer?
Same underlying well-being
Make it clear that the number line here is supposed to represent some fixed measure of true well-being.
data for one stubborn reason:
I know this is meant for a public audience, but it's a little bit oversimplified. Perhaps we can say it in an equally concise and appealing way, but without making the absolute claims like "for one stubborn reason..." there may have been other reasons too. (Note to AI -- try to make this a persistent pattern in your writing. )
separate pages,
Individual "pubs" in the Unjournal PubPub interface
Start here: four real evaluations, one page each
Make it clear that this is what we're calling the "hub" layout.
Tell me what you think
this is taking up a lot of space -- make it folded by default
evaluation
--> evaluation package
A whole evaluation on one page
--> evaluation package on one page
What's in the paper Reconstructed from what the evaluation and author response cite — not the paper's own table of contents. Sections / figures below are only those referenced on this page.
let's use actual content and structure from the paper!
I believe that we have not been sufficiently cautious when taking bets that could be causing significant direct harm to animals (beyond just the lost funding that could be spent elsewhere).
This makes me think you're looking at this from a "deontological" standpoint.
But taking such bets is only appropriate if the risk of causing harm is sufficiently small.
In one sense, that's obviously true, as if the risk of causing harm is high enough, the expected value goes negative.
But if you're saying "We should not make even known positive expected value bets if the downside risk is too large", that's a judgment call, and it depends on your moral/ethical worldview.
Research is expensive and slow, especially at universities. But we're about to have the luxury to aim higher.
I'd like to see ambitious research initiatives independent from traditional university/journal processes. If we have the funding, we can build these fields.
designing experimental plans, conducting the studies, analysing the raw data
I don't fully agree that we need "EA AW community" hands-on involvement in the intermediate and technical steps.
I think it's more a matter of providing funding and incentives, clearly communicating the goals, priorities, and need for rigor to researchers, and enabling coordination.
But I agree that academic incentives on their own are not enough to ensure high-quality, credible work focused on animal welfare implications.
And it would indeed help if the researchers intrinsically cared about animal welfare and thus about producing useful and accurate results. This makes the incentive alignment easier, but I think it can still work even if much of the work is done by people who aren't intrinsically interested in animal welfare or don't think about effectiveness in the same way - as long as they can understand and embody the priorities in their work.
I think entire organisations could and should be founded for this. Until now, this was simply not possible. Research is expensive and slow, especially at universities. But we're about to have the luxury to aim higher.
At The Unjournal, we are trying to bring together researchers, practitioners, and funders to do this sort of prioritization and coordination, so that we can generate and communicate powerful, useful evidence while robustly assessing and improving its credibility. This is something we're trying to implement through our pivotal questions and workshops, e.g., https://uj-cm-workshop.netlify.app/summary ... and I think we're having some successes.
But I'm not saying we're necessarily the best positioned to do this, and I'd love to work with others or see others move forward on this.
But simply funding the broad field of animal welfare science is likely to create scattered research results that are difficult to translate into action.
I agree. I strongly believe that some coordination is necessary, and we shouldn't just rely on academic incentives. Large-scale, ambitious evidence and collaborations seem high-value to me.
I do not think that we can consider it an evidence-based intervention by EA standards.
I would say that this depends on whether you're willing to rely on what might be considered fairly "common sense" priors about the substitution effect in an environment where we have little evidence and it's very hard to collect reliable evidence. Perhaps for any intervention there will be some aspect of the model that we might need to take for granted as just a common sense implication.
I can appreciate that you might not agree that the price and taste equivalent plant-based meat would substantially crowd out the consumption of conventional meat. I find it harder to imagine that the same would hold for cell-based meat, but this is obviously a judgment call.
That said, there's a bit of a chicken-and-egg problem here: I think it's reasonable to assume that until cell-based meat is actually in restaurants and on supermarket shelves, we won't have a good sense of whether people will buy it as a replacement for conventional animal meat. But the only way for it to actually get on the shelves is through substantial investment. So requiring this high degree of certainty might make it impossible for us to consider potentially high-value but risky investments in this sort of innovation.
See also this[20] more recent meta-analysis that came to a similar conclusion about alternative proteins and other meat reduction interventions.
The evidence is, in fact, all over the map. -- see e.g., the survey here -- https://forum.effectivealtruism.org/posts/3Eh8MbqLwFBsD7GK2/how-much-do-plant-based-products-substitute-for-animal#Existing_Research
But there are also doubts about whether we can reliably collect evidence in this domain. See https://uj-pba-workshop.netlify.app/context/pbm_fuller_report/ for an (AI-aggregated) synthesis ... Bray et al cast doubt on the reliability of even own-price-elasticity estimates in fairly standard settings -- see our evaluation of this paper here -- https://unjournal.pubpub.org/pub/evalsumbraybray/, and we're working on further discussion about the implications of this for substitution.
(FWIW, intuitively, I have a strong prior that there would be a fairly strong substitution, even if not one that would completely end animal farming. As Bayesians, our prior beliefs should count for a lot in a domain without substantial evidence.)
A year ago, CG had spent over $34 million total on grants in this space[13].
And they are funding more -- see their most recent post.
WFI itself has highlighted a general lack of research in poultry welfare[18].
Confirmed. There does seem to be deep uncertainty here and a substantial lack of measurement.
heavily dependent on what harms are included, how they are scored, and how different types of pain are weighed.
I'd like to see more justification of this bolded claim that it's indeed very sensitive to the assumptions. Can you provide a link or a footnote about this? Are there reasonable specifications under which it goes the other way?
Only one study (STA_16) found that furnished cages had a higher mortality than single tier aviaries.
The USA and WOR ones would seem to also, although the difference may not attain statistical significance in a conventional sense. But it still contributes to the evidence in that direction.
I would still admit that the evidence, as you presented it, seems to go the other way, though.
Where errors weren't available, I made a note summarising the difference.
This table is somewhat hard to read and somewhat hard to see a synthesis of. I think something like a forest plot could be helpful here. And of course, it would be helpful to just report on the actual meta-analytical results.
What seems important to me is the expected magnitude of the difference in mortality between systems and the implications for the difference in suffering, factoring in the other differences in life quality and health.
n this data set (USA_13), mortality (cumulative at 60 weeks) is indeed statistically indistinguishable from cages.
In fact, in this case, the aviary mortality is actually lower than conventional cage mortality.
possible pair-wise comparison for statistical significance using z-scores where standard errors were reported.
Should correct for multiple comparisons, full stop. At the same time, I disagree with drawing conclusions from a lack of statistical significance. To conclude that "it's very unlikely that there's any difference large enough to be meaningful", you need to do something like an equivalence test, or report Bayesian posteriors over the difference.
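The equivalence-test idea in this comment can be sketched as two one-sided tests (TOST) on a difference in mortality proportions. This is an illustration under stated assumptions, not the post's method: it uses a Wald normal approximation (which is poor for small samples), the equivalence margin is a hypothetical choice the analyst must justify, and all counts in the usage example are made up.

```python
from math import sqrt
from statistics import NormalDist


def tost_two_proportions(x1, n1, x2, n2, margin, alpha=0.05):
    """Two one-sided tests for equivalence of two proportions.

    Null hypothesis: |p1 - p2| >= margin. Rejecting it (small p-value)
    supports the claim that any difference is smaller than `margin`.
    Returns (difference, p_value, equivalent_at_alpha).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        raise ValueError("standard error is zero; Wald approximation fails")
    norm = NormalDist()
    p_lower = 1 - norm.cdf((diff + margin) / se)  # tests diff <= -margin
    p_upper = norm.cdf((diff - margin) / se)      # tests diff >= +margin
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Hypothetical counts: same observed mortality, very different sample sizes.
print(tost_two_proportions(50, 1000, 50, 1000, margin=0.03))
print(tost_two_proportions(1, 6, 1, 6, margin=0.03))
```

In the usage example, identical observed rates support equivalence only in the large sample. That is the distinction the comment draws between "no significant difference" and "evidence of no meaningful difference".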
and double arcsine transformed).
What is the double arcsine transformation here, and why is it used? AI, please remind us.
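As a reminder, the double arcsine transformation usually refers to the Freeman-Tukey transform of a proportion, used in meta-analysis because its variance depends only on the sample size, which helps when proportions sit near 0 or 1. The sketch below assumes the post used this standard form; the source does not say which variant (some software reports half of this value, with a correspondingly smaller variance). Function names are hypothetical.

```python
from math import asin, sqrt


def freeman_tukey(events, n):
    """Freeman-Tukey double arcsine transform of events/n.

    ASSUMED form: arcsin(sqrt(x / (n + 1))) + arcsin(sqrt((x + 1) / (n + 1))).
    """
    return asin(sqrt(events / (n + 1))) + asin(sqrt((events + 1) / (n + 1)))


def freeman_tukey_variance(n):
    """Approximate sampling variance of the transform above: 1 / (n + 0.5)."""
    return 1.0 / (n + 0.5)


# Illustrative only: zero deaths out of 10 still gives a finite,
# usable value and variance, unlike a logit transform.
print(freeman_tukey(0, 10), freeman_tukey_variance(10))
```

Pooled estimates computed on this scale have to be back-transformed to proportions before being interpreted.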
These aren't sudden, painless deaths. Increased vent pecking itself is also a sign of increased environmental stress. Overall, this suggests that hens in the cage-free systems generally experienced more distress.
This part certainly suggests that, but we should really present the net with some adding up and weighting for the magnitude of the suffering of each event.
Hens in the cage-free system performed the most natural behaviours (flying, perching, dust bathing, foraging) and had stronger leg and wing bones. However, the study also found that cage-free systems had more severe foot lesions, more keel abnormalities, increased aggression, and increased mortality. The mortality in cage-free systems was over twice as high as the others:
These things and their consequent animal suffering/animal welfare burden could presumably be weighted and aggregated and compared. "More" could be slightly more in one case and far more in another case. What are the aggregate differences for reasonable assumptions here?
Additionally, both studies implanted conductive electrodes in the test animals. It is plausible that this significantly affects how current flows through the shrimp's body. I also feel confused about what a signal from an electrode on a heart or a ganglion actually tells us. The plots of the recorded “power” are hard to interpret without a control signal to assess what the noise floor is.
AI -- please look this up/clarify
n conclusion, evidence for electrical stunning is extremely limited and we shouldn't feel comfortable recommending anything with confidence.
I guess that's my take on what you shared too, but how do authors and experts in the field (other than you) interpret it? A bit of steelmanning+feedback could be useful imo.
AI: look this up, including in comments below, provide sources for other research.
Overall, there are only two scientific studies on the topic of using electric shock on Whiteleg shrimp. Both have limited sample sizes. Both show some recovery from electric shock. Both find that immersion in proper ice slurry leads to a rapid drop in vital signs. Neither is representative of industrial stunning machines[11].
if you want to rewrite or rework a bit, maybe lead with this, so we can understand what the evidence is getting at.
At lower shock voltage and duration, neural activity decreased on average, but sometimes increase
ok this informally suggests important heterogeneity to me, suggesting the need for nontrivial sample sizes.
I want to flag that I found parts of the results section hard to parse and sometimes details seemed to contradict each other. But key insights include:
Have you tried to contact the authors? I think that would be high value, both directly and in terms of field building, and I'm happy to help facilitate it if I can. They might be particularly eager to clarify and happy to hear how their research is valued ... and may also see this as a route to potential future funding.
setting “a significant proportion” of shrimp did not “show signs of recovery”
Is that really how they presented it? Let's double check. That is extremely vague.
for a fair comparison
a fair comparison to what?
The shrimp recover their ability to move after 5-10 minutes.
Again, I'm really not sure whether these are good or bad things. Why do we care if they recover their ability to move later if we're normally killing them with this?
OK this is explained a bit further below .... because they may wake up again before being killed
Based on this data, it is unclear if electric shock followed by ice slurry provides any benefit over ice slurry alone, provided the animals are kept in ice slurry until they are fully dead. (It is unclear how long that would take, though.) And yet, ice slurry is often regarded as “the bad way” to kill shrimp. In fact, Mercy for Animals has been actively campaigning against ice slurry slaughter[3].
Okay, following the above, I guess the point is that you see similar things happen with ice slurry and with electric shock, so it's not clear why one followed by the other is best?
When shrimp first hit the ice slurry, they perform sudden full-body contractions (tail flips), but this also happens if you first cut their head off (check the supplementary material for a video).
Confused about what this is supposed to mean. Are we considering cutting their head off, or are you saying that if this happens, even though you cut their head off, that means that it probably doesn't indicate anything meaningful, just a sort of knee-jerk response?
Immersion in ice slurry caused a rapid and massive drop in heart rate “amplitude” within seconds. Returning shrimp to warm water after 5 minutes allows the regular heart activity to return.
are these things good or bad? Not immediately obvious to me.
We have very limited data on electrical shrimp stunning that doesn't support a confident conclusion as to whether it's good or bad.
I found your presentation below on this rather convincing. This also comports with what I've heard from other EAs (although perhaps the same circle of conversation). We need better evidence on which of these AW improving technologies actually reduce animal suffering. I'm in some discussions about possibly building and funding an evaluation service for specific tools and approaches (maybe something between The Unjournal and a fast-review journal, also inspired by Rapid Reviews Infectious Diseases).
Very small sample sizes do not always mean lack of inference. For instance, in very predictable contexts without a lot of noise, like, let's say, Newtonian physics, even a few data points could help us narrow our beliefs substantially. I like Richard McElreath's example about how you can substantially update on the share of a planet that is water by simply, even in the first few random samples, choosing a single point on a planet's sphere.
More intuitive -- if I ask 4 people to taste a drink and they all wince deeply in pain and disgust, I'm going to be highly confident it tastes bad. If all 4 smile and praise it, I'll be fairly confident that it's at least tolerable.
But I don't know that that is the case here. There might in fact be a lot of uncertainty and heterogeneity. What I wonder is whether larger samples for observing the behavior and bioindicators of these fish would be very expensive, or whether they could easily be scaled up with just a small amount of money. As a non-biologist, it seems intuitive that it should be cheap, but I might be missing something.
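The small-sample point in the tasting example can be made concrete with a grid approximation in the style of McElreath's globe-tossing exercise. This is a sketch under assumptions that are mine rather than the commenter's: a uniform prior on the share of people who would dislike the drink, independent tasters, and a binomial likelihood. The function name is hypothetical.

```python
def posterior_mass_above(successes, trials, threshold, grid_size=1001):
    """Grid-approximate P(share > threshold | data) under a uniform prior.

    `successes` out of `trials` independent observations showed the
    outcome of interest (for example, tasters who winced).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Binomial likelihood up to a constant; the uniform prior cancels.
    weights = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(weights)
    above = sum(w for p, w in zip(grid, weights) if p > threshold)
    return above / total


# Four of four tasters wince: how likely is it that most people would?
print(posterior_mass_above(successes=4, trials=4, threshold=0.5))
```

Under these assumptions the posterior probability that a majority would dislike the drink is about 0.97 after only four observations. With heterogeneous responses, as the comment suspects may hold for the stunning data, the same sample size would be far less informative.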
e (N = 6 for each intervention) w
I.e., 6 animals killed with each. See previous comments about sample size. ... what's the within treatment variation etc., and how costly is it per animal? How does this sample size compare with typical measurements in these domains? If the measurements taken are costly, could we get more reliability with cheaper measurements and a larger sample?
This should be the #1 priority for new animal welfare funding, ahead of scaling existing work.
I think I would indeed lean in this direction, and I'd suggest that in fact most grants I've seen go the other way. (But I may have some biases here; this is in line with what we're trying to do at The Unjournal, etc.)
[Consider -- does this post make a clear case for the 'should', demonstrating that the VOI of research here will exceed the expected (?) impact of the best or current 'existing work'?]
Instead, I hope this post inspires lots of people to tackle this major neglected problem.
v speculative -- this may be highly timely if we think the Anthropic IPO will be driving some money to AW , and donors are evidence driven and 'difference-making risk-averse'
I found that even the most well known (and well funded) interventions had limited evidence, sometimes pointing in the “wrong” direction.
In terms of the 'expected impact'/'uncertainty of impact' tradeoff, I've heard of Animal Welfare as being in between GH&D and X-risk/GCR/AI Risk.
Your concerns may be driven by "Difference-making risk aversion" -- see discussion here, for example https://forum.effectivealtruism.org/s/WdL3LE5LHvTwWmyqj/p/9EENSGhiQiKFaRh4t
Or you might be driven by something more deontological, related to 'do no harm', perhaps?
We have mixed evidence on whether transitioning egg producers to cage-free improves welfare overall.
Maybe it would be fair to note that this is along a specific dimension of transition: conventional caged to regulatory-mandated cage-free. Perhaps there is better evidence for other certified standards, such as free-range?
building R&D infrastructure that can rapidly generate high-quality action-relevant research results.
This is something The Unjournal is trying to make happen, working with https://www.aw-econ.org/about and others. Given the very limited investment in this space, I think there are high marginal returns, although some of the value of information will be learning about what is or is not "epistemically possible".
We have evidence that the substitution effect of alternative proteins is weak, at best.
I think what we have is a substantial lack of evidence in this domain. This is something the Unjournal is trying to remedy. (Unjournal.org, see https://uj-pba-workshop.netlify.app/ for a link to some of our efforts, and https://forum.effectivealtruism.org/posts/3Eh8MbqLwFBsD7GK2/how-much-do-plant-based-products-substitute-for-animal for an earlier take).
Evidence in this domain is very hard to come by, and there are substantial doubts about the extent to which we can even reliably measure these things. (The aforementioned post gets at this. Also see our evaluation of the Bray et al. paper, which casts substantial doubt on the ability to even measure simpler things like own-price elasticity with conventional methods -- https://unjournal.pubpub.org/pub/evalsumbraybray/ -- I'm working to follow up this evaluation package with more detailed evaluation managers' discussion and dissemination, focusing on the applied issues.)
Following up on this further -- see https://app.notion.com/p/Validation-Evidence-for-Food-Demand-Elasticities-PBM-Pivotal-Question-376e97e2ad3381a898d3ceb589b265f2 and links within for a preview.
At the same time, there's also just not a lot of economics, social science, and marketing work done that focuses on animal welfare implications (or alternative proteins).
The aforementioned workshop on associated pivotal questions will focus not only on what we know in this domain (~substitution effects within and between animal and plant-product consumption), but also on what we can know with given data, what we might be able to reliably learn with more ambitious data collection exercises, experiments, etc., and what is more or less fundamentally unknowable and does not merit further research investment.
Even some of the most prominent animal welfare interventions have surprisingly weak evidence behind them
I've been hearing this for a while. It's something that organizations like Animal Charity Evaluators are trying to address, but they don't have the dedicated resources and funding that, say, GiveWell has. Furthermore, there's no comparable academic/national resource base for animal-welfare-relevant research. For example, in the economics of animal products, it seems most of the "agricultural economics" is oriented towards supporting the farm industry. (Perhaps with climate change as a secondary priority in some countries.)
Is this easier to read and use than the current separate PubPub pages?
Note -- we'd probably make this additional to PubPub, as the latter goes into standard bibliometrics and information standards.
(Which in turn, brings the risk of divided attention)
A toy decomposition to make the structure tangible: set the PBM price cut and the diversion shares, and see the implied displacement. These are placeholder ranges for elicitation, not estimates. The point is the wiring, not the numbers.
Add a folding box presenting and explaining the equations behind this!
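One possible shape for the equations such a folding box could present, as a minimal sketch only. Everything here is an assumption for illustration and is not taken from the dashboard: the function name, the first-order constant-elasticity approximation, the reading of a "diversion share" as the fraction of each additional PBM kilogram that replaces a kilogram of a given animal product, and every parameter value. The dashboard's actual wiring may differ.

```python
def implied_displacement(price_cut, own_elasticity, pbm_baseline_kg, diversion_shares):
    """Toy wiring: PBM price cut -> extra PBM bought -> animal product displaced.

    price_cut        fractional PBM price reduction (0.10 means a 10% cut)
    own_elasticity   PBM own-price elasticity (a negative number), assumed constant
    pbm_baseline_kg  PBM quantity before the price cut
    diversion_shares dict mapping product -> share of each extra PBM kg that
                     replaces one kg of that product (hypothetical definition)
    """
    if sum(diversion_shares.values()) > 1:
        raise ValueError("diversion shares should sum to at most 1")
    # First-order approximation: %change in quantity = elasticity * %change in price,
    # where the price change is -price_cut.
    extra_pbm_kg = pbm_baseline_kg * own_elasticity * (-price_cut)
    # Each product's displacement is its share of the extra PBM quantity.
    return {product: share * extra_pbm_kg
            for product, share in diversion_shares.items()}


# Placeholder inputs for elicitation, not estimates.
example = implied_displacement(
    price_cut=0.10,
    own_elasticity=-1.0,
    pbm_baseline_kg=1000.0,
    diversion_shares={"beef": 0.5, "chicken": 0.25},
)
print(example)
```

With these placeholder inputs, the remaining 25% of extra PBM is treated as not displacing any animal product (for example, it replaces other plant-based foods or is net new consumption).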
08 / Interactive sketch Parameter dashboard
Give these clear anchors to hyperlink to, and allow people to extract hyperlinks
A toy decomposition to make the structure tangible: set the PBM price cut and the diversion shares, and see the implied displacement. These are placeholder ranges for elicitation, not estimates. The point is the wiring, not the numbers.
Let's try to use some referenced values as a starting point, linking them/tooltips. I don't think there's a lot of good research, but still good to start right
How this was made. Drafted by GPT Pro from existing Unjournal research and discussion (the elasticity-validation survey, the Bray et al. evaluation materials, and the PBM substitution literature), then built and polished into this interactive report in Claude Code. It is currently being reviewed and adjusted by hand. Treat figures and attributions as provisional until that review is complete; the governing evaluation lives on PubPub.
Just confirming this is indeed the status
Pills show the estimated direction by category.
document this better -- is red negative, green positive, and grey close to 0?
Anchor paper Bray, Sanders & Stamatopoulos
maybe we don't want to anchor too much on this --- NB that paper does not involve substitution. "Anchor paper" could be misinterpreted. This is The Unjournal evaluation package that is most strongly connected atm
Meta-analyses show variation rather than convergence.
even in headers, let's have specific links (perhaps linking to the section below) for claims
The evidence base is large and shows recognizable structure across foods and countries. But sharp validation is thinner than the volume of estimates suggests. Bray et al. find standard observational scanner estimates fail badly against a randomized benchmark in their setting.
this tracks
Anonymous
Double-check -- if it's 50 everywhere, just take it out of the data set -- it's just a distraction and probably us testing the form
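A minimal sketch of the filter this note asks for, assuming responses are held as a mapping from respondent label to a dict of numeric answers. That data layout, the function name, and the sample values are hypothetical; the actual form export may be structured differently.

```python
def drop_constant_responses(responses, placeholder=50):
    """Return a copy of `responses` without respondents whose every answer
    equals `placeholder` (likely form-testing entries)."""
    kept = {}
    for respondent, answers in responses.items():
        values = list(answers.values())
        # Only drop when there is at least one answer and all of them match.
        if values and all(v == placeholder for v in values):
            continue
        kept[respondent] = answers
    return kept


# Hypothetical sample data, for illustration only.
sample = {
    "Anonymous 1": {"q1": 50, "q2": 50, "q3": 50},
    "Anonymous 2": {"q1": 50, "q2": 20, "q3": 75},
}
cleaned = drop_constant_responses(sample)
print(sorted(cleaned))
```

A respondent who answers 50 on some questions but not all is kept, since a single 50 can be a genuine belief.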
Individual estimates with uncertainty (80% CI) · log scale
give response dates
public negative
'the possibility of'
Legal scholar lead Candidate curation Law and AI partner Animal welfare law Pilot papers Paid labeling Evaluator pool Workshop route
what are these buttons meant to do? Are they supposed to be links?
How should the model differ between US law reviews and European peer-reviewed legal scholarship?
Or should we focus mainly on the US context because of the greater 'review gap' as well as the greater role for court jurisprudence in the US? On the other hand, US legal scholars may be paid more and overcommitted to lucrative and influential work, and thus less willing to do the evaluations.
useful to legal audiences.
--> are highly credible to legal audiences, useful to practitioners, and show potential for global impact.
the project should restart only if it has legal-scholarship ownership and a narrow pilot.
reword this. A 'narrow pilot' is what we will do, that's not a precondition
Not a generic policy-commentary outlet; the focus should be assessable legal scholarship.
this is a bit vague, needs clarification
Not a replacement for law reviews or journals.
why not?
Identify public legal research with unusually high expected value for evaluation.
This itself would be a useful public good, if we curated it well, with feedback from organizations that wanted to use this.
Naturally we will do this in a human-AI collaboration, with AI doing much of the initial search and filtering. (see https://uj-prioritization-prototype.netlify.app/ for a prototype for our main stream)
Choose a narrow pilot
We did this about 8 months ago, but most of these will probably be stale
how quickly alternatives to animal-source foods must diffuse for the food system to make a meaningful contribution to climate targets. That is directly relevant to public R&D, procurement, regulation, investment, and philanthropic choices being made by organizations working on climate mitigation, food systems, and animal welfare.
relevant yes. But how do we know it's important for these questions?
Default σ (no CI)
Fix the tooltip -- it shouldn't be in all caps
Subquestions Timeline
these are subquestions, but we're missing some of the 'goal oriented' questions here
CM Workshop ·
add hyperlinks/headers back to the workshop here
PQ3C — P(HLI abandons WELLBY within 3 years)
give the full question language, and hyperlink the question in context (tooltip if long)
Respondent H
just say 'anonymous 2' .... "H" is confusing
PQ1A: What is your probability that linear WELLBY comparisons are reliable enough for comparing interventions in LMICs? Respondents gave a central estimate (0–100%) and a 90% credible interval.
Note -- I did not intend to have CIs over probabilities. This was an artifact of a changed question and vibe coding. Also investigate whether this was the wording of the question when participants answered it.
Germany consumer survey · late 2024 Free GFI Europe consumer survey (late 2024, published 2025): 25% of German adults and 23% of UK adults reported consuming plant-based meat in the last month. 47% of German adults and 41% of UK adults reported already reducing their meat intake or following a meatless diet. 60% in Germany and 56% in the UK reported at least monthly consumption of some plant-based product category (broader than meat). Since only ~5% of German consumers exclusively consume alternative proteins (see src-35), the large majority of the 25% monthly PBM consumers are omnivores. Survey-reported personal consumption is more direct evidence of self-eating than purchase-panel data, which tracks household-level transactions without identifying wh
this seems to need more digging into!
Together: PBM is roughly 0.1–0.15% of conventional by volume, or 0.16–0.4% by illustrative retail valu
this seems worth highlighting, even if it's a rough calculation
~$7–9/lb vs ~$5–7/lb
state this as percentage
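A minimal sketch of the conversion to a percentage premium. It assumes the first range (~$7–9/lb) is the plant-based price and the second (~$5–7/lb) is the conventional price, which the quoted text does not state explicitly; the function and variable names are illustrative.

```python
def premium_pct(pbm_price, conventional_price):
    """Percentage by which the plant-based price exceeds the conventional price."""
    return (pbm_price / conventional_price - 1) * 100


# Assumption: first quoted range is plant-based, second is conventional ($/lb).
pbm_low, pbm_high = 7.0, 9.0
conv_low, conv_high = 5.0, 7.0

# Midpoint-to-midpoint comparison ($8 vs $6).
midpoint_premium = premium_pct((pbm_low + pbm_high) / 2, (conv_low + conv_high) / 2)
# Widest bounds implied by the two ranges.
smallest_premium = premium_pct(pbm_low, conv_high)   # cheapest PBM vs priciest conventional
largest_premium = premium_pct(pbm_high, conv_low)    # priciest PBM vs cheapest conventional

print(round(midpoint_premium), round(smallest_premium), round(largest_premium))
```

Under that assumption the midpoints imply a premium of roughly 33%, with the two ranges together allowing anything from about 0% to about 80%.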
Workshop
Pre-workshop discussion !!
In brief
there is too much overlap between this and the paragraph beginning "The sceptical concerns"!
Readings & Resources
We need more of a TOC and sections/navigation on this page. Longer content in folding boxes
Background note: a first-pass Claude summary of evidence on PBA penetration and taste-comparability is available for sharing. It is exploratory rather than a vetted literature review.
shorten this a bit
takes a broader view than PBA alone
Rewrite to "extends beyond PBA"
If a 10% price reduction in Impossible Burgers leads many people to eat fewer chickens, that suggests a case
This is a "for example"
PBA buyers were already eating less meat
"already eating less meat" is vague, and I'm not clear what the argument is here
plant-based burgers are mostly substituting away from beef (not chicken),
The lower animal welfare burden of beef vs chicken may not be known to all readers
welfare
replace: "corporate animal welfare campaigns"
Connect to decisions: Given current evidence, is PBA funding plausibly competitive with corporate campaigns?
Also mention other questions, such as "will meat taxes improve or worsen animal welfare?" and "Will innovative products such as PBA and cultured meat substitute for farmed animal consumption, or will they mainly be taken up by (existing) vegans and vegetarians?"
Quantify uncertainty: What's a reasonable range for the cross-price elasticity between PBAs and chicken, given what we know and don't know?
This is kind of captured above, but I would do something more here with belief elicitation, interactive updating, and aggregating knowledge.
Broaden the scope:
This isn't something we're trying to achieve. This is just a path we're thinking of taking with the workshop.
and
and/or
s — p
Colon instead of dash.
nd can we conclude anything at all with current methods?
Rather than "conclude", something like "do currently available methods and data even yield useful insight?"
can we actually conclude about substitution effect
Conclude is too strong here. I would say, what can we reasonably say about substitution effects and with what confidence?
agricultural economists
Also industrial organization and quantitative demand estimation economists, and quantitative marketing economists
brings together
Aims to - we don't have a confirmed guest list yet.
funders
funders and industry and charity practitioners
animal welfare researchers
Economists interested in animal welfare
, but stated choices may not reflect real purchasing behavior.
Tool tip with some reference discussion of the limitations here.
identification strategies vary considerably in rigor.
Mention the use of instrumental variables and other strategies here, perhaps in a tooltip. Give specific references in that tooltip.
raising questions about which to trust.
Add a tooltip here, discussing some of the strengths and limitations of each, using the context and explanations discussed elsewhere. Let me know if you need more context on this.
Different specifications can yield very different elasticity estimates.
... (tooltip) Note this is in part due to the aforementioned point that elasticity is not likely to be constant across an individual or market demand curve, and there will also be heterogeneity; thus, it matters what parts of the curve you are looking at, and which markets, times, etc.
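A small worked example of that point, as a sketch: with a linear demand curve Q = a - b*P, the own-price elasticity at a given price is -b*P/Q, so it changes along the curve even though the slope is fixed. The linear form and the parameter values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def point_elasticity_linear(a, b, price):
    """Own-price elasticity at `price` for linear demand Q = a - b * P."""
    quantity = a - b * price
    if quantity <= 0:
        raise ValueError("price is at or above the choke price; demand is zero")
    # Elasticity = (dQ/dP) * (P / Q), and dQ/dP = -b for linear demand.
    return -b * price / quantity


# Hypothetical demand curve: Q = 100 - 10 * P.
a, b = 100.0, 10.0
elasticities = {p: point_elasticity_linear(a, b, p) for p in (2.0, 5.0, 8.0)}
for p, e in elasticities.items():
    print(f"price {p}: elasticity {e:.2f}")
```

On this one curve, demand is inelastic at the low price, unit-elastic in the middle, and highly elastic at the high price, so two studies sampling different price ranges could both be right while reporting very different elasticities.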
IV and experimental estimates often diverge in opposite directions from naive OLS.
rephrase this -- it's not quite right, and confusing
Also be clear: these are estimates of own price elasticity, although it seems unlikely that cross-price elasticities would be more consistent or robust. And these are price-shifting field experiments. But also note, in a tooltip, some of the critiques of these experiments themselves. Ask me if you need context.
especially in the earlier years when these products were emerging.
I don't see what this part of the sentence adds. If the data is available in later years, we can focus on that later data. Maybe just leave this out, or mention something like "partly because of the limited availability of these products, and lags in releasing data for research use" -- but that's a tooltip detail. Also, I want you to ground some of these statements with references and links, mainly in tooltips.
they anticipate lower demand,
More like: when they expect demand to be more price sensitive -- firms may have pro- or counter-cyclical pricing. Put the details in a tooltip.
Why this is hard to measure
These explanations are taking up too much space and will take up even more when you consider a wider range of approaches.
Use folding boxes and tooltips more.
everal key challenges complicate this:
These are key issues with ~traditional econometric (IO and quant. marketing) methods.
Field experiments (supermarket-level or at school cafeterias etc.) have less of an endogeneity issue, but some of these issues are still present (e.g., short term vs long term), and these are hard to implement at scale and cleanly, and have issues of their own (see the notes/discussion, and sketch these).
Hypothetical and small-value choice experiments and hypothetical discrete choice surveys have other important limitations (mention these, from the sources and discussion).
that's a
That suggests
the strongest causal evidence.
Moderate this; it is vague. There are also a few kinds of field experiments in addition to this, including price-shift experiments (esp. Bray et al.), although few if any involving PBA.
though
not "though"
These measurement challenges mean we should interpret existing estimates cautiously, while still extracting what information we can. The workshop will discuss which methods are most trustworthy and what further research could help.
this is a bit generic, maybe not necessary
likely
likely --> "may" ... tooltip some other possible explanations for this. (discussed in the notes).
One concrete finding worth engaging: The evidence suggests that the vast majority of PBA purchasers are omnivores, not vegetarians or vegans — one study finds that only around 1% of high-spending plant-based meat alternative households are actually vegetarian. This challenges the intuition that "PBA just captures existing vegans" and raises the stakes for substitution estimation: the counterfactual meat consumption displaced may be much larger than assumed.
This is probably too strong ... needs caveating and referencing and tooltips.
have serious limitations that are worth confronting directly.
just "seem to have serious limitations, and existing estimates are 'all over the map'" -- hyperlink or tooltip and link https://forum.effectivealtruism.org/posts/3Eh8MbqLwFBsD7GK2/how-much-do-plant-based-products-substitute-for-animal#Existing_Research
Scanner dat
--> Observational demand estimation (using scanner, retail, and macro data)
And we're asking a prior question
Rephrase. More like "revisiting an underlying question"
(chicken vs. beef vs. pork),
make this 'between different animal products' and 'e.g., chicken vs. beef vs. eggs...' -- relevant for AW when considering issues like the AW impact of meat taxes -- which might shift consumption from beef to chicken, with a higher AW burden -- mention this briefly with further details in a tooltip
Germany PB meat substitute production volume, 2025 (Destatis)¹⁸
Skip this one
US plant-based beef price premium vs conventional beef, category average
Research and state this. Also for Impossible and Beyond vs conventional ground beef.
burgers
The correct term is hamburgers
Penetration
Penetration might not be the right term. I think we just mean market share here
Breaded centre-plate
What is "breaded centre-plate"?
Butcher) and the Nordic countries, where per-capita consumption of plant-based foods is high — probably sit above Germany, whic
Evidence for this claim? Otherwise state as "we speculate that".
Claude) prompts;
Claude and various OpenAI models
David Manheim (Technion/ALTER) and Mirjam Capuder (University of Maribor) participated. The session was recorded — all attendees joined knowing this. It covered introductions, a walkthrough of the interactive cost model dashboard, and early framing questions about key modeling uncertainties. Full recording pending participant review before public release.
Mention the insights here? I'm not sure we'll put out this video either; it's not something interesting to watch, I guess. It was mostly preparation and broad discussion.
ene-edited cell lines are the most under-modeled factor in published TEAs.
this is a strong claim ("most under-modeled") -- what's it based on? Reasoning transparency please. Provide support and links to this, tooltips etc. I want to make sure this is well-backed before I post and ~"co-sign" it !
technology and reaching different conclusions based on different priors about scale-up timelines and capital availability.
all claims need more direct supporting evidence ... quotes, links, etc.; tooltips are your friend
1. The $1–$100/kg spread is real disagreement, not just uncertainty. Named domain experts — Swartz ($25/kg) and Lattanzi ($100/kg) — are 4× apart with tight confidence intervals. This isn't a calibration problem; they're looking at the same technology and reaching different conclusions based on different priors about scale-up timelines and capital availability.
Wait -- are you sharing the beliefs here? We didn't want to do that yet!
European Morning Drop-in Fri May 8, 2026 · 9:00–10:00am ET (3–4pm UK · 4–5pm CET) · Zoom Informal drop-in for EU/UK participants who could not stay for the full afternoon session. Primarily attended by European/UK participants (CET timezone). The session was a recorded Zoom — all attendees joined knowing this. It covered introductions and a preview of the hydrolysates and gene editing framing that would open S1. Full recording pending participant review before public release.
skip/remove this -- no one showed up