Interactive model: public test after selection into evaluation
Where is the 'pre-selection into evaluation' probability in this diagra (Po)
record
what does 'adverse record' mean below? I guess we are talking about the evaluation rating here?
Evaluator anonymity in current Unjournal public data. A bit over half of deduplicated evaluator-paper rating pairs use anonymous or generic public identifiers: 65/113, or 57.5%. This matters because later gatekeepers see the public report, but often not a named evaluator.
interesting but you haven't really connected this to the discussion or model. Why does evaluator anonymity matter to the author's decision?
That the premium persists once public evaluation is common.
Also, as with these signaling games, there are usually multiple equilibria, including a sort of 'babbling' one, iirc.
Author Benefits, With Limits
this section is not written well. As far as I know only the things linked to Propositions 1 and 2 even engage the model ... you can mention other possible costs and benefits not considered by the model, which are largely empirical questions
Non-exclusivity: the public record can stack on a journal path.
I don't see how this is a 'result' -- it's just an obvious construction
A calibrated causal effect of public evaluation on citations or journal acceptance.
simplify this ... that's obviously an empirical issue.
The Proposition 2 signal, made interpretable by benchmarked ratings, journal-tier equivalents, and uncertainty intervals.
link and tooltip "Proposition 2" -- it hasn't been introduced yet
backs
'backs' or 'focuses on'
third category
what 'third category'?
not formal publication by The Unjournal.
remove 'not formal publication' -- already noted, but the 'formal publication' is not a benefit per se
downside
'downside risk'
visible
The diagram below is underexplained. Also note we give evaluators the opportunity to revise if the author points out clear omissions or errors.
supports conditional timing
this needs clarification or a tooltip. I think you are talking about the 'conditional embargos'?
Pr(favorable | H) = Pr(adverse | L)
Why should the probability of each of these be the same? That's weird and seems very limiting to me. The model should be made more general.
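One way to see what the symmetry restriction rules out is to let the two accuracies differ. The sketch below is only an illustration and is not taken from the note's model: the names `p0`, `sens` (for Pr(favorable | H)) and `spec` (for Pr(adverse | L)) are hypothetical labels, and the numbers are arbitrary. Setting `sens == spec` recovers the symmetric case quoted above.

```python
def posterior_high(p0: float, sens: float, spec: float, favorable: bool) -> float:
    """Posterior Pr(Q = H | signal) by Bayes' rule.

    p0   : prior Pr(Q = H)            (hypothetical name)
    sens : Pr(favorable | Q = H)      (hypothetical name)
    spec : Pr(adverse   | Q = L)      (hypothetical name)
    """
    if favorable:
        num = p0 * sens
        den = p0 * sens + (1 - p0) * (1 - spec)
    else:
        num = p0 * (1 - sens)
        den = p0 * (1 - sens) + (1 - p0) * spec
    return num / den


# Symmetric special case: both accuracies equal a single q (arbitrary value).
q = 0.8
symmetric = posterior_high(0.5, q, q, favorable=True)

# Asymmetric case: evaluators catch good papers more reliably than bad ones.
asymmetric = posterior_high(0.5, 0.9, 0.7, favorable=True)

print(symmetric, asymmetric)
```

With two separate parameters, a favorable rating and an adverse rating need not be equally informative, which is the generalization the comment asks for.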
favorable
Use variable names for these also.
The public signal Y
This is the first time you mention the variable Y; you need to define it better.
A Simple Base Model
This could be fleshed out more carefully and slowly with more explanation.
author response
encouraging authors to respond, and evaluators to update their evaluations if authors find clear mistakes or oversight.
Tooltip: We're also working to build and coach our evaluator pool, and hope to provide paid calibration workshops in future.
The Unjournal's design
"The Unjournal works to limit the noise component through..."
as accuracy q
is accuracy a 'false negative and positive rate' or what -- give a tooltip on what q means
Rejection bias
Flesh out -- why would this be a particular concern for public evaluation? (Explain briefly, tooltip a longer explanation)
Concern
wait -- 'concern with what', 'reluctance to do what'? Needs clarification, seems somewhat inconsistent
Yes. Peer review is noisy: the 2014 NeurIPS experiment had committee disagreement on 43 of 166 duplicated papers
link please, and give details in tooltips
where The Unjournal's structure changes the mechanism from cases where private author risk remains.
this is a bit too stark. I don't know if we can be sure that UJ fully changes these situations, nor whether in the 'risky' cases the risk is substantial. State it more tentatively or directionally
elevant bar
'publication tier bar'
visa
skip 'visa'
Treat this as a set of considerations, not a decision rule.
We discuss a series of reasonable considerations
plus author response
and evaluator updates, and evaluation manager synthesis.
evaluation
quantified evaluation and a detailed report
remaining
'mistakenly being connected'
trongest private case is a paper already ready for serious expert scrutiny whose true quality exceeds its default credibility.
Should we also mention the 'submitting your work for public evaluation can also be a strong signal of your confidence in the credibility of your work?'
Logic map: where public evaluation changes the author's payoff
LaTeX is not really rendered in the diagram below for "Bar C", and also it's not clear what these variables are that need better labeling.
Read it as a working decision aid, not a final institutional position.
"I don't like 'read it as a working decision aid' so much" -- it's a bit of this, but also a bit of a 'here's how to think about this' discussion document. You can say 'this does not represent The Unjournal's official position or advice' and link our Author FAQ https://globalimpact.gitbook.io/the-unjournal-project-and-communication-space/faq-interaction/for-researchers-authors for the latter. But also avoid the 'not this but that' typical AI language. These are 2 separate things
seminar-defensible
why 'seminar-defensible' -- this could confuse things, people will say "why not present it at seminars"? Also, you mean for such a paper; this could be interpreted as public (UJ) evaluations will make it seminar defensible
A public commitment — and a signal. “I’m willing to have this evaluated openly.” Feedback now, a public signal now — journal path still open.
Can we have a 'separating equilibrium' or other image here? Let's focus this slide on the 'willingness to receive and respond to public criticism signals research strength' part ... and then the 'immediate underdog benefit' thing is the next slide (which you can tease)
The Pivotal Questions project
use the space on this slide better
runs
1st and 2nd box ... thought bubbles or callouts (0-100 percentile relative to pool); both suggester and assessor write a motivating explanation/discussion.
"Whole team votes" -- 5 point approval scale (strong/weak/neutral)
What does open (Unjournal) evaluation provide? Now: faster, useful feedback + a credible public signal, and useful inputs to practitioners and funders. Soon: it starts to carry career value. Eventually: it can replace much or all of what we ask the journal stamp to do. Which of these would actually help your work?
the text on this 'all green slide' is a bit hard to read. Make it more readable and clear.
outputs
prioritization and evaluation outputs
quality
and usefulness, providing multidimensional ratings and discussion -- not just "which venue published it".
For research leaders & managers encouraging engagement signals a commitment to rigour, transparency & innovation — and opens the research-impact channel (our funder & practitioner network, incl. Pivotal Questions). Two audiences. Individuals/committees: a strong public evaluation should count as evidence of quality in its own right — hiring, promotion, REF narratives, grants, esteem. Research managers / those setting direction for a group or department: encouraging engagement is a visible demonstration of research rigour, transparency and openness to innovation — and brings more, faster, more transparent feedback and signals than standard peer review. Exeter needn’t lead, but it could position itself as open to this innovation. The research-impact channel is real: strong connections to funders and nonprofit practitioners, including via Pivotal Questions. If useful, happy to discuss a light next step.
space this slide out more
Each
add a vertical space before this sentence
pool
in the prioritized pool
research
I think I wanted more images, more faces, here or below. I had a comment about that
with apologies to a certain lager
remove the disclaimer
certain
center the image.
3.9
adjust these examples to have some with wider and others with narrower bands
and importance if true
leave this off; that's not quite what we ask, we ask about implications
expressed
--> with quantified uncertainty
Full “model” (v. preliminary, ~Fable-generated with human feedback): unjournal-reluctance-note.netlify.app Reframed per your screening/sorting logic: the value is highest when you’re strong-but-under-credited or just below a bar — exactly where an extra credible signal can move you. And if committing to open evaluation becomes a positive signal in a sorting equilibrium, you want to be an early mover. The case for waiting is narrow: work that already clears the bar AND a genuinely sensitive moment — then the extra signal adds little. I’d be less worried about “harmful criticism”: our evaluations are constructive and you get a public response; public scrutiny isn’t a bogeyman. For timing/embargo, people talk to us.
This bit is cut off at the bottom of the page. Need to use the vertical space a little more conservatively here. Have smaller fonts or more use of the horizontal space.
Watch the 2-minute explainer
still not embedding
We’re working to be highly visible — so evaluations & ratings are seen before conventional journals/reviewers weigh in
Add image of our Google Scholar visibility here : https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=source%3Aunjournal&btnG= as one example
journals never teach
Obviously, journals never teach this. Also, our research education system doesn't tend to teach this. But it's also not clear that this alone is something that has a lot of career value. It's the methodology, theory, context, etc. that has the career value, as it allows you to do better work.
Build a reviewing reputation early — citable evaluations on a CV Only where someone here sees the fit — not a demanded programme. A reading group could shadow-evaluate a published package and compare their judgements to the expert evaluations — good methods training. We can support student involvement and there are paid RA-style roles.
I don't know that I would emphasize this quite so much. First of all, this only (mainly) holds if you choose to sign your reviews, which not everyone wants to do, particularly early in their career. It does show some value and helps you demonstrate your understanding, but I don't think that the profession rewards refereeing and reviewing quite so much.
A nice thing is you will get feedback on your evaluation from us and potentially from the authors, which will help you improve and learn.
Training in structured evaluation — a skill journals never teach
I think this is missing some of the key benefits here. Our public evaluations help you understand what issues other economists care about, as well as, to some extent, practitioners, funders and people interested in impact. It's a methodological discussion that will help your own work, as well as help you understand the ways to engage in the peer review process.
This also helps make you part of a conversation involving funders, grantmakers, and people that might be able to help your career and help you have more impact.
evaluation, data
This is a bit vague. I'm not sure about this wording here.
timing
I can see something like "sensitive career moments + Work that likely passes the bar" Being a situation where you wouldn't want to have this sort of public criticism or these additional signals. But it depends. If you're at a sensitive career moment but you think you're coming up just below the bar, or you are being systematically undervalued, then it might be helpful to have these additional signals. And if making a commitment to public evaluation itself becomes a positive signal in a screening, sorting equilibrium, you will definitely want to do it in such a situation.
Ask about timing:
Ask who? What am I supposed to say about this?
criticism likely about taste / importance / fit
I don't see this as clearly argued - The Unjournal isn't going to give these sorts of critiques in a way that I think will be harmful, and I don't want people to keep thinking that public criticism is somehow a deeply harmful thing.
concentrated early-career or coauthor downside
Not sure about mentioning "co-author downside" - rephrase this somehow.
test
I'm not sure "test" is the right word. It's more like a commitment and a signal.
Exeter strength Capabilities it brings Behavioural & experimental (Hauser, Fonseca, Balafoutas) decision-making, elicitation, policy design LEEP / environmental (Bateman, Groom, Day) valuation, natural capital, evidence-based policy Health & wellbeing (Jamison, Medina-Lara) cost-effectiveness, wellbeing, decision modelling Development / applied micro (Jamison, Banerjee) interventions, external validity, welfare Econometrics & methods (Clarke) evaluation design, calibration, meta-science
This seems a bit small, not using the whole page. ?
field specialist
And Pivotal Questions Advisor
Forecasts via Metaculus; partners incl. Institute for Replication, Center for Open Science
The Metaculus thing is a bit separate now. Go back to the Pivotal questions knowledge base and update on this.
al Questions
Hyperlink. https://info.unjournal.org/pivotal-questions.html here
Examples: cultured-meat costs · plant-based substitution · WELLBY ↔︎ DALY conversion. This isn’t replacing academic agendas with consulting — it’s taking questions funders and practitioners already face and asking what research and expert judgement imply for actual choices. We elicit high-value questions, curate and evaluate the evidence, add structured expert forecasts (our Metaculus community), and synthesise. Timing: the wellbeing and cultured-meat workshops have happened; the plant-based substitution workshop is still in planning. Partner/related orgs include the Institute for Replication, the Center for Open Science, and Metaculus. (Logos can be added if we want.)
Hyperlink the workshops here.
partners incl. Institute for Replication, Center for Open Science
Those aren't the partners on this. The partners on this include Founders Pledge and Animal Charity Evaluators. People from Coefficient Giving and many other organizations have participated in our workshops on these pivotal questions as well.
Each package = 2–3 expert evaluators, backed by a broader community:
I think the earlier version of this page was better. It's a little bit confusing because you're talking about evaluators, but you're showing the management team and advisory board here.
Plus:
I don't like the "plus" here. The substantive reports are equally important, if not more important, than the ratings.
Quantified, benchmarked:
This benchmarking is necessary because we don't 'accept' or 'reject', and we don't have a journal tier. So this is the path to Unjournal evaluations being something that has career value as well as value for research users.
~330 screened → the ~57 we’ve published
This doesn't quite make sense. First of all, we don't publish the research; we publish evaluations. Second of all, where are you getting the 330 figure? Maybe leave this off.
is it already famous?
That's not quite getting at the right thing because we actually do favor research that is more well known, as it's likely to be more influential. What we don't favor is research that is just simply seen as deeply intellectually interesting or clever
The full workflow
It shows up as too small on the slide. Can we rearrange it somehow? Maybe make it horizontal?
lands
"lands"
is
can be
“A very positive review of our work”
'positive' is more about their work and not about the evaluation ... look for better feedback, including from authors
180+ evaluators stand behind every evaluation:
the "180 evaluators" don't stand behind every evaluation ... this doesn't make sense. We have usually 2 evaluators per evaluation. Also mention the field specialist counts too
What we’ve evaluated — 57 packages by area
incorporate image/text of one of the award-winning evaluations here (or as a bonus vertical slide)
“But why expose my paper?”
illustration is OK but it's a bit too 'obvious' -- it should note that a signal could go in either direction ... perhaps should be nested in a graph considering both internal and external signals ... and illustrate cases where the expected value is positive
The strongest private case is a paper already ready for serious expert scrutiny whose true quality exceeds its default credibility
what about the case that 'willingness to make all work available for evaluation could be a strong signal of your confidence and credibility'?
good
More like, "How does AI evaluation compare to humans?"
And I'd frame this more as an open question, one we're exploring, but at the moment the general attitude seems to be that there needs to be a human in the loop, at the very least, making the final judgment calls, prioritization, and communication
Questions for you
I'd add things like: - How could this invigorate teaching and research training? - How could it help with building agendas, attracting funding, and demonstrating value for exercises like the REF?
What would make an evaluation count as evidence of quality?
This is perhaps the most important question here - maybe put this one first. ... What would make it reliable, meaningful, and valued?
Where would faster public evaluation be most useful in economic
Not 'in economics'. That's asking too much. Leave the last bit out; presumably they'll understand that we're asking about what would be useful to them.
From papers to decision-relevant uncertainty
This slide should have the header "Pivotal questions project" Not
LLM vs. human ratings: modest correlation — not aligned enough to substitute
We don't have strong enough evidence on this to say it's not aligned enough to substitute, to be honest. We only have one trial that we attempted. This slide overstates things, and it would be better to have it link and show some of our output, just so people know what was done in our trial.
Human judgment still central
This seems like a claim, but what backs it up?
evaluation
"efficient and transparent evaluation" ... "nd connection to 'real stakeholders and impact'"
papers
Generated by AI tools, which may or may not actually be correct or useful
Forecasts via Metaculus; partners incl. Institute for Replication, Center for Open Science
Those are not the relevant partners. The relevant partners are: - Founders Pledge - Animal Charity Evaluators. We've had participants in workshops from Coefficient Giving and many other organizations.
Metaculus is not really at the center of this at the moment. ... It's more on our own pages and platforms. https://uj-wellbeing-workshop.netlify.app/beliefs the Metaculus thing is sort of an extension.
Questions
should hyperlink https://info.unjournal.org/pivotal-questions.html
~9 management · 15+ advisory board · 40+ field specialists · 180+ evaluators (over half economists, over half doctorates). The point isn’t celebrity endorsement — it’s that we have enough disciplinary coverage, advisory oversight, evaluator depth, and process experience to be taken seriously. Usually 2–3 evaluators per package (not “everyone behind every evaluation”). Field specialists across eight areas help prioritise and recruit. Two of those field specialists are here at Exeter (next section). Full team at unjournal.org/team.
Make this bit larger and more visible. Maybe include some of the visuals on the composition of the evaluator pool.
markets
I don't think labor markets get at it here. We're talking about the impact of transformative technological change on labor markets, not labor markets on their own.
certified
Accepted after substantial revisions ... I don't know why we would use the word "certified" here.
R&R
At best
The
Versus a traditional journal
problem
Add the image of the fish banding together on this slide. You can find that on the previous PowerPoint/Google slide presentations.
It’s a coordination problem
Add "funding and grantmaker incentives will help".
And maybe replace this with "solving the coordination problem".
Replace Fear of Standing Out with Fear of Missing Out
Also a bullet about how we're making ourselves prominent in the ecosystem so that the evaluations and ratings will be seen before the paper is reviewed by conventional journals.
e.g. Bonn’s tenure criterion: “at least one article in a top-5 general-interest journal.” The signal is the system. The honest diagnosis is a collective-action failure, not preference. Outside demand (funders who value research) can fund a better signal while academia decides how much to trust it — which is why an early, low-cost engagement from a place like Exeter matters. The Bonn example shows how hard-coded journal prestige has become. Detail slide.
Why is this quote here? It doesn't seem to relate to the slide.
The full workflow
Hard to see the diagram below. Can we make it more visible somehow? Larger or horizontal, or allow us to zoom in?
Coefficient
Feel free to hyperlink these.
Research evaluation for choices that can’t wait. The honest origin story (keep on-slide light, say this aloud): some early funders and partners come from the global-priorities / EA-adjacent world. They’re not mainly asking “is this top-5 material?” — they’re asking “how should this change our beliefs, and what should we do differently?” They need quantified beliefs with uncertainty and explicit reasoning. That’s a different demand signal from the journal system. Important framing: this is NOT “academics don’t want to change.” Many academics dislike the current system — but individual researchers and departments can’t safely move first. It’s a coordination failure mistaken for a preference. Outside demand matters because it can pay for a better signal while academia decides how much to trust it. And the demand may grow: AI wealth may expand impact-focused philanthropy — Anthropic has confidentially filed a draft S-1 for a proposed IPO (not money in hand, but a plausible tailwind).
Skip this slogan. It's not just about speed and timing.
answers before journals deliver.
And they want more feedback than simply "which journal published it"?
We are not a journal
Make it clear we're a non-profit and we focus on research with potential for global impact.
public
Publicly Hosted
decoupled from gatekeeping.
What do we mean by "decoupled from gatekeeping"? I'm not sure about that one.
—
Use a colon here, not an em dash
publish
You can publish in a journal too
“Published — so stop bothering me about it”
Add some more vertical slides with the other costs of the existing system/benefits of separating evaluation from "publication", And making the evaluation public
disseminate
Why is this in bold?
dissemination
It doesn't really govern dissemination, as noted in the other bullet point... It governs careers and research credibility.
1 · The problem
These green slides are not so visually compelling. The text is small, the numbering is not particularly helpful, and there's no image or anything that makes it seem interesting.
Careful quantified evaluation can begin to compete with — and eventually replace — the journal stamp.
That's nice, but I also want to emphasize the value that we're providing in the medium term.
Which pieces, if any, would actually help your work? Distinguish horizons: near term, this provides useful feedback, decision evidence, and an additional public quality signal. Medium term, if the evaluations prove calibrated and useful, they can begin to carry career value. Long term, they can replace some of what we currently ask journal prestige to do — not a claim that committees should ignore journals tomorrow. Final spoken close: “I’m not asking Exeter to adopt a system today — I’m asking which pieces of this, if any, would be useful enough for researchers here to try, use, challenge, or build on.” (Aside: the deck is open to Hypothes.is comments.)
What was meant by "pieces" here?
Recruit
Recruit an evaluation manager and ...
2–3
~2
work
Use the word 'research' here, not work.
relevance
As a team
response
And evaluators can adjust to this response.
DOI
And an evaluation manager summary/synthesis
How the triage runs
I don't think this is the right diagram. I think we want the other one illustrating just what the process has been ... People on the team suggest it, give it a rating for prioritization/potential for impact, the whole team votes on it, we finalize it, and liaise with the authors, etc. This is just a diagram about a particular way that we do or do not consider certain things in doing this prioritization.
There’s already a real connection to build on — if useful. Not cold outreach, and not an institutional ask. There are already people and examples connected to Exeter; if Julian or Ben are here, acknowledge them. I’m interested in whether any of these connections are useful to people in this room.
Drop this. This is obviously already a point being made.
building
I don't know which building I'm going to be in. Maybe say on this campus or something like that.
here
"here" Is vague.
cultured-meat costs
Hyperlink the workshop here. Both of them
Recognise better signals
Research leadership: ...
About Summary Pivotal Questions Live Sessions Resources ▾ Readings Linear WELLBY Analysis DALY-WELLBY Conversion Metaculus Question
The colored-on-black text here is very hard to read.
Derrick Risner
Would not attend live but offered to follow up asynchronously.
Session discussion interest — from participant forms
Fix this. It takes too long to generate.
PQ3B — Share of major funders using WELLBY in 5 years
how many use it now -- put this in context
How many WELLBYs equal 1 DALY?
check and annotate -- what does, e.g., 5 WELLBYs per DALY mean in context, and how does it compare with what people currently do?
problem under consideration. So I'd resist doing a simple exchange rate."
This seems like a valid objection, but I think we still phrase the question such that you would give a meaningful answer, or you could give a meaningful answer in this case in terms of the value generated if you were forced to use a single conversion.
PQ1B — Recommended measure for funders
The discussion might be more valuable, or I would say is likely to be more valuable than the response, particularly for this question.
Composite well-being measure
let's do better to differentiate this from calibrated well-being. It's not fully clear to someone glancing at this briefly.
PQ2 — DALY / WELLBY conversion factor
this is not a 'subquestion' ... label this page header differently
Wellbeing Workshop · Beliefs Analysis
Add backlink to the workshop and to the elicitation form here.
How many WELLBYs equal 1 DALY?
Make sure you can access the literal question from this interface to know exactly what the respondents are answering. If these things get very long, you can use tooltips.
Modeled disbursement by end-2026
some visualizations (bar or donut or mosaic graph) would be helpful here
Interactive uncertainty model
I don't think this is a stochastic model? Perhaps an extension of this should give these (correlated?) distributions. ... Squiggle-type modeling
There should also be a display of the actual equation behind the model, and a folding box or linked page explaining it in more detail
BOTEC model
Maybe put this model at the top, after a short preamble?
Prediction market
Do any prediction markets deal with the fundraising/donation/nonprofit aspects of this?
. Independent and politically diverse funding remains valuable even in a high-liquidity world.
why? is this just a truism?
Organizations should distinguish runway decisions from upside options. If a project is valuable only under a fast-funding scenario, that dependence should be explicit rather than hidden inside local rumor. Funders and field builders should prioritize grantmaker capacity, plural donor relationships, legal vehicles, and evaluation infrastructure. These are the bottlenecks that convert paper wealth into usable grants.
this advice seems on the overly generic side?
Grantmaker capacity multiplier
what is this -- things like this need definitions, in the text with a link and in tooltips
rch memoWill AI
title font too big, taking up too much screen space
Will AI Wealth Actually Flood AIS/EA Philanthropy Soon?
I'll try to respond / adapt to hypothes.is comments, especially if you flag @daaronr
public data
Unjournal data
Logic map:
There's some text overlap in the diagram.
e possible costs of negative or mixed findings.
what 'costs of negative findings'? This is a bit of a vague sentence.
Request evaluation when:
also state and interpret the equation in terms of 'benefits vs cost' -- easier to interpret I think
or policy relevance.
remove 'or policy relevance' perhaps -- The Unjournal prioritizes research with global impact potential (although that's not what we mainly rate the research on)
or the likely criticism is about taste, importance, novelty, or fit rather than checkable claims
not sure I understand the logic behind the latter part
or the likely criticism is correct
how would someone know this?
Author-Facing Guide
This is a possible set of considerations -- I don't want to state it as definitive
reader s
maybe the 'gatekeeper' or 'editor' instead of the 'reader'?
public signal is Y in +,-,
Is it a truly 'public' signal, or only one the evaluator sees after reading the paper? Or should we have both?
r[qA - (1-q)L] + (1-r)[(1-q)A - qL] - k ≥ 0
notation needs improvement, and it should be explained more -- how derived, how to interpret it? Tooltips and expanding sections could help
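The inequality can be regrouped into a "benefits vs. costs" form, which may be easier to read. The sketch below assumes an interpretation that the excerpt does not spell out: `r` is the author's own probability that the paper is high quality, `q` is the evaluation's accuracy, `A` is the gain from a favorable result, `L` is the loss from an adverse one, and `k` is the cost of requesting. Under that reading, the left-hand side equals Pr(favorable) · A − Pr(adverse) · L − k. The numbers are arbitrary.

```python
def net_value_original(r: float, q: float, A: float, L: float, k: float) -> float:
    """Left-hand side exactly as written: r[qA - (1-q)L] + (1-r)[(1-q)A - qL] - k."""
    return r * (q * A - (1 - q) * L) + (1 - r) * ((1 - q) * A - q * L) - k


def net_value_benefit_cost(r: float, q: float, A: float, L: float, k: float) -> float:
    """Same quantity regrouped as expected benefit minus expected cost."""
    p_favorable = r * q + (1 - r) * (1 - q)   # chance of a favorable signal
    p_adverse = r * (1 - q) + (1 - r) * q     # chance of an adverse signal
    return p_favorable * A - (p_adverse * L + k)


# Illustrative values only (assumptions, not estimates from the note).
r, q, A, L, k = 0.7, 0.8, 1.0, 1.0, 0.1
value = net_value_benefit_cost(r, q, A, L, k)
print("request" if value >= 0 else "wait", round(value, 4))
```

Both functions return the same number for any inputs; the second simply makes visible that requesting pays when the expected gain from a favorable signal covers the expected loss from an adverse one plus the cost `k`.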
Requesting a noisy public test is not the same as disclosing an already-known verifiable fact.
I suspect another paper has dealt with this question ... 'when noisy signals help the seller' or some such
e reader should not update from p0 after observing the evaluation res
this needs clarification, I don't quite see why this is the case. Isn't it possible that the author's signal is positive so they submit, but the evaluator reading the paper gets a negative signal?
he relevant prior
relevant to whom?
he reader acts when posterior belief exceeds threshold c.
what is 'acts' and what is 'c' -- underexplained
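A minimal sketch of what 'acts' and 'c' could mean, assuming the standard Bayesian setup the excerpts suggest: binary quality Q in {H, L}, a signal Y in {+, -} that matches true quality with probability q (the same symmetric accuracy assumed in the inequality), a prior p0 that Q = H, and a gatekeeper who takes the favorable action only when the posterior exceeds c. These interpretations and the function names are assumptions, not definitions taken from the model text.

```python
def posterior_high(p0: float, q: float, signal: str) -> float:
    """Posterior P(Q=H | Y=signal) by Bayes' rule, with symmetric accuracy q."""
    if not (0 < p0 < 1 and 0 < q < 1):
        raise ValueError("p0 and q must lie strictly between 0 and 1")
    if signal == "+":
        numerator = p0 * q
        denominator = p0 * q + (1 - p0) * (1 - q)
    elif signal == "-":
        numerator = p0 * (1 - q)
        denominator = p0 * (1 - q) + (1 - p0) * q
    else:
        raise ValueError("signal must be '+' or '-'")
    return numerator / denominator


def gatekeeper_acts(p0: float, q: float, signal: str, c: float) -> bool:
    """True when the posterior after the signal strictly exceeds threshold c."""
    return posterior_high(p0, q, signal) > c


if __name__ == "__main__":
    # Illustrative numbers only: a prior below the bar, a fairly accurate test.
    p0, q, c = 0.4, 0.8, 0.5
    for y in ("+", "-"):
        print(y, round(posterior_high(p0, q, y), 3), gatekeeper_acts(p0, q, y, c))
```

With these illustrative numbers the prior alone (0.4) does not clear c = 0.5, a positive signal lifts the posterior above it, and a negative signal pushes it further below, which is one way to read the 'below the bar' case discussed elsewhere in the comments.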
binary quality Q in {H,L}, w
is this 'binary' the relevant threshold? Where did it come from? Is it sort of generalizable? Consider whether it misses some important nuance
tive, the reader
who is the 'reader'? I think we mean the later journal evaluator, or career gatekeeper
Main
base
Single
simple
empirically import
why?
Public anonymity statistic. The anonymity choice is empirically important. Running python unjournal_anonymity_stats.py on the public data bundle gives 65 anonymous/generic public evaluator identifiers among 113 deduplicated evaluator-paper pairs with quantitative ratings: 57.5%. In the subset matched to published-evaluation status, the share is 63/105: 60.0%, with 7 unmatched title rows. This supports saying "a bit over half" choose anonymous/generic public identifiers, but the denominator should be stated. The wider evaluator_paper_level.csv denominator is not clean for this claim because survey-only rows are assigned generic Evaluator N labels.
this should be a fold or footnote -- give a quick statistic and footnote how it was captured
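For the footnote, the quoted shares can be checked with simple arithmetic. This is only a sketch that recomputes the percentages from the counts quoted in the highlighted passage; it is not the unjournal_anonymity_stats.py script, whose contents are not shown here, and the helper name is hypothetical.

```python
def share_label(count: int, total: int) -> str:
    """Format a share with its denominator stated, e.g. '65/113 = 57.5%'."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{count}/{total} = {100 * count / total:.1f}%"


if __name__ == "__main__":
    # Counts as quoted in the highlighted passage.
    print(share_label(65, 113))  # all deduplicated evaluator-paper pairs
    print(share_label(63, 105))  # subset matched to published-evaluation status
```

Printing the count and denominator together keeps the "a bit over half" claim tied to the sample it comes from.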
are the right default
too strong. And takes the 'high downside' idea for granted too much IMO
When the answer is unclear, the practical move is not immediate publicity. It is a fit-and-timing conversation, coauthor consen
too much 'not this but that' AI speak. And 'publicity' is vague here
bout conditional embargo.
too much use of bold within paragraphs and prose
2. Is the main obstacle credibility, visibility, field fit, or network access?
this needs further explanation and clarification -- 'usual channels' should already encompass clarification
e relevant question is not whether public evaluation is always good; it is when a public signal improves expected outcomes relative to waiting, revising privately, or continuing th
this is the 'AI language of dichotomy' overused
a result that a credible public test strictly helps authors whose default standing sits below the bar — and we are precise about the downside it carries for those just above it;
The language of this is a bit unclear. Try to make it easier to understand.
(1) a
Don't use bold here.
Explicit crux Which specific uncertainties — AGI timing, takeoff speed, power-seeking tendency, offense-defense balance, pause feasibility — most shift expert p(doom) estimates?Community solicitation for explicit AI-risk cruxes: uncertainties whose resolution would significantly shift p(doom), including AGI arrival year, takeoff speed, power-seekin
this is meta -- I don't want meta, or at least put that into an 'opt-in' list
ee our early automated prioritization prototype, which is outside legal research and currently focuses mainly on economics and related work.
We can swap in here the legal prioritization prototype -- https://uj-prioritization-prototype.netlify.app/legal/ -- please do this -- and note that we're looking for feedback and examples to help improve and train this. Note that we don't envision this prioritization to be mainly driven by AI models -- humans will be making the ultimate decisions -- but these tools can be very helpful in the process.
Comment directly on this page using the Hypothes.is sidebar (the < tab on the right edge). Or use the rating buttons on each paper card — human ratings are how we will calibrate these scores.
Give people the option to suggest/add content.
Show legal scoring rubric & themes
Show/hide
Legal scholarship spinoff
Proposed and in consideration.
Comment directly on this page using the Hypothes.is sidebar (the < tab on the right edge). Or use the rating buttons on each paper card — human ratings are how we will calibrate these scores.
Let us know if you have any questions about this.
Elliot Swartz on the gap between academic analysis and industry learning.
this is not Swartz! adjust!
How this was made. Drafted by GPT Pro from existing Unjournal research and discussion (the elasticity-validation survey, the Bray et al. evaluation materials, and the PBM substitution literature), then built and polished into this interactive report in Claude Code. It is currently being reviewed and adjusted by hand. Treat figures and attributions as provisional until that review is complete; the governing evaluation lives on PubPub.
Make this a folding box - and the header should say AI/human collaboration in some way
Another folding box should have the standard call out about how we want feedback, and you can use the hypothesis tool for that.
Note: This workshop is in early planning. The framing, evidence base, and participant list are still being developed.
Still considering how to frame this workshop; it depends on interest and participation. One frame directly targets what we know about plant-based products, who consumes them, and what that suggests for potential substitution and animal welfare. However, that evidence seems rather thin, inconclusive, and perhaps premature. (See links to EA forum posts, etc.) Furthermore, our evaluation of Bray et al. on experimental versus standard quantitative marketing/I.O. estimates of own-price elasticities suggests perhaps deep uncertainty and a lack of ability to be confident in these parameters, not to mention cross-price effects and substitution patterns. This potentially motivates a pivot towards these methodological questions, as well as framing the workshop in terms of "what can we know and what research is worth pursuing."
Plant-Based Substitution Workshop · May 2026
Change the date here. The date is still undetermined, probably late summer 2026.
Thank you for participating in The Unjournal's Plant-Based Substitution Pivotal Questions workshop. Your feedback helps us measure the workshop's impact and improve future workshops.
Remove this page for now because it makes it seem like the workshop already happened.
A major methodological innovation. The framework is elegant and the estimation strategy is sound. The empirical component would especially benefit from more diverse and reliable samples, and from direct comparisons against existing scale-correction methods so readers can judge incremental value. Logic and communication could be tightened in places — rated lower here than the other dimensions.
This is not his full evaluation. He gave a very in-depth evaluation, and you've only taken one paragraph here.
The cost of calibration questions The central tension is practical, not theoretical. Prati flags that the evidence rests on a large number of calibration questions. It is unclear how well the correction performs with the realistic two or three CQs — and even two can be a heavy burden in large surveys. He suspects this is “one crucial reason anchoring vignettes have not been implemented at scale in 20 years.” Kaiser rates the work highly but pushes for more diverse, reliable samples and direct comparisons against existing scale-correction methods, so readers can judge the incremental value. His lower marks fall on logic & communication and on claims & evidence.
Scene 1 of 6
Why "scene"? Are we making a video? Does it even need numbering?
Claims & evidence
these tooltips are not coming up after seconds of hover. -- I only see the "?" -- please fix
Two experts, eight criteria
We probably want a short transition here between the issue of measuring individuals' well-being through self-reports and what The Unjournal is now doing in rating the paper, which also uses scales that may themselves have subjective components, funnily enough. Make the distinction clearer here.
For decades, economists hesitated to use subjective well-being data for one stubborn reason: people use survey scales differently.
This probably needs a little bit more context on why we're trying to measure people's well-being and happiness through self-reports.
Estimated from a few extra calibration questions — not a full vignette battery.
The diagram is not fully explained. What does each dot represent? Should we be giving 'names of people' (or IDs, or types of people) to make that clearer?
Same underlying well-being
Make it clear that the number line here is supposed to represent some fixed measure of true well-being.
data for one stubborn reason:
I know this is meant for a public audience, but it's a little bit oversimplified. Perhaps we can say it in an equally concise and appealing way, but without absolute claims like "for one stubborn reason"; there may have been other reasons too. (Note to AI -- try to make this a persistent pattern in your writing.)
separate pages,
Individual "pubs" in the Unjournal PubPub interface
Start here: four real evaluations, one page each
Make it clear that this is what we're calling the "hub" layout.
Tell me what you think
this is taking up a lot of space -- make it folded by default
evaluation
--> evaluation package