954 Matching Annotations

Last 7 days
forum.effectivealtruism.org forum.effectivealtruism.org

Forecasts estimate limited cultured meat production through 2050

1
1. daaronr 16 Apr 2025
  
  in Public
  
  Here we present an initial set of forecasts from a panel of paid forecasters (including Linch and Neil). We plan to expand forecasting on similar questions in a Metaculus tournament[1] so that we can see how forecasts are affected by news of supposedly important breakthroughs.
  
  A small team. This aspect might be seen as informal
Visit annotations in context

Annotators

daaronr

URL

forum.effectivealtruism.org/posts/2b9HCjTiFnWM8jkRM/forecasts-estimate-limited-cultured-meat-production-through
Mar 2025
www.openphilanthropy.org www.openphilanthropy.org

The Center for Election Science — General Support (2019) | Open Philanthropy

1
1. daaronr 13 Mar 2025
  
  in Public
  
  These systems provide no structural advantages or disadvantages to either the Democratic or Republican parties or to any single politician.
  
  This seems like it's probably an overstatement; may be shorthand here
Visit annotations in context

Annotators

daaronr

URL

openphilanthropy.org/grants/the-center-for-election-science-general-support-2019/
Nov 2024
results2021.ref.ac.uk results2021.ref.ac.uk

Impact case study : Results and submissions : REF 2021

1
1. daaronr 21 Nov 2024
  
  in Public
  
  Dr McCulloch has provided expert input, based on his research, to provide a solid evidence base to the Better Deal for Animals campaign coordinated by over 40 leading NGOs. McCulloch’s proposed animal welfare impact assessment has been supported by HMG Official Opposition Labour Party in Parliament and the UK Government has stated in official correspondence that it is considering how to implement post-Brexit sentience policy in response to coordinated lobbying strongly underpinned by McCulloch’s research.
  
  REF impact ... frameworks for The Unjournal?
Visit annotations in context

Annotators

daaronr

URL

results2021.ref.ac.uk/impact/10afeb5d-ac48-4c17-b7fb-7c032112b7a0
Oct 2024
www.cos.io www.cos.io

Lifecycle Journal | Center for Open Science

2
1. daaronr 23 Oct 2024
  
  in Public
  
  empirical evaluations
  
  What's 'empirical evaluations' here?
2. daaronr 23 Oct 2024
  
  in Public
  
  Publish-then-Review; Transparent and Open Peer Review
  
  The Unjournal supports this. I think we have done this in more depth and rigor than other initiatives in economics and social science
Visit annotations in context

Annotators

daaronr

URL

cos.io/lifecyclejournal
Aug 2024
unjournal.github.io unjournal.github.io

A first look at Unjournal’s data – The Unjournal Data Blog

16
1. daaronr 28 Aug 2024
  
  in Public
  
  Table 3 shows loadings for the first three components.
  
  We should discuss and analyze this further. What's a reasonable interpretation? How much of the variation do each of these explain?
2. daaronr 28 Aug 2024
  
  in Public
  
  The table above already looks a bit worrying: typically no more than half of our evaluators’ midpoints fall within their co-evaluator’s 90% credible interval
  
  I'd like to dig into this to find some specific instances. Are these all the 'real CIs' and not the ones we imputed from the earlier likert CI data?
3. daaronr 28 Aug 2024
  
  in Public
  
  Krippendorff's Alpha
  
  Some comparison numbers would also be helpful. I asked chatgpt
  
  give some comparison alpha's from other contexts to help understand this measure
  
  Response:
  
  Krippendorff's alpha is used across a variety of disciplines to measure inter-rater reliability. Below are some examples from different contexts to provide a point of reference for interpreting the values in your data:
  
  Content Analysis (Media Studies):
  
  A typical Krippendorff's alpha for coding reliability in content analysis (e.g., categorizing themes in media) might range from 0.6 to 0.8. Values below 0.67 are often considered insufficient for drawing strong conclusions, though thresholds can vary depending on the field. For example, a study evaluating news articles might report an alpha around 0.7.
  
  Healthcare (Medical Diagnostics):
  
  In medical diagnostics, particularly in assessments involving subjective judgment (e.g., radiology, pathology), alpha values are expected to be relatively high—often above 0.75. An alpha of 0.8 or higher is often considered acceptable for diagnostic consistency. For instance, a study on agreement among radiologists in interpreting mammograms might report an alpha of around 0.81.
  
  Education (Grading & Evaluation):
  
  In educational assessment, inter-rater reliability for grading essays or open-ended responses typically yields alphas in the range of 0.6 to 0.9, depending on the specificity of the rubric and training provided to raters. A study on grading consistency among multiple teachers might report an alpha of 0.65 to 0.75.
  
  Psychology (Behavioral Coding):
  
  In psychology, for tasks like coding behavior or assigning labels based on psychological interviews, Krippendorff’s alpha values of 0.4 to 0.6 are relatively common, but higher values (above 0.7) are preferred for more reliable and replicable findings. For example, coding nonverbal behaviors in therapy sessions might produce an alpha of 0.5.
  
  Social Science (Survey Coding):
  
  When coding open-ended survey responses, social science researchers often see alpha values ranging from 0.6 to 0.7, reflecting moderate agreement. An alpha of 0.65 might be reported in a study on the reliability of coding responses to survey questions about political preferences.
  
  Linguistics (Speech Annotation):
  
  In linguistics, particularly in tasks such as speech transcription or annotation, inter-rater reliability can vary widely depending on the complexity of the task. Alphas around 0.7 are often considered acceptable. For example, a study transcribing dialectical variations in speech might report an alpha of 0.72.
  
  General Guidelines for Interpreting Krippendorff's Alpha:
  
  >0.80: Strong agreement, generally considered acceptable for high-stakes decision-making.
  
  0.67 - 0.80: Moderate to strong agreement, often considered sufficient for exploratory studies or content analysis.
  
  0.50 - 0.67: Moderate agreement, acceptable in some fields, but may require more consistency for high-stakes or formal evaluations.
  
  <0.50: Low agreement, indicates significant variability in ratings; further training or clarification of criteria may be needed.
  
  Comparison to Your Data:
  
  Your highest alpha values (e.g., 0.804 for "gp_relevance" and 0.778 for "merits_journal") indicate strong agreement, similar to the reliability found in high-stakes fields like medical diagnostics or well-defined content analysis tasks.
  
  The lower values (e.g., 0.271 for "adv_knowledge" and 0.134 for "methods") suggest substantial disagreement among evaluators. These values are below what would be considered acceptable in most research contexts, even for exploratory analysis, and might require addressing the clarity or consistency of evaluation criteria.
  
  These comparisons provide a framework for assessing where your data stands in relation to typical expectations in various research areas.
4. daaronr 28 Aug 2024
  
  in Public
  
  pe=1/6. Krippendorff’s alpha is a widely accepted statistic that corrects for pe and also defines “agreement” appropriately for different levels of measurement.
  
  I think you are missing the basic characterization here ...
  
  1 indicates perfect agreement, 0 indicates no agreement beyond chance, and negative values indicate systematic disagreement.
5. daaronr 28 Aug 2024
  
  in Public
  
  Krippendorf's alpha statistics for our quantitative measures. N = 21 papers, 39 evaluations.
  
  Do we have an interpretation of this? Are these high, low, reasonable?
  
  (By the way if you wanted to integrate this into a permanent dashboard you might not want to add a narrative about the actual values, but you could still add a general discussion of 'what is considered a high alpha')
6. daaronr 28 Aug 2024
  
  in Public
  
  There is a single paper with three evaluations; adding this in would give us many missing values in the “third evaluation” column, and we’d have to use more advanced techniques to deal with these.
  
  We should find some way to integrate this in,. There's so little data it's a shame to drop these.
7. daaronr 28 Aug 2024
  
  in Public
  
  we have many evaluators, each contributing only one or two evaluations.
  
  Very rarely 2. I'm guessing it's only about 10% repeat evaluators atm.
8. daaronr 28 Aug 2024
  
  in Public
  
  So we need to adjust for the expected amount of agreement. To do this most measures use the marginal distributions of the ratings: in our example, a 1 in 6 chance of each number from 0 to 5, giving
  
  This seems like a wild oversimplification. This would be if they both gave uniformly distributed random ratings.
  
  A more reasonable baseline for 'meaningless ratings' would be something like "they both draw from give the underlying distribution of all papers, without using any information about the paper being rated itself." Perhaps some of the other statistics you mention get at this?
9. daaronr 28 Aug 2024
  
  in Public
  
  Choosing a reliability statistic
  
  Is there any way to make these boxes foldable in the blog post format?
10. daaronr 28 Aug 2024
  
  in Public
  
  Out of 342 confidence intervals, 0 were degenerate, 0 were uninformative and 7 were misspecified.
  
  We should look at this more closely.
  
  Which were misspecified? The newer interface actually doesn't permit this.
  
  As I noted above, some people simply refused to give CIs at all ... which is essentially giving a 'degenerate' interval
  
  For journal ranking tiers I'd still like to see the results in some way. If evaluators understood how we intended this measure to work, they should basically never give degenerate intervals.
11. daaronr 28 Aug 2024
  
  in Public
  
  We also check if people straightline lower bounds of the credible intervals (0 straightliners) and upper bounds (0 straightliners).
  
  However, IIRC some people didn't give bounds on the CIs at all, or they gave meaningless bounds; maybe we dropped those bounds?
12. daaronr 28 Aug 2024
  
  in Public
  
  Here are some things we might hope to learn from our data.
  
  You left out one thing I had suggested: Do evaluators with different attributes (country, field, anonymity choice) rate the papers differently? Again, this need not imply biases, but it might be suggestive.
13. daaronr 28 Aug 2024
  
  in Public
  
  Do evaluators understand the questions in the same way? Are different evaluators of the same paper answering the “same questions” in their head? What about evaluators of different papers in different fields?
  
  Should we reference 'construal validity' here?
14. daaronr 28 Aug 2024
  
  in Public
  
  About the data
  
  Can we give some links here to data and code? Even a setting to 'turn on the code' or some such? I like to do this for transparency and other reasons we've discussed.
  
  OK we do link the github repo -- maybe add a note about that, and about how to find the code that produces this blog for people who want to see/check?
15. daaronr 16 Aug 2024
  
  in Public
  
  Warning: Duplicate rows found in ratings data.
  
  Do we want this to show here? Maybe some note to the reader is in order?
16. daaronr 16 Aug 2024
  
  in Public
  
  Overall assessment: “Judge the quality of the research heuristically. Consider all aspects of quality, credibility, importance to knowledge production, and importance to practice.” Advancing our knowledge and practice: “To what extent does the project contribute to the field or to practice, particularly in ways that are relevant to global priorities and impactful interventions?…” Methods: Justification, reasonableness, validity, robustness: “Are the methods used well-justified and explained; are they a reasonable approach to answering the question(s) in this context? Are the underlying assumptions reasonable? Are the results and methods likely to be robust to reasonable changes in the underlying assumptions?…
  
  Formatting here: text is a bit bunched up, a bit harder to read
Visit annotations in context

Annotators

daaronr

URL

unjournal.github.io/unjournaldata/posts/uj-data-first-look/
unjournal.github.io unjournal.github.io

The Unjournal data blog – The Unjournal Data Blog

1
1. daaronr 28 Aug 2024
  
  in Public
  
  Blog Posts
  
  I think the 'blog post' might be a bit buried here? Perhaps it should be made more prominent?
Visit annotations in context

Annotators

daaronr

URL

unjournal.github.io/unjournaldata/index.html
unjournal.pubpub.org unjournal.pubpub.org

The Unjournal

2
1. daaronr 12 Aug 2024
  
  in Public
  
  The UnjournalJournal-independent peer review of research that informs global prioritiesUnjournal on PubPubSearch for contentLog in/sign upUJ homepageCompleted UJ evaluation packages
  
  I thing the spacing and fonts might be improved? Too much vertical space perhaps
2. daaronr 12 Aug 2024
  
  in Public
  
  The UnjournalThe Unjournal seeks to make rigorous research more impactful and impactful research more rigorous. We encourage better research by making it easier for researchers to get feedback and credible ratings. We focus on quantitative work that informs global priorities, especially in economics, policy, and social science. We commission public journal-independent evaluation of hosted papers and dynamically presented projects. We publish evaluations, ratings, manager summaries, author responses, and links to evaluated research on this PubPub – to inform practitioners and other researchers.As the name suggests, we are not a journal. We don’t “publish”, i.e., we don’t claim ownership or host research. We don’t charge any fees. Instead, we offer an efficient, informative, useful, and transparent research evaluation system. You can visit Unjournal.org and our knowledge base for more details or contact us at contact@unjournal.org.
  
  Probably worth making this text punchier
Visit annotations in context

Annotators

daaronr

URL

unjournal.pubpub.org/
Jul 2024
80000hours.org 80000hours.org

Research questions that could have a big social impact, organised by discipline

14
1. daaronr 16 Jul 2024
  
  in Public
  
  What’s the best way to measure individual wellbeing? What’s the best way to measure aggregate wellbeing for groups?
  
  Unjournal 'pivotal questions'
  
  By our above standards, this is far too broadly defined; however, there is a wealth of recent (empirical, theoretical, and methodological) work on this; some of which (e.g., ‘WELLBYs’) seems to be influencing the funding, policies and agendas at high-impact orgs. Elicit.org (free version) summarizes the findings from the ‘top-4’ recent papers:
  
  Recent research highlights the complexity of measuring wellbeing, emphasizing its multidimensional nature. Studies have identified inconsistencies in defining and measuring wellbeing among university students in the UK, with a focus on subjective experiences and mental health (Dodd et al., 2021). Global trends analysis reveals distinct patterns in flourishing across geography, time, and age, suggesting the need for comprehensive measures beyond single-item assessments (Shiba et al., 2022). Scholars argue for well-being indicators that capture physical, social, and mental conditions, as well as access to opportunities and resources (Lijadi, 2023). The concept of "Well-Being Capital" proposes integrating multidimensional wellbeing indicators with economic measures to better reflect a country's performance and inform public policy (Bayraktar, 2022). These studies collectively emphasize the importance of considering subjective experiences, cultural factors, and ecological embeddedness when measuring individual and aggregate wellbeing, moving beyond traditional economic indicators like GDP.
  
  This suggests a large set of more targeted questions, including conceptual questions, psychometric issues, normative economic theory, and empirical questions. But I also suspect that 80k and their funders would want to reframe this question in a more targetted way. They may be particularly interested in comparing a specific set of measures that they could actually source and use for making their decisions. They may be more focused on well-being measures that have been calibrated for individuals in extreme poverty, or suffering from painful diseases. They may only be interested in a small subset of theoretical concerns; perhaps only those that could be adapted to a cost-benefit framework.
  
  ^[Asking it to “...please focus on recent work in economics and decision theory, and measurements that have been used for low and middle-income countries.” yields [excerpted]
  
  Recent research … aim[s] to move beyond traditional economic indicators like GDP. The Global Index of Wellbeing (GLOWING) proposes a simple, meaningful measure using secondary ecological data (Elliott et al., 2017). The Multidimensional Wellbeing Index for Peru (MWI-P) incorporates 12 dimensions based on what people value, revealing disparities among subgroups (Clausen & Barrantes, 2022). Another approach suggests using the Condorcet median to aggregate non-comparable wellbeing facets into a robust ranking (Boccard, 2017). New methods for measuring welfare include stated preferences over aspects of wellbeing, life-satisfaction scales, and the WELLBY approach, as well as comprehensive frameworks like Bhutan's Gross National Happiness Index (Cooper et al., 2023).]
  
  This seems promising as the basis for a ‘research prioritization stream’. We would want to build a specific set of representative questions and applications as well as some counterexamples (‘questions we are less interested in’), and then we could make a specific drive to source and evaluate work in this area.
2. daaronr 15 Jul 2024
  
  in Public
  
  When estimating the chance that now (or any given time) is a particularly pivotal moment in history, what is the best uninformative prior to update from? For example, see our podcast with Will MacAskill and this thread between Will MacAskill and Toby Ord for a discussion of the relative merits of using a uniform prior v. a Jeffreys prior.
  
  Unjournal 'pivotal questions' -- not quite operationalized but seems close to workable 5/10
3. daaronr 15 Jul 2024
  
  in Public
  
  How frequent are supervolcanic eruptions and what size of eruption could cause a volcanic winter scenario? (Adapted from Toby Ord, The Precipice, Appendix F)
  
  Unjournal 'pivotal questions' -- operationalizable but out of our current scope.
4. daaronr 15 Jul 2024
  
  in Public
  
  What’s the average lifespan of the most common species of wild animals? What percent die via various means
  
  Unjournal 'pivotal questions' -- operationalizable but out of our current scope.
5. daaronr 15 Jul 2024
  
  in Public
  
  What’s the minimum viable human population (from the perspective of genetic diversity)? (Michael Aird, Crucial questions for longtermists)
  
  Unjournal 'pivotal questions' -- operationalizable but out of our current scope.
6. daaronr 15 Jul 2024
  
  in Public
  
  What are the best existing methods for estimating the long-term benefit of past investments in scientific research, and what have they found? What new estimates should be conducted? (Adapted from Luke Muehlhauser (writing for Open Philanthropy), Technical and Philosophical Questions That Might Affect our Grantmaking)
  
  Unjournal 'pivotal questions'
  
  Reframe as ‘what has been the long term benefit {defined in terms of some specific measure} of investment in scientific research’ – 6/10 as reframed
7. daaronr 15 Jul 2024
  
  in Public
  
  E.g. does creatine actually increase IQ in vegetarians?
  
  [The “e.g.”, is 9/10 although it’s on the borderline of our scope]
8. daaronr 15 Jul 2024
  
  in Public
  
  E.g. does creatine actually increase IQ in vegetarians?
  
  Unjournal 'pivotal questions'
  
  The “e.g.”, is 9/10 although it’s on the borderline of our scope
9. daaronr 15 Jul 2024
  
  in Public
  
  How well does good forecasting ability transfer across domains?
  
  Unjournal 'pivotal questions'
  
  Could be operationalized ... what measure of ability, which domains (or how to define this?)
  
  4/10
10. daaronr 15 Jul 2024
  
  in Public
  
  Will AI come to be seen as the one of the most strategically important parts of the modern economy, warranting massive state support and intervention?
  
  Unjournal 'pivotal questions'
  
  Could this be operationalized? 3/10
11. daaronr 15 Jul 2024
  
  in Public
  
  Could advances in AI lead to risks of very bad outcomes, like suffering on a massive scale? Is it the most likely source of such risks? (Adapted from Michael Aird, Crucial questions for longtermists)
  
  Unjournal 'pivotal questions' ‘Could’ is vague. ‘Very bad outcomes’ needs a precise measure. Reframe as ~‘what is the increase in the risk of {specific VBO} as a function of the level of AI progress {specifically defined}.
  
  3/10
12. daaronr 15 Jul 2024
  
  in Public
  
  How much do global issues differ in how cost-effective the most cost-effective interventions within them are?
  
  Unjournal 'pivotal questions'
  
  ‘Global issues’ is vague; we need more specific categorizations. Cost-effectiveness needs a metric –
  
  5/10
13. daaronr 15 Jul 2024
  
  in Public
  
  long-term rate of expropriation of financial investments? How does this vary as investments grow larger? (Michael Aird, Crucial questions for longtermists)
  
  Unjournal 'pivotal questions'
  
  This is fairly definite, although it’s not clear what the precise motivation is here.
  
  8/10
14. daaronr 15 Jul 2024
  
  in Public
  
  What is the effect of economic growth on existential risk? How desirable is economic growth after accounting for this and any other side effects that might be important from a longtermist perspective? (See a recent paper by Leopold Aschenbrenner for some initial work on this question.)
  
  Unjournal 'pivotal questions' -- Needs some refinement – what measure of growth, what units of x-risk, etc.
  
  6/10
Visit annotations in context

Annotators

daaronr

URL

80000hours.org/articles/research-questions-by-discipline/
May 2024
www.givewell.org www.givewell.org

New Incentives (Conditional Cash Transfers to Increase Infant Vaccination)

5
1. daaronr 07 May 2024
  
  in Public
  
  Note: The figures in this report are from our February 2024 cost-effectiveness analysis. Our estimates change over time as we gather new information and update our analysis, and so the numbers in this report may not exactly match those in our most recent cost-effectiveness analysis (available here).
  
  could they make these reports dynamic ?
2. daaronr 07 May 2024
  
  in Public
  
  We adjust for this risk in our analysis (reducing our estimate of vaccine efficacy by around 20%).
  
  These ad-hoc adjustments seem particular sketchy and this ripe for critique and a more systematic approach
3. daaronr 07 May 2024
  
  in Public
  
  Proportion of children enrolled who would be vaccinated in the absence of New Incentives’ program (more)
  
  The hover content is helpful
4. daaronr 07 May 2024
  
  in Public
  
  We use a cost-effectiveness analysis to quantify our reasoning. Here is a summary of our analysis, using one state, Bauchi, as an example.
  
  The linked Google sheet is hard to parse and hard to read. This makes it less than fully transparent. E.g., the columns are frozen in a way that you can barely navigate the by-region columns.
  
  Linking something people can't use doesn't add transparency, it just wastes people's attention. If you feel the need put these links at the bottom, in a 'data section' or something. Anyone who wants to dig into it will need to do so as part of a separate and intensive exercise -- not just a glance while reading this. At least that's my impression.
  
  But also note that a code notebook based platform can be far more manageable for the reader.
5. daaronr 07 May 2024
  
  in Public
  
  New Incentives (Conditional Cash Transfers to Increase Infant Vaccination)
  
  I think all reports should have an 'update date' prominently listed. OK it says 'April' below
Visit annotations in context

Annotators

daaronr

URL

givewell.org/international/technical/programs/new-incentives
Apr 2024
www.cambridge.org www.cambridge.org

The Political Consequences of Green Policies: Evidence from Italy

28
1. daaronr 18 Apr 2024
  
  in Public
  
  In both cases the estimated effects are very small and not statistically distinguishable from zero
  
  But can they actively rule out a substantial effect ... equivalence test. I see some positive effects. Maybe it is just underpowered.
2. daaronr 18 Apr 2024
  
  in Public
  
  we focus on pre-trends
  
  Is this the usual pre-trend diagnostic for DiD?
3. daaronr 18 Apr 2024
  
  in Public
  
  Vote Switching Before Area B
  
  I want to see the voting levels by group
4. daaronr 18 Apr 2024
  
  in Public
  
  Hence, in this case, Diesel-Euro5 car owners constitute a “fake” treatment group
  
  robustness tests -- what are some other 'fake' control groups?
5. daaronr 18 Apr 2024
  
  in Public
  
  The null result on the Democratic Party is particularly interesting, as it suggests that voters did not penalize the party of the incumbent mayor, who was directly accountable for the introduction of the traffic ban.
  
  But I suspect they might have reported this result as supporting their hypothesis. Thus a multiple comparison test seems appropriate.
6. daaronr 18 Apr 2024
  
  in Public
  
  27 Our findings are robust to restricting the control group to Diesel-Euro5 owners only, and to controlling for the number of kilometers driven per year, as well as for the frequency of car use (see Supplementary Tables SI-8–SI-10, respectively).
  
  they did some reasonable robustness checks. What about restricting it to all Diesel cars?
7. daaronr 18 Apr 2024
  
  in Public
  
  substantively sizable shift considering that the baseline rate of support for Lega in the sample was 24.4%. Put differently, owning a car affected by the vehicle ban raised the probability of voting for Lega in the subsequent elections by 55% ab
  
  I'm surprised it's so large ... hmmm
8. daaronr 18 Apr 2024
  
  in Public
  
  Table 2. Voting for Lega in EU Elections of 2019
  
  a key first table of results. Looks like a linear probability model, here.
9. daaronr 18 Apr 2024
  
  in Public
  
  Our main dependent variable is an indicator that takes the value 1 if the respondent reports voting for Lega, and 0 otherwise. We also investigate potential treatment effects on support for other parties. In particular, to assess potential anti-incumbent effects, we examine support for the Democratic Party, the party of the city’s mayor.
  
  That would be other plausible outcomes?
  
  Any good way to correct for multiple comparisons here?
10. daaronr 18 Apr 2024
  
  in Public
  
  we do not know the exact timing of the disbursement, that is, whether the compensation was actually received by the time the EU elections took place.Footnote 25 In other words, our indicator variable equals 1 if the respondent has received compensation at any point in time prior to the survey (January 2021), and zero otherwise
  
  so it should understate the effecgt
11. daaronr 18 Apr 2024
  
  in Public
  
  eassuringly, earlier studies (e.g., Colantone and Stanig Reference Colantone and Stanig2018) show that individuals with these characteristics tend to be less likely to support a radical-right party such as Lega. Hence, the composition of the treatment group should in fact work against finding a pro-Lega effect of the policy.Footnote
  
  Hmm... Note that the treated group is also more male. But I don't think any of this matters after you control for these demographics, so their comment is misguided.
  
  OK this is probably something we can ignore.
12. daaronr 18 Apr 2024
  
  in Public
  
  Table 1 compares the characteristics of the different groups of car owners in terms of their age, gender, education, and income.
  
  to do: I would want to see a table including vote shares here
13. daaronr 18 Apr 2024
  
  in Public
  
  specifications of the following form:(1)
  
  Leading robustness check suggestion: The key outcome variable is a percentage of the vote total, I guess. But the treated and groups may have started from a different baseline level of voting. If the true model is nonlinear, the results here may be misleading. E.g., suppose the true effect on voting was the same for both groups as a percentage of the initial vote. Or suppose the impacts have diminishing returns, a sort of ceiling effect.
  
  Other possible robustness checks ... what are some other plausible forms? Car type random effects? Limit analysis to diesel only? Euro-5 only?
  
  Use demographic controls differently (?interact with a random effect or something?) Note, this DiD does not involve any time difference.
14. daaronr 18 Apr 2024
  
  in Public
  
  As another type of control group, we also interviewed 303 owners of new cars in the Euro6 category (both Diesel and Petrol). These car owners serve as a useful placebo test, for reasons we detail below.
  
  the 'placebo test' -- Possible robustness test: what if we made them part of the control group?
15. daaronr 18 Apr 2024
  
  in Public
  
  Starting in April 2019, city residents affected by the ban could apply for compensation from the Municipality of Milan.Footnote 12 The initial 2019 call for compensation was open only to low-income car owners (i.e., with an adjusted household income below €25,000 per year, or €28,000 if aged 65+). In the next year, the income criterion was dropped, and hence the call was effectively open to all residents
  
  compensation scheme
16. daaronr 18 Apr 2024
  
  in Public
  
  aim is to compare owners of affected cars to owners of relatively similar-yet-unaffected cars.Footnote 10 Specifically, our treatment group will consist of owners of Diesel-Euro4 cars, while the control group will consist of Petrol-Euro4, Diesel-Euro5, and Petrol-Euro5 car owners.Footnote
  
  Possible robustness test: compare other plausible vehicle treatment and control groups
17. daaronr 18 Apr 2024
  
  in Public
  
  The policy identifies the most polluting categories of vehicles and bans them from accessing and circulating within the area.
  
  the policy
18. daaronr 18 Apr 2024
  
  in Public
  
  seem to
  
  "seem to" is vague
19. daaronr 18 Apr 2024
  
  in Public
  
  If anything, affected car owners exhibited slightly more environment-friendly attitudes.
  
  I guess they mean 'after the policy these people became relatively more environmentally friendly'
20. daaronr 18 Apr 2024
  
  in Public
  
  was even larger
  
  Check: significantly larger?
21. daaronr 18 Apr 2024
  
  in Public
  
  owners of banned cars were 13.5 percentage points more likely to vote for Lega in the European Parliament elections of 2019
  
  main claim, quantified
22. daaronr 18 Apr 2024
  
  in Public
  
  close to zero and well below statistical significance.
  
  But how big a violation is 'substantial enough to matter' here?
23. daaronr 18 Apr 2024
  
  in Public
  
  owners of banned vehicles—who incurred a median loss of €3,750—were significantly more likely to vote for Lega in the subsequent elections
  
  In the paper, they are basically making the stronger claim that "this policy CAUSED these people to vote for Lega". That's the argument behind their discontinuity DiD approach .
24. daaronr 18 Apr 2024
  
  in Public
  
  we exploit arbitrary discontinuities in the rules dictating the car models that would be covered by the ban and employ a difference-in-differences estimation to identify the policy’s effect on voting behavior.
  
  The DiD approach
25. daaronr 18 Apr 2024
  
  in Public
  
  the electoral impact of the introduction of the Area B ban.
  
  the thing they want to measure
26. daaronr 18 Apr 2024
  
  in Public
  
  original survey with a targeted sampling design that we conducted among residents of Milan. The survey collected detailed information about respondents’ car ownership, environmental views, and political behavior.
  
  the data
27. daaronr 18 Apr 2024
  
  in Public
  
  In line with this pattern, recipients of compensation from the local government were not more likely to switch to Lega.
  
  A claim of a 'null finding' (bounded or just underpowered?) And is the difference significant?
28. daaronr 18 Apr 2024
  
  in Public
  
  indicates that this electoral change did not stem from a broader shift against environmentalism, but rather from disaffection with the policy’s uneven pocketbook implications.
  
  A secondary claim
Visit annotations in context

Annotators

daaronr

URL

cambridge.org/core/journals/american-political-science-review/article/political-consequences-of-green-policies-evidence-from-italy/4D76FEDA813739711DCB40EC102744AF
Mar 2024
static1.squarespace.com static1.squarespace.com

XPT.pdf

16
1. daaronr 07 Mar 2024
  
  in Public
  
  AI-concerned think the risk that a genetically engineered pathogen will killmore than 1% of people within a 5-year period before 2100 is 12.38%, while the AIskeptics forecast a 2% chance of that event, with 96% of the AI-concerned abovethe AI skeptics’ median forecast
  
  this seems like a sort of ad-hoc way of breaking up the data. What exactly is the question here, and why is this the best way to answer it?
2. daaronr 07 Mar 2024
  
  in Public
  
  hose who did best on reciprocal scoring had lower forecasts ofextinction risk.72 We separately compare each forecaster’s forecast of others’ forecasts on ten key questions, for both expertsand superforecasters. We rank each forecaster’s accuracy on those 20 quantities relative to other participants,and then we compute each forecaster’s average rank to calculate an overall measure of intersubjective accuracy.73 This may be because superforecasters are a more homogenous group, who regularly interact with eachother outside of forecasting tournaments like this.74 Pavel Atanasov et al., “Full Accuracy Scoring Accelerates the Discovery of Skilled Forecasters,” SSRN WorkingPaper, (February 14, 2023), http://dx.doi.org/10.2139/ssrn.4357367.
  
  This seems visually the case, but I don't see metrics or statistical inference here.
3. daaronr 07 Mar 2024
  
  in Public
  
  ithin both groups—experts and superforecasters—more accurate reciprocalscores were correlated with lower estimates of catastrophic and extinction risk. Inother words, the better experts were at discerning what other people would predict,the less concerned they were about extinction
  
  But couldn't this just be because people who think there is high Xrisk think others are likely to think like themselves? Is it more finely grained 'better reciprocal accuracy' than that?
4. daaronr 07 Mar 2024
  
  in Public
  
  otal Catastrophic Risk
  
  The differences in the total x-risk are not quite so striking-- about 2:1 vs 6:1 What accounts for this? Hmm, this look different from the 'Total Extinction risk' in table 4. Here a notebook would be helpful. Ahh, it's because this is for catastrophic risk, not extinction risk.
5. daaronr 07 Mar 2024
  
  in Public
  
  First, we can rule out the possibility that experts can’t persuade others of the severityof existential risks simply because of a complete lack of sophistication, motivation,or intelligence on the part of their audience. The superforecasters have all thosecharacteristics, and they continue to assign much lower chances than do experts.
  
  This paragraph seems a bit loosely argued.
6. daaronr 06 Mar 2024
  
  in Public
  
  Question and resolution details
  
  They seem to have displayed the questions along with particular “Prior Forecasts” — is that appropriate? Could that be driving the persistent difference between the superforecasters and experts?
7. daaronr 06 Mar 2024
  
  in Public
  
  general x-riskexperts
  
  What are 'general x-risk experts'? Give some examples.
8. daaronr 06 Mar 2024
  
  in Public
  
  The median participant who completedthe tournament earned $2,500 in incentives, but this figure is expected to rise asquestions resolve in the coming years.
  
  fairly substantial incentives ... but it may have been time consuming; how many hours did it take?... and how much variation was there in the incentive pay/how sensitive was it to the predictions?
9. daaronr 06 Mar 2024
  
  in Public
  
  with 111completing all stages of the tournament
  
  Would this attrition matter?
10. daaronr 06 Mar 2024
  
  in Public
  
  Participants made individual forecasts2. Teams comprised entirely of either superforecasters or experts deliberated andupdated their forecasts3. Blended teams from the second stage, consisting of one superforecaster team andone expert team, deliberated and updated their forecasts4. Each team saw one wiki summarizing the thinking of another team and againupdated their forecasts
  
  with incentives for accuracy (or 'intersubjective' accuracy) at each stage, or only at the very end? Aldo incentives for making strong comments and (?) convincing others/
11. daaronr 06 Mar 2024
  
  in Public
  
  We also advertised broadly, reaching participants withrelevant experience via blogs and Twitter. We received hundreds of expressions ofinterest in participating in the tournament, and we screened these respondents forexpertise, offering slots to respondents with the most expertise after a review of theirbackgrounds.1
  
  Recruitment of experts.
12. daaronr 06 Mar 2024
  
  in Public
  
  We explained that after the tournament we would show the highest-qualityanonymized rationales (curated by independent readers) to panels of online surveyparticipants who would make forecasts before and after reading the rationale. Prizesgo to those whose rationales helped citizens update their forecasts toward greateraccuracy, using both proper scoring rules for resolvable questions and intersubjectiveaccuracy for unresolvable questions.21
  
  Is this approach valid? Would it give powerful incentives to be persuasive? What is are these rationales used for? Note that 'intersubjective accuracy' is not a ground truth for the latter questions.
13. daaronr 06 Mar 2024
  
  in Public
  
  One common challenge in forecasting tournaments is to uncover the reasoningbehind predictions.
  
  How does this 'uncover the reasoning behind predictions'?
14. daaronr 06 Mar 2024
  
  in Public
  
  scoring ground rules: questions resolving by 2030were scored using traditional forecasting metrics where the goal was to minimize thegap between probability judgments and reality (coded as zero or one as a function ofthe outcome). However, for the longer-run questions, participants learned that theywould be scored based on the accuracy of their reciprocal forecasts: the better theypredicted what experts and superforecasters would predict for each question, thebetter their score.
  
  Is the 'reciprocal scoring' rule likely to motivate honest (incentive-compatible) predictions? Is it likely to generate useful information in this context?
15. daaronr 06 Mar 2024
  
  in Public
  
  When we report probabilities of long-run catastrophic andexistential risk in this report, we report forecasters’ own (unincentivized) beliefs. But,we rely on the incentivized forecasts to calculate measures of intersubjective accuracy
  
  This is a bit confusing. The language needs clarification. What exactly is 'intersubjective accuracy'?
16. daaronr 06 Mar 2024
  
  in Public
  
  the XPT:• What will be the global surface temperature change as compared to 1850–1900, indegrees Celsius? (By 2030, 2050, 2100)• By what year will fusion reactors deliver 1% of all utility-scale power consumed inthe U.S.?• How much will be spent on compute [computational resources] in the largest AIexperiment? (By 2024, 2030, 2050)• What is the probability that artificial intelligence will be the cause of death, within a5-year period, for more than 10% of humans alive at the beginning of that period?(By 2030, 2050, 2100)• What is the overall probability of human extinction or a reduction in the globalpopulation below 5,000? (By 2030, 2050, 2100)18 Participants also consented to participate in this study, via the University of Pennsylvania’s InstitutionalReview Board. The consent form detailed the format of the study.19 We define a catastrophic event as one causing the death of at least 10% of humans alive at the beginning ofa five-year period. We define extinction as reduction of the global population to less than 5,000.
  
  I appreciate these links to the full question content.
Visit annotations in context

Annotators

daaronr

URL

static1.squarespace.com/static/635693acf15a3e2a14a56a4a/t/64f0a7838ccbf43b6b5ee40c/1693493128111/XPT.pdf
Dec 2023
www.nber.org www.nber.org

Subjecting the ‘Average Joe’ to War Theatre Triggers Intimate Partner Violence

5
1. daaronr 27 Dec 2023
  
  in Public
  
  feed these motivations through several potential mechanism
  
  I don't see the connection to household bargaining here
2. daaronr 27 Dec 2023
  
  in Public
  
  conflict environments can also exacerbate existing gender-basedinequalities or challenge traditional gender roles in a society
  
  This seems logically incoherent. Why would conflict challenge gender roles? And if the opposite, why would greater inequality (favoring men, I guess) make them want to resort to violence to 'reassert' their power?
3. daaronr 27 Dec 2023
  
  in Public
  
  establishment of peace in the public space has substantial positive spillover effects in enhancingwomen’s well-being in the private space
  
  relevant to prioritization
4. daaronr 27 Dec 2023
  
  in Public
  
  Therefore, we identify the population average treatment effect (PATE) ofarmed combat exposure
  
  whole population or only 90%?
5. daaronr 27 Dec 2023
  
  in Public
  
  Probing the mechanisms, our analysis first renders the use of violence as an instrumental behavior in intrahousehold bargaining as an unlikely mechanism by eliminating labor market outcomes and economic- and social-controlling behaviors from the list of usual suspects.
  
  This sentence is confusing. And why would I expect that 'violence ... [for[ intrahousehold bargaining' would be particularly driven by having been assigned to a conflict zone?
Visit annotations in context

Annotators

daaronr

URL

nber.org/system/files/working_papers/w31227/w31227.pdf
hughjonesd.github.io hughjonesd.github.io

Untitled document

8
1. daaronr 16 Dec 2023
  
  in Public
  
  Usage
  
  I think you need to explain a possible workflow here. E.g.,
  
  Open the tex, markdown, etc. file in a text editor
  
  Apply the suggested suggestions using the above syntax and save it with a different name
  
  Use the 'suggs' tools below to manage this
  
  Or 1. Open the old.txt (tex, markdown, etc. file) in a text editor 2. Just make your suggested changes (and comments?) and save it as new.txt, don't use the syntax 3. Use the utility 'suggs diff' to make a third file that highlights these suggestions (what is the name of the new file? do I need to pipe it into something?)
  
  Also, would you want to have this syntax and style of suggs mapped for key text editors for syntax highlighting and maybe shortcut keys?
2. daaronr 16 Dec 2023
  
  in Public
  
  Create a suggestions file from the difference between old.txt and new.txt: suggs diff old.txt new.txt
  
  I'm curious how this will work -- who does it attribute the changes to?
3. daaronr 16 Dec 2023
  
  in Public
  
  To review suggestions:
  
  Can you signpost or tease whether this will be done manually or whether there will be suggested shortcut keys etc?
4. daaronr 16 Dec 2023
  
  in Public
  
  The original text, ++[your suggested addition,]++
  
  why two "+" signs and not just one?
5. daaronr 16 Dec 2023
  
  in Public
  
  The handle must start with @ and must be the last word:
  
  See above suggestion about how people grok the "@". I suggest --David or something like this instead... maybe even three dashes to avoid confusion with actual double-dash content.
6. daaronr 16 Dec 2023
  
  in Public
  
  You can sign the comment with a @handle as the last word.
  
  "@" always makes me think you are flagging the OTHER guy ... and you expect it alerts them somehow. Maybe a double dash instead?
  
  %%[ This clarifies the argument, right, @stephen? --Reinstein ]%%
7. daaronr 16 Dec 2023
  
  in Public
  
  To make a comment, enclose it in %%[ and [%%:
  
  Typo -- close bracket
8. daaronr 16 Dec 2023
  
  in Public
  
  ke this: The original text, ++[your suggested addition,]++ and more text.
  
  Formatting of this documentation file -- I can barely see the text in those boxes ... make the boxes taller
Visit annotations in context

Annotators

daaronr

URL

hughjonesd.github.io/suggestions/
static1.squarespace.com static1.squarespace.com

XPT.pdf

7
1. daaronr 04 Dec 2023
  
  in Public
  
  3. How the XPT works
  
  A web site/wiki thing with dynamic explanations seems better for this section
2. daaronr 04 Dec 2023
  
  in Public
  
  1.33% [0.17,
  
  tables should be forematted better
3. daaronr 04 Dec 2023
  
  in Public
  
  The median is straightforward tocalculate, transparent, robust to extreme outlying observations, and understandableto people with a basic knowledge of statistics. Also, reassuringly, it is never thehighest nor the lowest of the five methods we considered as potential aggregationmethods. For these reasons, we think the median provides an ideal middle ground foraggregating forecasts in this project.
  
  This seems very much ad-hoc and not meant for a specialist audience. There is a whole literature on this, and much more theoretically grounded approaches, as you know. The justification given here is rather incomplete.
4. daaronr 04 Dec 2023
  
  in Public
  
  otal Extinction Risk
  
  This stuff could be better presented as a dashboard/hosted Quarto type thing
5. daaronr 04 Dec 2023
  
  in Public
  
  bold claims that attract audiences and funding—and to keep their predictions vagueenough so they can never be proven wrong.
  
  this seems somewhat contradictory
6. daaronr 04 Dec 2023
  
  in Public
  
  Some have argued more broadl
  
  if this were a part of the project being evaluated we would ask for a reference here ('who are these people?'). But maybe OK for exec. summary.
7. daaronr 04 Dec 2023
  
  in Public
  
  I"m not sure a pdf is the best format for this. I suspect more interactive web presentation would be better
Visit annotations in context

Annotators

daaronr

URL

static1.squarespace.com/static/635693acf15a3e2a14a56a4a/t/64f0a7838ccbf43b6b5ee40c/1693493128111/XPT.pdf
Nov 2023
globalprioritiesinstitute.org globalprioritiesinstitute.org

Economics-research-agenda-draft.pdf

32
1. daaronr 22 Nov 2023
  
  in Public
  
  Do ‘broad’ approaches to improving effective governance, and ultimately serving the farfuture, tend to be more or less effective in expectation than ‘narrow’ approaches (such asworking on reducing the risk of bioengineered pandemics)?
  
  A very big question -- would be helpful to pose some possible building blocks to answering this question that gives people a hint at how to take a stab at it.
2. daaronr 22 Nov 2023
  
  in Public
  
  How can evidence be disseminated mosteffectively?
  
  Disseminate: By whom, to whom, with what theory of change/path to impact?
3. daaronr 22 Nov 2023
  
  in Public
  
  Under what conditions should a social planner preserve ‘option value’ bydelaying an important, irreversible decision to acquire more information, thereby delegatingdecision-making authority to future agents with potentially different values and preferences(cf. Bishop 1982; Dixit and Pindyck 1994)?
  
  To me this seems distinct from the rest of the bullet point
4. daaronr 22 Nov 2023
  
  in Public
  
  hen et al 2023; Toma andBell 2023
  
  biblio entries missing
5. daaronr 22 Nov 2023
  
  in Public
  
  Intergenerational governance and policy-making
  
  It's unclear whether we are talking about A. "intergenerational governance and international policymaking" or
  
  B. "1. International governance and 2. Policymaking in general".
  
  The latter bullet points and cited papers (e.g., Vivalt and Coville) do not seem to always relate to intergenerational governance
6. daaronr 22 Nov 2023
  
  in Public
  
  Vivalt, Coville, KC 2023
  
  This seems relevant for The Unjournal's consideration/evaluation (but this may fall into the 'ask authors' permission' category). It is empirical and apparently rigorously quantitative and seems highly-relevant to policymaking and impact evaluation research, and influencing policy, all crucial to 'the impact agenda'. Hopefully also follows open, robust science standards (prereg, etc.).
7. daaronr 22 Nov 2023
  
  in Public
  
  Gonzalez-Ricoy and Gosseries 2017
  
  biblio entry missing
8. daaronr 22 Nov 2023
  
  in Public
  
  Can we design mechanisms to ensure that AI systems exhibit desirable behavioursuch as truth-telling or a lack of deception?
  
  Perhaps this should be elaborated and defined somewhat more formally. Reference a particular issue in mechanism design here that is particularly relevant to AI systems, perhaps.
9. daaronr 22 Nov 2023
  
  in Public
  
  govern powerful non-state entities
  
  I'm not sure what is meant by 'powerful non-state entities' here. This seems under-defined.
10. daaronr 22 Nov 2023
  
  in Public
  
  inequalities,
  
  political inequalities?
11. daaronr 22 Nov 2023
  
  in Public
  
  Read and Toma, 2023
  
  biblio entry missing. Would be useful to know what this one is.
12. daaronr 22 Nov 2023
  
  in Public
  
  Song et al 2012)
  
  biblio missing
13. daaronr 22 Nov 2023
  
  in Public
  
  Healy and Malhorta 2009
  
  sp: Malhotra https://www.cambridge.org/core/journals/american-political-science-review/article/abs/myopic-voters-and-natural-disaster-policy/039708A3223EC114365ADF56F1D26423
14. daaronr 22 Nov 2023
  
  in Public
  
  economic models predict the impact of advancedAI systems on political institutions and inequalities
  
  A reference would be very helpful here. It's hard for me to see what sort of economic models are relevant here.
15. daaronr 22 Nov 2023
  
  in Public
  
  (Acemoglu, 2023
  
  biblio missing. This seem potentially relevant for an Unjournal evaluation, although we tend not to focus on 'broad think piece' work, which this might be
16. daaronr 22 Nov 2023
  
  in Public
  
  Bersiroglu andErdil, 2023
  
  biblio entry missing
17. daaronr 22 Nov 2023
  
  in Public
  
  Shulman, C., & Thornley, E. (2023). How Much Should Governments Pay to Prevent Catastrophes? Longtermism'sLimited Role. In Barratt, Greaves, Thorstad (eds.) Essays on Longtermism.
  
  interesting but probably not quantitative/formally specific enough for The Unjournal
18. daaronr 22 Nov 2023
  
  in Public
  
  Alexandrie and Eden, forthcoming
  
  biblio missing
19. daaronr 22 Nov 2023
  
  in Public
  
  E.g. Jordà et al. 2022
  
  not sure this is getting at 'long run' in the sense that longtermists care about
20. daaronr 22 Nov 2023
  
  in Public
  
  ased on the historical record of such events, what is the tail distributionof harmful impacts (e.g., fatalities) from pandemics, asteroids, wars, and other potentialdisasters? (E.g. Marani 2021;
  
  not really economics but that's not so important
21. daaronr 22 Nov 2023
  
  in Public
  
  Aschenbrenner2019
  
  biblio entry missing
22. daaronr 22 Nov 2023
  
  in Public
  
  2023
  
  This is mainly about the welfare tradeoff between economic growth and x-risk in a theoretical sense; I don't think it's about the 'impact of growth on GCRs' per se
23. daaronr 22 Nov 2023
  
  in Public
  
  Klenow et al. 2023
  
  biblio entry missing
24. daaronr 22 Nov 2023
  
  in Public
  
  To what extent are forecasting methods informative for assessing the probability of globalcatastrophic risks and other future events of special importance for social welfare? (Karger
  
  empirical and seems very relevant and strong; adding it to the Unjournal database
25. daaronr 22 Nov 2023
  
  in Public
  
  Karger, E., Rosenberg, J., Jacobs, Z., Hickman, M., Hadshar, R., Gamin, K., ... & Tetlock, P. E. (2023). ForecastingExistential Risks Evidence from a Long-Run Forecasting Tournament. FRI Working Paper No. 1.
  
  https://forecastingresearch.org/s/XPT.pdf
  
  empirical
26. daaronr 21 Nov 2023
  
  in Public
  
  Kalai & Kalai 2001)
  
  biblio entry missing
27. daaronr 21 Nov 2023
  
  in Public
  
  Andreoni 2018
  
  this is a practical applied policy paper that seems informative for donors considering their own charity decisionmaking
  
  charitable giving
28. daaronr 21 Nov 2023
  
  in Public
  
  1.1 Strategic issues in altruistic decision
  
  they largely mention theory papers (micro theory, optimization, axiomatic/normative), not empirical work here
29. daaronr 21 Nov 2023
  
  in Public
  
  (cf.Andreoni & Payne 2003)
  
  The Andreoni and Payne paper is about the government crowdout of private philanthropy (there are a bunch of papers about this), not about the reverse nor about crowding out among donors.
  
  charitable giving
30. daaronr 21 Nov 2023
  
  in Public
  
  Whatdetermines the optimal spending schedule for altruistic decision-makers?
  
  Practically speaking, this seems largely about the impact of interventions (funded by charity) over time; however it does connect with donors' to the extent it involves personal finance and issues like value drift.
  
  OK but the Trammell paper is addressing something different -- coordination in a public goods provision model.
31. daaronr 21 Nov 2023
  
  in Public
  
  n altruistic decision-maker that funds a charitable intervention may crowd out fundingfrom other actors (e.g., governments or philanthropists)
  
  I might add a related issue -- decisions to give to one charity may crowd out other donations; the extent to which this is the case ('donations are substitutes') informs strategies for convincing people to give 'more effectively' vs 'give to effective causes.' See my notes: https://daaronr.github.io/ea_giving_barriers/chapters/substitution.html
  
  charitable giving
32. daaronr 21 Nov 2023
  
  in Public
  
  Research agenda draft for GPI Economics
  
  Does anyone know if this is the most updated statement of GPI’s economics agenda?
  
  “Economics ‘draft agenda’” Anyone know when it was updated?
Visit annotations in context

Tags

charitable giving

Annotators

daaronr

URL

globalprioritiesinstitute.org/wp-content/uploads/Economics-research-agenda-draft.pdf
Oct 2023
github.com github.com

ben-aaron188/rgpt3: Making requests from R to ChatGPT and the GPT-3 API

1
1. daaronr 16 Oct 2023
  
  in Public
  
  single chat requests: chatgpt_single()
  
  chatgpt_single(prompt_role = 'user', prompt_content = 'say something relevant' , temperature = 0.8 , n = 2 , max_tokens = 15)
  
  n = 2 -- 2 responses max_tokens = 15 ... maybe it means 'try 15 times' to get the best 2 from
  
  prompt_role = 'user' -- just ask a question for the gpt ('assistant') to respond. I'm not sure why it would make sense for single chat requests to choose the role 'system' or 'assistant' as this wouldn't persist (?)
Visit annotations in context

Annotators

daaronr

URL

github.com/ben-aaron188/rgpt3
deliverypdf.ssrn.com deliverypdf.ssrn.com

Untitled document

4
1. daaronr 06 Oct 2023
  
  in Public
  
  ecause military shocks generate plausibly exogenous variationin economic production, the findings improve upon the correlational evidence in Section 3.Second, the implied biodiversity-GDP elasticities from the quasi-experiment are largerthan their OLS counterparts. By leveraging shocks in a 2SLS setting, the quasi-experiment-based elasticity estimates alleviate classic measurement error and endogeneity problems.However, the fact that the two methods produce elasticities of similar order of magnitudeadds confidence to the overall credibility of the estimates.
  
  are they ignoring the heterogeneity and "LATE" issue?
2. daaronr 06 Oct 2023
  
  in Public
  
  states’relative differences in response to aggregate military buildups (which are themselves largelydriven by geopolitical factors) – are unlikely to be correlated with unobservable determinantsof local biodiversity. That is, we assume the United States will not increase national mili-tary spending because states that receive larger military procurement contracts have worsebiodiversity
  
  but couldn't both of these be driven by a third factor ... the state is less environmentally friendly. or maybe I am missing something
3. daaronr 06 Oct 2023
  
  in Public
  
  First, we producecausal estimates of the elasticities between biodiversity outcomes and air pollution. We use aresearch design that isolates variation in local pollution driven by transported pollution fromdistant, upwind cities (e.g., Deryugina et al., 2019; Anderson, 2020). We show that “upwindpollution” coming from areas over 300 km away generates substantial variation in local airquality, and these imported pollution shocks cause reductions in local biodiversity outcomes.Second, we estimate the impact of the military spending shocks on air pollution, and multiplythese estimates by the biodiversity-pollution elasticities we obtain from step one. Together,these exercises give us the expected impact of the military shocks on biodiversity throughair pollution. We find that pollution accounts for 20-60 percent of the reduced form effect ofmilitary shocks, suggesting air pollution is a first-order pathway underlying the production-biodiversity link
  
  have they successfully shown this 'mediation' channel ... with 2 separate sources of exogenous variation? (That's always very challenging to identify)
  
  unjournal
4. daaronr 06 Oct 2023
  
  in Public
  
  Second, there is substantial distributional heterogeneity, where the negativeassociation at the lowest decile of biodiversity is almost twice as large as the average.
  
  But could this reflect something mechanical like a nonlinearity?
Visit annotations in context

Tags

unjournal

Annotators

daaronr

URL

deliverypdf.ssrn.com/delivery.php
Aug 2023
github.com github.com

Incompatible architecture problem when installing new package · Issue #879 · rstudio/renv

3
1. daaronr 07 Aug 2023
  
  in Public
  
  FC = /opt/homebrew/bin/gfortran-11
  
  For fortran
2. daaronr 07 Aug 2023
  
  in Public
  
  CXX = /opt/homebrew/bin/g++-11
  
  Compiler for C++
3. daaronr 07 Aug 2023
  
  in Public
  
  CC = /opt/homebrew/bin/gcc-11
  
  Tells R which C compiler to use
Visit annotations in context

Annotators

daaronr

URL

github.com/rstudio/renv/issues/879
Jul 2023
unjournal.github.io unjournal.github.io

The Unjournal evaluations: data and analysis

1
1. daaronr 13 Jul 2023
  
  in Public
  
  book
  
  It's a work in progress
Visit annotations in context

Annotators

daaronr

URL

unjournal.github.io/unjournaldata/
forum.effectivealtruism.org forum.effectivealtruism.org

HLI's 2023-4 research agenda

52
1. daaronr 11 Jul 2023
  
  in Public
  
  We are now planning a further update in response to additional comments (e.g., from James Snowden and GiveWell). We expect this will include updating our analysis with recently completed studies and refining some technical aspects of the analysis, including:Our systematic review, and the weight we place on different sources of evidenceEstimated spillover benefits for household membersCost estimatesTechnical details, such as:How long do the effects of psychotherapy last?How important is the expertise of the deliverer or number of sessions?Are the effects of psychotherapy affected by publication bias?
  
  This seems extremely high-value and potentially ideal for the Unjournal's non-academic stream. Ryan 'had this in mind too'
  
  #unjournalresearchprioritization
2. daaronr 11 Jul 2023
  
  in Public
  
  but longtermists often claim priorities such as AI alignment and preventing pandemics are important, even if we solely consider present wellbeing, so we shouldn’t dismiss the possibility.
  
  I don't see how this argues against the 'suspicious convergence' claim... OK, I see Jack Malde's comment now, which basically gets at my doubts here.
3. daaronr 11 Jul 2023
  
  in Public
  
  we have a full list of research ideas that we hope to explore
  
  this is the list you linked above under 'organizations'. fwiw it's an interesting list but it's very sparsely populated (most columns have a name only). Some fleshing out and ranking/prioritizing could be helpful here.
4. daaronr 11 Jul 2023
  
  in Public
  
  If such views are true, that would count against longtermism
  
  I don't see this as a promising research agenda. My sense of it is that it is pretty intractable. (I'm not saying if it is true/false/wrong/right, just that I am not sure if there will be a lot of practical value in pursuing it? OK I see some approaches that might be helpful, if one has a tractible way to model welfare considerations with PAV it might win some people over.)
5. daaronr 11 Jul 2023
  
  in Public
  
  credible cause for longtermists.
  
  'cause' or indirect instrumental goal?
6. daaronr 11 Jul 2023
  
  in Public
  
  We’ve published two working papers on moral uncertainty: The property rights approach to moral uncertainty and Wheeling and dealing: An internal bargaining approach to moral uncertainty, which both explore a novel, bargaining-based approach to acting when you’re uncertain what’s morally right. (This is very roughly akin to the ‘moral parliament’ approach.) We’re currently working with two external co-authors on a new paper that combines these ideas, which we plan to publish in an academic journal.
  
  Potential relevant to #unjournalresearchprioritization, depending on the approach
  
  #unjournalresearchprioritization
7. daaronr 11 Jul 2023
  
  in Public
  
  5.1 Using WELLBYs to compare the value of extending lives against improving lives
  
  Somewhat relevant to #unjournalresearchprioritization
  
  #unjournalresearchprioritization
8. daaronr 11 Jul 2023
  
  in Public
  
  Although unlikely, we may also do some work relating to animal welfare; a challenge is that we prefer to rely on self-reports, which animals can’t give.
  
  this could be relevant for Unjournal. How unlikely? If it's so unlikely, why mention it?
9. daaronr 11 Jul 2023
  
  in Public
  
  Assess the social desirability bias and other self-reporting biases in SWB data (for example: Do people give answers surveyors want? Is it a problem? If so, can anything be done?)Explore whether the measure of SWB matters (for example, if the key outcome is happiness rather than life satisfaction, do we get different priorities?)
  
  unjournalresearchprioritization
10. daaronr 11 Jul 2023
  
  in Public
  
  Our working paper A Happy Possibility about Happiness (and other) Scales, a working paper attempts to provide the first overview of both the theory and evidence of the comparability of subjective wellbeing scales (e.g., is your 7/10 the same as my 7/10?). We plan to revise this for publication in an academic journal.
  
  #unjournalresearchprioritization
11. daaronr 11 Jul 2023
  
  in Public
  
  Our article To WELLBY or not to WELLBY? sets out the WELLBY method, its strengths, weaknesses, and areas for future work. To expand on this, we are:
  
  This seems very relevant for Unjournal
  
  #unjournalresearchprioritization
12. daaronr 11 Jul 2023
  
  in Public
  
  3. The nature of wellbeing
  
  Probably not relevant for the Unjournal at this point, but there may be some overlap
13. daaronr 11 Jul 2023
  
  in Public
  
  existing work (where a public document is available)
  
  academics (at least in my field) would distinguish a fourth stage 'having been accepted in a journal after peer review'. Not sure how important that distinction is for you.
  
  Note that The Unjournal is trying to make that last stage less burdensome and more informative by commissioning public evaluation and rating of work (rather than relying on tedious and imprecise the 'which journal was it published in' measure)
14. daaronr 11 Jul 2023
  
  in Public
  
  Some work is both existing and current (where we have extant research we are updating).
  
  that's the best, I like to see all research as 'permanent alpha' mode
15. daaronr 11 Jul 2023
  
  in Public
  
  which means we have a number of ongoing projects.
  
  'which means' --- the implication is not clear here
16. daaronr 11 Jul 2023
  
  in Public
  
  The notable ones are:
  
  for me the more concrete measurement issues are at least as important ... you include these, but I don't see it in this paragraph.
17. daaronr 11 Jul 2023
  
  in Public
  
  We also have a long list of organisations we would like to explore, including the Shamiri Institute, Action for Happiness, and Koko.
  
  The airtable view is linking interventions and cause areas, not organizations
  
  Why and how did you choose and prioritize these? It's a huge space to explore?
18. daaronr 11 Jul 2023
  
  in Public
  
  e expect others will provide different types of mental health interventions, such as social-emotional learning. We expect to examine Friendship Bench, Sangath, and CorStone unless we find something more promising.
  
  Does that mean you will need to assess (and consider research evidence) on these other non-psychotherapy interventions? If so, that deserves its own section perhaps?
19. daaronr 11 Jul 2023
  
  in Public
  
  Based on our cause area report on mental health and our cost-effectiveness analysis of psychotherapy, we think mental health is a promising area in which to find cost-effective interventions to improve wellbeing.
  
  this paragraphs seems unneeded and repetitive. Or am I missing something here?
20. daaronr 11 Jul 2023
  
  in Public
  
  so we will also update our assessment of StrongMinds after we update our psychotherapy evaluation.
  
  Maybe restate this to clarify that you are not reevaluating SM as an organization again, but will update the evaluation of their impact in light of your updated evaluation of the intervention?
21. daaronr 11 Jul 2023
  
  in Public
  
  2.1 Updated evaluation of psychotherapy
  
  this part still seems like an intervention not an organization
22. daaronr 11 Jul 2023
  
  in Public
  
  Are the effects of psychotherapy affected by publication bias?
  
  Pedantic: I'd say the 'estimated effects' here ... obviously the effects themselves are not affected by this bias
23. daaronr 11 Jul 2023
  
  in Public
  
  e’ve found that psychotherapy for depression is several times more cost-effective than cash transfers for improving happiness, deworming has an unclear long-term effect,
  
  in the statement in the 1-pager, you stated
  
  we’ve found that psychotherapy is several times more cost-effective than cash transfers or deworming for improving happiness.
  
  That's not entirely consistent with this sentence
24. daaronr 11 Jul 2023
  
  in Public
  
  From this work, we’ve found that psychotherapy is several times more cost-effective than cash transfers or deworming for improving happiness. We concluded that comparing psychotherapy to antimalarial bednets, a life-saving intervention, depends heavily on various philosophical assumptions: treating depression ranges from about as good as to several times better than antimalarial bednets, depending on the assumptions.
  
  these sentences repeat the sentences above in the one-pager
25. daaronr 11 Jul 2023
  
  in Public
  
  evaluated the cost-effectiveness of organisations that provide psychotherap
  
  The psychotherapy report doesn't seem to be about a particular organization. I'm a bit confused about the structure here. How does this differ from a 'cause area exploration' at this point?
26. daaronr 11 Jul 2023
  
  in Public
  
  e, including psychedelics, opioids, poverty, loneliness, sleep, and air pollution.
  
  These again combine problems with potential remedies and interventions. And 'psychedelics' -- is that aimed at curing 'problems' or boosting the upper end joys of life?
27. daaronr 11 Jul 2023
  
  in Public
  
  Longlist of future cause areas to explore
  
  this needs a header (it now is under 1.4). Not sure this long list is helpful here though? What's actionable about this? Is it linked to an appeal for more funding?
28. daaronr 11 Jul 2023
  
  in Public
  
  1.2 Child development effects (e.g., abuse, trauma, nutrition)
  
  I suspect some strong Unjournal/academic research links here. Also to the house improvement ones and possibly the fistula ones too.
  
  #unjournalresearchprioritization
29. daaronr 11 Jul 2023
  
  in Public
  
  that may have large impacts on wellbeing as well. So far we have completed shallow reviews on pain, lead exposure, and immigration.
  
  For unjournal research prioritization, I guess I will have to dig into those reviews to identify the most pivotal research to have evaluated?
  
  These link articles but don't contain a 'list of works cited' at bottom. Could you provide that ... even better an 'annotated/categorized/prioritized list' explaining which ones you rely on most heavily, and which you have the most uncertainty over?
  
  #unjournalresearchprioritization
30. daaronr 11 Jul 2023
  
  in Public
  
  Research agenda
  
  In the previous agenda you tried
  
  to articulate, within each research area, where additional research seems more (or less) useful, and therefore what our research agenda is for the next one to two years.
  
  This seemed particularly relevant to helping the Unjournal help you. Not sure this new agenda does this as much.
  
  #unjournalresearchprioritization
31. daaronr 11 Jul 2023
  
  in Public
  
  A working paper exploring a bargaining-based approach to moral uncertainty
  
  How are you defining and considering a 'working paper' here?
  
  "Units of value" .. maybe add a few more words to clarify this?
  
  I assume this will be a theoretical paper (i.e., no surveys or data?)
  
  #unjournalresearchprioritization
32. daaronr 11 Jul 2023
  
  in Public
  
  A revised paper on the theory and current evidence on scale cardinality (e.g., is your 7/10 the same as my 7/10?)
  
  I see a lot of benefit in engaging with academics on this paper, and getting and responding to feedback, possibly within The Unjournal's framework
  
  #unjournalresearchprioritization
33. daaronr 11 Jul 2023
  
  in Public
  
  An academic paper setting out our method for measuring impact using wellbeing
  
  Not sure what is meant by 'an academic paper'. Typically it would be hard to publish a paper in an academic journal that simply 'describes' (or even justifies) the approach that a particular organization takes.
  
  You might have to frame it more as answering or providing evidence on a question of general interest, and/or formally arguing for the 'most appropriate approach' under certain defensible criteria.
34. daaronr 11 Jul 2023
  
  in Public
  
  We will conduct new research on how to measure and interpret subjective wellbeing measures:
  
  A 2021 priority was "Examining how best to convert between different SWB, as well as other, measures (1.2.1)" This seems to have strong academic links relevant to the Unjournal. Is it still a priority?
  
  #unjournalresearchprioritization
35. daaronr 11 Jul 2023
  
  in Public
  
  2. Organisation evaluations
  
  I expect Unjournal-evaluated research to provide inputs relevant to these evaluations, but not to directly evaluate particular organisations. However, we might be able to cover some of this within our 'less academic stream'.
  
  #unjournalresearchprioritization
36. daaronr 11 Jul 2023
  
  in Public
  
  Non-mood-related mental health issues (e.g., psychotic and trauma-related disorders)Child development effects (e.g., abuse, trauma, nutrition)Fistula repair surgeryBasic housing improvements (e.g., concrete floors)
  
  I suspect there are a range of academic papers (in development economics, health economics, psychology, policy, and the social-sciency side of biomedicine) that will inform this, that Unjournal might evaluate.
  
  This can include work that constitutes - impact evaluation of specific interventions, including RCTs non-experimental causal inference - work exploring the impact of specific paths to impact through these interventions (e.g., the career costs of childhood trauma) - work exploring costs and impacts on the market (e.g., impact of housing improvements on the local economy, price elasticities, etc.)
  
  #unjournalresearchprioritization
37. daaronr 11 Jul 2023
  
  in Public
  
  1. Cause area explorations
  
  In the 2021 agenda you stated "Our main current focus, and where the majority of our eΛort will go, is Area 2.3: using subjective well-being scores to compare the cost-effectiveness of highly-regarded health and development interventions used in low-income countries."
  
  Is this still your priority? Is this in line with the 'Cause area explorations" category here?
  
  #unjournalresearchprioritization
38. daaronr 11 Jul 2023
  
  in Public
  
  Conduct further theoretical work:
  
  The boundary between theoretical and applied is not always clear here. Some research, maybe the methodological and measurement research in particular, has both theoretical aspects and very applicable and even empirical aspects. Calling this 'theoretical' might confuse people who would conflate theoretical with philosophical. E.g., research into which survey and other reporting instruments are more reliable, better reflect the actual measures of interest ... this seems very applied to me, and probably relevant to The Unjournal's scope as well.
  
  #unjournalresearchprioritization
39. daaronr 11 Jul 2023
  
  in Public
  
  asurement of wellbeing. This has included evaluating philosophical views of wellbeing and life satisfaction, pioneering methods to conduct cost-effectiveness analyses using wellbeing, and conducting novel research on wellbeing measurement. We
  
  I think you may have moved a lot of the content outlined in the previous Research Agenda into those linked reports?
  
  #unjournalresearchprioritization
40. daaronr 11 Jul 2023
  
  in Public
  
  Where relevant, we hone in and compare the top organisations implementing those interventions.
  
  Is this necessary in the wheelhouse of HLI? If your focus is on assessing impact from the wellbeing perspective, does that interact with things like 'organizational capabilities' at all?
  
  Maybe better to outsource the latter?
41. daaronr 11 Jul 2023
  
  in Public
  
  picking new cause areas to investigate, then narrowing down to the specific organisations – which will enable us to look broadly and deeply at the same time.
  
  When will you do each? Do you anticipate returning to broad cause areas you've previously decided not to pursue?
  
  How much will you defer to other orgs and researchers in the broader prioritization?
42. daaronr 11 Jul 2023
  
  in Public
  
  large, solvable, and unduly neglected.
  
  why not just namecheck the ITN framework here?
43. daaronr 11 Jul 2023
  
  in Public
  
  broad analyses of different causes.
  
  how do you divide up the 'cause space' and define each 'cause'? Give some examples here? E.g., is "animal welfare' a cause area ... or 'farmed animal welfare' or 'chicken welfare' or 'promoting regulation of chicken farms' (the latter is more of an intervention IMHO)
44. daaronr 11 Jul 2023
  
  in Public
  
  An ultimate goa
  
  one of several ultimate goals? To be pedantic, the 'improve global wellbeing' would seem to be the ultimate goal .... the 'identify the opportunities' is instrumental to that
45. daaronr 11 Jul 2023
  
  in Public
  
  We will explore whether we should improve the wellbeing of people alive now or in future generations:
  
  Why not both? Maybe rephrase this?
  
  Also, will you consider the empirical tradeoffs here, or deeper philosophical issues, or?
46. daaronr 11 Jul 2023
  
  in Public
  
  An academic journal book review of Will MacAskill’s What We Owe The Future
  
  what academic journal are you thinking?
47. daaronr 11 Jul 2023
  
  in Public
  
  An experimental survey to test assumptions about subjective wellbeing measures, including comparability, linearity, and the neutral point
  
  I talked to someone recently who had done some survey work in this area -- maybe remind me to get back to you on it.
48. daaronr 11 Jul 2023
  
  in Public
  
  An academic paper on life satisfaction theories of wellbeing
  
  This seems underexplained ... what and why? (add a bit or link)
49. daaronr 11 Jul 2023
  
  in Public
  
  the cost-effectiveness of several organisations (partially informed by our cause exploration work)
  
  mention how you will do it better/different or add value to what other orgs are doing?
50. daaronr 11 Jul 2023
  
  in Public
  
  Applied research to maximise global wellbeing
  
  Doing this yourselves? Synthesizing work? Sponsoring work?
51. daaronr 11 Jul 2023
  
  in Public
  
  Non-mental health organisations (organisations TBD)
  
  Non mental-health orgs within the global health or global health and development space, or much more widely ranging, across vastly different causes etc?
52. daaronr 11 Jul 2023
  
  in Public
  
  To find new promising solutions to the biggest problems,
  
  this phrase seems a bit vague? Also, are these 'cause areas' or possible interventions, or a mix?
Visit annotations in context

Tags

#unjournalresearchprioritization

Annotators

daaronr

URL

forum.effectivealtruism.org/posts/LAxHnFpumCANAk32r/hli-s-2023-4-research-agenda

David Reinstein

Dr. David Reinstein, Senior Economist, Rethink Priorities https://daaronr.github.io/markdown-cv/

Project pages: innovationsinfundraising.org, giveifyouwin.org

Twitter: @givingtools

Annotations: 954

Joined: July 17, 2019

Location: Western Massachussets

Link: daaronr.github.io/markdown-cv/

ORCID: 0000-0002-0470-4991

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

General Guidelines for Interpreting Krippendorff's Alpha:

Comparison to Your Data:

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Tags

Annotators

URL

Annotators

URL

Tags

Annotators

URL

Annotators

URL

Annotators

URL

unjournalresearchprioritization

Tags

Annotators

URL