arxiv.org
-
The authors present a seven-part taxonomy aimed at identifying how AI can be integrated into workflows in data repositories. The taxonomy focuses on functions such as ‘acquire’, ‘validate’, ‘enhance’, and ‘organize’. The authors suggest that the piece is meant for the community of librarians and Open Science practitioners working with data repositories. The piece was reviewed by two metaresearchers. Both emphasized the potential value of the proposed taxonomy, but also noted a number of areas in which the paper needs substantial improvement. The reviewers reported struggling with what they saw as a lack of clarity and practical applicability of the taxonomy. They emphasized the need for more detailed explanations of the various functions, for instance through case studies or examples of how AI is currently used in repositories. A second main point mentioned by both reviewers was the lack of a section explaining how exactly the taxonomy was developed. This makes it hard to see how and why the authors arrived at the seven functions and what knowledge or insights they are based on – previous literature or the practical work experience of the authors? This also ties in with a third point raised by the reviewers, namely the lack of references. Finally, reviewer 2 in particular felt that the concluding section on balancing human and AI expertise is interesting, but also noted that it currently feels somewhat disconnected from the main part of the text.
The reviewers and I agree that the paper can potentially become a valuable contribution to the metaresearch literature related to data repositories, provided that the authors address the various criticisms raised by the reviewers.
Conflict of interest of the handling editor: none
-
General review
The article describes a taxonomy which can be used to help facilitate the use of AI in data repositories. The taxonomy is similar to other descriptions of the data or research life cycle and is not particularly AI-specific, although brief descriptions of how AI could be used in each stage are included. The article is not currently written as a journal article and does not engage with the current literature or research within the field, which would help to explain the importance or value of this specific taxonomy.
A literature review should be included and references should also be included throughout using an inline referencing style rather than hyperlinks. A references section should be included at the end of the article.
It is not clear how the 7 areas for AI in data repositories were decided. Is this opinion, based on current literature, based on user experience…? Without knowing this it is hard for the reader to understand how justified the categories are. Towards the end of the paper 5 categories of AI involvement are discussed, but these don’t appear to be a part of the framework, although they seem to be the most AI-specific part. I’d suggest that the taxonomy could benefit from including more AI-specific terms.
The article could be improved by more detail about how this framework could be used in practice (or is currently being used). It is unclear whether this is just a suggestion of shared language or something more formal like CRediT or MeSH.
The Balancing AI and Human Expertise in Data Repositories section gives a nice overview of some of the concerns; however, this section needs to be referenced, as there have been many discussions in the literature about the importance of ‘humans in the loop’.
The ‘three suggestions to promote trust and transparency’ at the end are interesting but come rather out of nowhere. These should be tied into the body of the paper. How are they related to the taxonomy?
Specific suggestions
Engagement with the literature would help to show why this taxonomy is important or relevant. Some areas which the literature review could cover are:
- How is AI being used with repositories at the moment? (or not)
- What is the importance of standard taxonomies?
- Are there other examples of AI taxonomies?
- Are there other ways of describing AI roles which are currently being used and why are they not sufficient?
Each of the taxonomy descriptions includes examples of how AI could benefit this role, but they would be greatly enhanced by giving more detail of projects where this is already happening or explaining where (and why) it isn’t happening. A specific example of this is within the Organize section, where a useful project to discuss might be the Library of Congress metadata labelling project.
A more comprehensive introduction to the project and what you are trying to do would be helpful. Readers may not be specialists in either AI or digital repositories.
The introduction should also explain the current situation and the problem that this taxonomy is trying to solve. Possible future benefits are described, but the article does not discuss what else is happening in this space.
“Just as AI can revolutionize other forms of scholarly communications like peer-reviewed publications” – the reference justifying this is an editorial, a research paper would be more authoritative. There have been a lot of articles arguing the positives and negatives of AI within the field of scholarly communication and this should be discussed more thoroughly.
“it can bring significant improvements to data repositories” What are these improvements? Has anyone done this yet or is it only theoretical?
“As AI becomes more integrated into data repository workflows” Is it becoming more integrated into these workflows? Or do you mean “if”?
-
-
This commentary (or perhaps it is more similar to a blogpost) proposes a taxonomy of tasks for which AI could be used within data repository workflows. Although it is not the stated aim, it also reflects on the relationship between AIs and humans in these workflows and how to develop trust and transparency in a repository while using AIs.
The idea of having a taxonomy which can be used to spur discussions or to classify the roles of AI in repositories is useful, and the taxonomy roles themselves are sensible, although I am not sure how feasible they actually are at the moment. This is perhaps related to a lack of clarity about whether these capabilities already exist for AIs to be used in these ways, or whether this is an imagined future. It is also not clear which AIs the authors are referring to; is this targeted at LLMs specifically? At other AIs? I imagine this is important in identifying the potential tasks which AI could perform.
Although the authors provide examples of possible tasks for each part of the taxonomy, I still find it a bit abstract and vague. Maybe it would help to include a case study or example of a repository to carry through the steps and demonstrate how this looks/could look in practice? Some of the statements introducing each section of the taxonomy could also be a bit clearer, especially in terms of the target audience. Overall it is unclear to me who the exact target audience is – is this aimed at all repositories or just generalist repositories, given that it was developed within GREI? The latter part of the article refers to generalist repositories, but the earlier part (the taxonomy) does not. This raises the question of whether these steps would apply to all types and sizes of data repositories, or just to larger “generalist” ones. Some of the names for the taxonomy tasks may also be confusing, e.g. “share”, which usually tends to refer to the act of researchers sharing/depositing data in repositories rather than repositories making data available and facilitating reuse.
One of the biggest points for improvement is that it is not very clear how the taxonomy was developed. The authors mention that it is based on other taxonomies (but they do not provide links or references) and their “coopetition efforts” within the GREI consortium. What did these activities entail exactly? There is a link in the acknowledgements section to authors of a very similar taxonomy (actually a very similar piece overall) that was developed for publishing workflows. Why is this not referenced in the article earlier, or the relationship between the two pieces described? Overall, the referencing should be improved and standardized (although again how this is done depends a bit on how the authors envision the future of this work) – there is currently no reference list at all.
It also seems as if the taxonomy part of the article is disconnected from the latter sections (aside from the conclusion). I actually find the section on balancing human and AI expertise to be quite valuable as its own contribution, as it makes it clear that using AIs is not an all-or-nothing proposition in repositories’ workflows. I think it would help if the authors could let readers know (at least) that this section, and the one on trust, are a part of the article in the introduction. Or perhaps there is room for restructuring the argument here, and somehow foregrounding the human/AI section.
Other reflections
It is a bit difficult to review this piece, not knowing whether it is intended to be a commentary, blogpost, or other type of article.
-
Authors:
- Mark Hahnel m.hahnel@digital-science.com
- Stefano Iacus siacus@g.harvard.edu
- Ryan Scherle ryan@datadryad.org
- Eric Olson eric@cos.io
- Nici Pfeiffer nici@cos.io
- Kristi Holmes kristi.holmes@northwestern.edu
- Mohammad Hosseini mohammad.hosseini@northwestern.edu
-
-
-
10.48550/arXiv.2411.08054
-
GREI Data Repository AI Taxonomy
-
- Mar 2025
-
osf.io
-
This protocol aims to address two questions: (1) What do we know about the science underlying impactful legal decisions? (2) How can we assess this evidence efficiently and accurately, such that it is usable for courts? The protocol has been reviewed by three reviewers (reviewer 2 in fact represents a team of three individuals). The reviewers mention various strengths of the protocol. Reviewer 1 emphasises the importance and timeliness of the research questions and praises the interdisciplinary nature of the research team. Reviewer 3 considers the protocol to be thoughtful and detailed, and reviewer 2 notes that the protocol presages an important effort. The reviewers do not see any major shortcomings in the protocol, but they do highlight opportunities to strengthen the protocol, such as considering studies published in languages other than English and adding more detail on how team disagreements will be resolved.
Competing interest: Jennifer Byrne is a member of the editorial team of MetaROR working with Jason Chin, a co-author of the protocol and also a member of the editorial team of MetaROR.
-
Summary
This protocol describes the plan for a systematic review of the literature on stab wounds. The focus is on the types of observations made in such cases, and whether there are any (types of) observations that can be considered “indicators” of the manner of death, to help distinguish between cases of self-inflicted injury and those inflicted by others.
Strong points of this research plan
The authors present compelling arguments for the need for the proposed meta-research project; they refer to a recent case in the High Court of Australia (Lang v the Queen). The arguments highlight the importance and timeliness of the research questions. More generally, the field of forensic pathology and its perception and use by the legal community seems to be an area with great research potential: see for example the problematic cases involving the testimony of Colin Manock in South Australia (e.g. the Keogh case, where the examination of bruises was an issue).
Overall, the research plan is well informed: the authors have conducted a preliminary review of existing relevant studies and reviews. They use the findings from this preliminary review to critically inform the design of their study.
The research team is interdisciplinary, with members from law, psychology and pathology, and appears to be suitably qualified to carry out the proposed research.
The research plan is sufficiently detailed and transparent in terms of search procedures, eligibility criteria, outcome variables, data management and open access policy, which should make the research results widely accessible and reproducible.
Comments, suggestions, critiques
The title includes the term “reliability”, but it is never defined in the text. While this term can be taken in its common-sense interpretation, this may not be sufficient for a scientific study. Do the authors mean “reliability” as used, for example, by the US Federal Rules of Evidence (FRE)? Or do they understand the term to be similar to PCAST’s use of the term “validity”?
The plan is not clear (enough) about how – conceptually – to characterise the potential of an observation (made by a pathologist) to provide information about a selected question of interest (e.g., manner of death, the way in which an injury was inflicted, etc.). Formally, the diagnosticity of an observation (or type of observation) is defined in terms of a likelihood ratio. In other words, for an observation to have diagnostic value with respect to a given proposition (hypothesis), the probability of the observation of interest given the proposition of interest must be higher than given an alternative proposition. Thus, whatever this study will reveal about medico-legal observations (in stab wound cases), an inferential framework is needed to assess diagnosticity and, more broadly, reliability. The research plan is silent on this aspect. Instead, most of the effort is spent on descriptive statistics. There is nothing wrong with descriptive statistics, but it will not help to address the main question posed in the title of the proposed research. As an aside, the reference to “confidence intervals” (p. 15 and 19) is unfortunate in the sense that frequentist statistics, although (still) ubiquitous, are problematic for a variety of reasons.
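For concreteness, the standard textbook formulation (my own illustration, not taken from the protocol itself) expresses the diagnostic value of an observation $E$ with respect to a proposition $H_1$ (e.g., the injury was self-inflicted) against an alternative $H_2$ (e.g., the injury was inflicted by another person) as a likelihood ratio:

$$\mathrm{LR} = \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}$$

An LR greater than 1 means the observation supports $H_1$ over $H_2$; a value close to 1 means the observation has little diagnostic value for distinguishing the two propositions.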
To some extent, the research proposal is too uncritical and passive with respect to terminology that appears to be standard in the field in which the literature review is to be conducted. Consider, for example, the terms “defense injuries” and “tentative injuries” (p. 7). These terms are problematic because they mix observations (e.g., cuts) with ground truth (i.e., self-inflicted or third-party inflicted). Since the ground truth cannot be known in actual cases, “defense injury” cannot meaningfully serve as a descriptor. Moreover, the use of such terms is problematic: suppose an examiner talks about “tentative injuries”. This could suggest to the recipient of expert information that the observed injury is necessarily self-inflicted. Of course, the authors’ intention might be to determine how diagnostic the expert’s utterance of “tentative injury” is with respect to the proposition of self-inflicted injury (without assuming that the utterance of “tentative injury” necessarily implies self-inflicted injury). Nevertheless, this doesn’t solve the problem of confusing terminology. Therefore, this research project could be strengthened by not limiting itself to the descriptive adoption of standard terminology, but by including a critical analysis and discussion of terminology. In fact, the problem of testimony in this field is not limited to the (currently unknown) diagnosticity of observations made during pathological examinations. It also depends on the coherence of foundational terminology (i.e., its logic) used in this field, as well as on the soundness of the reasoning methods used (e.g., the crucial distinction between findings/observations and unobservable ground truth states).
On p. 15, the research plan states: “We will attempt to quantitively synthesise cases by first separating them into four groups: those classified by study authors as suicides, homicides, accidents or inconclusives. Then, we will list the frequency with which the case variables listed above appear in each group.” Treating the data in this way will lead to useful statistics: i.e., the probability of different observations given different case types (suicides, homicides, etc.). Such statistics characterise the diagnosticity of the various observations (“case variables”). However, a major problem arises here: how – if at all – can one know that the reported classification of cases into suicides, homicides etc. was correct? For obvious reasons, none of the case reports in the literature involve experiments under controlled conditions. However, there may be other information or evidence in a case (e.g., video surveillance) that supports particular classifications. Will the project control for this complication, and if so, how?
It would be valuable for this research to include normative considerations, as opposed to a purely descriptive perspective, of what it means for an observation – be it in pathology or any other forensic field – to be “indicative” or discriminative with respect to selected (disputed) propositions. This relates to the notion of inferential framework mentioned above, which is largely established in the philosophy of science (see e.g. Howson/Urbach, Scientific Reasoning, 2005), and which could serve as an additional reference point against which to evaluate the current literature. It remains unclear to the reader why this research project refrains from taking a firmer position on the logic of evaluative thinking, which has now become inseparable from sound evaluation procedures in forensic science. Reviewing and synthesising existing literature is one thing, challenging the current state of the art is another. Combining the two is a valuable opportunity that this project could seize.
Conflict of Interest Statement
I declare that this review has been written in the absence of any competing interest, including any role, relationship (including commercial or financial) or commitment that poses an actual or perceived threat to the integrity or independence of my review and that could be construed as a potential conflict of interest.
-
Thank you for the opportunity to review this protocol. My expertise is in systematic review methods, generally relating to health interventions, and as such I should note that I do not have expertise in forensic pathology or medico-legal issues.
This paper outlines the protocol for a systematic review of characteristics which allow forensic experts to distinguish between suicide and homicide relating to sharp force wounds, in the context of contributing to criminal prosecution. Interestingly, the protocol outlines the development of preliminary approaches to novel methodology adapted for use in this field, including novel approaches to assessing risk of bias and certainty in the evidence, which have primarily been developed to assess intervention research.
I commend the authors for a thoughtful and detailed protocol. In my view, this is a strong piece of work and will contribute findings of interest to the field, as well as contributing to the exploration of methods for the assessment of a category of research for which such methods are currently lacking. I have made a few suggestions below for consideration by the authors that may strengthen the protocol.
Rationale
- It may be helpful to international readers to clarify in the text of the Rationale that R v Lang is a case in the High Court of Australia, and to spell HCA out in full in the footnote. With regard to readers looking for details on this case, are these published on a website for which a URL can be provided?
- It would be helpful for readers without a background in legal proceedings to discuss the extent to which research evidence and systematic reviews are or are not commonly presented in legal proceedings, in contrast to expert opinion.
- Where you discuss the debate about the role of cause of death findings, it would be helpful to explicitly state in which jurisdictions these discussions have occurred, so that readers can understand whether and how this topic relates to their own jurisdiction or where there may be differences. It may further be helpful to elaborate briefly on why cause of death determinations may be considered unreliable.
Methods
- It is a limitation of the review to only include studies published in English. The proficiency of automated translation is currently such that screening of potentially relevant studies in multiple languages is often possible, and assistance from multilingual colleagues or communities such as Cochrane Engage can enable the inclusion of studies in additional languages.
- Regarding grey literature, both of the listed organisations appear to be based in the USA (although this is not stated for the OSCAC) – could you provide a rationale for only using US institutions to identify relevant data? For example, there may be organisations in Australia (which is the jurisdiction of interest for the legal aspects of this review) or in countries with comparable criminal legal systems (such as the UK, Europe or elsewhere).
- Will a software tool be used to support study selection, such as Covidence or similar? This may contribute to your analysis of time and process.
- Injury severity score – will injury severity be captured if other measures of severity are used, or not at all? There are methods available to consider results across different measures of similar outcomes, if these would be considered valid alternatives.
- In the rationale and the methods relating to risk of bias, you note that it may be relevant to capture (if available) information such as whether witness evidence, video evidence or a confession was available to support the conclusion of cause of death. Should this kind of characteristic be added to the data collected?
- The methods provided for data synthesis, risk of bias assessment and the certainty/quality of the evidence (based on GRADE) all currently read as if all your included studies will be case series or case studies. As your included studies also include observational studies that may give effect estimates such as odds ratios rather than individual counts of characteristics, methods should be provided for handling and perhaps quantitatively synthesising this kind of data, where appropriate. Risk of bias methods and GRADE methods may more closely correspond to the existing methods for this kind of study, and require less adaptation.
- GRADE methodology generally refers to “certainty in the evidence” rather than confidence, to avoid confusion with risk of bias assessment.
- You note in the rationale that you plan to collect data on the review process, such as time taken to complete different tasks. I’d suggest putting this detail in the methods section.
- I would recommend giving some further thought to how you will draw conclusions from the data you find in this review. Assuming that sufficient data can be found, and that you have a set of either percentages from case studies/series or effect estimates from observational studies, it is likely that you will wish to discuss which factors appear to be associated with different causes of death, or which are most effective at discriminating between causes. I would strongly recommend considering what thresholds for associations or differences between causes of death would underpin such conclusions, and specifying these in advance. I’d recommend speaking to a statistician to draft these methods appropriately and avoid errors in interpreting the estimates found.
-
On behalf of the Center for Integrity in Forensic Sciences and its Executive Director, Katherine H. Judson, as well as its co-founder, Professor Emeritus Keith A. Findley, I am pleased to submit these comments on the above-cited draft work of Jason Chin, Stephanie Clayton, and their colleagues. Thank you for soliciting our views. You may learn more about the Center for Integrity in Forensic Sciences at www.cifsjustice.org
The authors’ explanation of their planned systematic review is helpful and presages an important effort. We commend the authors for their thoughtful study design, their transparency, and their initial research into source materials listed in Appendix A.
Two minor methodological concerns appear to us initially. One, we do not fully understand the intention, described in four places (pages 11, 18, and 19), to use two independent reviewers of data and to resolve disagreements “by discussion.” It is not clear whether that discussion is to occur between the pair of reviewers only, or whether others will join the adjudicative discussion. In either event, it may be useful to consider an odd number of adjudicators for purposes of breaking a deadlock, if necessary. Two, the intended systematic review excludes studies not published in English (see page 10). While the lack of proficiency in other languages among the research team is understandable (and rightly acknowledged), the availability of reliable translations today should allow inclusion of studies published in other languages, we suspect.
Our two principal substantive concerns are broader, though. First, this systematic review appears to overlook risks of availability bias and confirmation bias in information gathering by pathologists, who often rely on information passed along by law enforcement officers and others invested in a particular outcome or conclusion. Relatedly, forensic pathologists themselves often are closely aligned professionally and attitudinally with law enforcement personnel. Indeed, the pathologists may be employed by prosecutive and investigative agencies of the government, and therefore professionally and financially dependent on their sources of information. We predict that the research team will encounter frequently—perhaps almost uniformly—the absence of pre-existing protocols that Cochrane raises as a concern and that the authors rightly note at page 18 of this draft. That common absence of a known protocol, established in advance and subject to compliance assessment later, may be both caused in part by and an effect of the availability and confirmation (or tunnel vision) biases we discuss here.
Second, the systematic review does not seem designed to consider the normative question of which systemic actor or actors are best equipped and most appropriate to make manner of death determinations for judicial, as opposed to statistical, purposes. We hope that the researchers will recommend that such determinations by pathologists or other biomedical experts should be limited to statistical purposes, for use in allocating public resources. In the end, regardless of how reliable their opinions, pathologists and biomedical practitioners are no better positioned than jurors or judges to make adjudicative determinations of suicide or homicide, as the factfinders in a judicial system should have access to all information—presented to them in a more transparent, testable form in court—that the pathologist has in drawing conclusions. And as a normative matter, those adjudicative conclusions are assigned to jurors and judges, not to pathologists or other biomedical experts.
With these caveats, we again welcome this initial work and description of the meta-analysis to come. Especially if confined to assessing and advancing the reliability of manner of death determinations in cases of sharp force wounds for statistical purposes, and thus as an aid in allocating public resources outside the judicial system, the eventual systematic review may be quite valuable.
Finally, for a pertinent and longer discussion of related issues, see Keith A. Findley & Dean A. Strang, Ending Manner of Death Testimony and Other Opinion Determinations of Crime, 60 Duquesne Law Review 302 (2022). The authors themselves cite this article at footnotes 5 and 7 of their draft. Again, thank you for the opportunity to offer these comments.
-
Authors:
- Jason Chin jason.chin@anu.edu.au
- Stephanie Clayton stephanie.Clayton1@health.nsw.gov.au
- Stephen Cordner stephen.cordner@vifm.org
- Gary Edmond g.edmond@unsw.edu.au
- Bethany Growns Bethany.growns@canterbury.ac.nz
- Kylie Hunter kylie.hunter@sydney.edu.au
- Bernard I'Ons bernard.ions@health.nsw.gov.au
- Kristy Martire k.martire@unsw.edu.au
- Gianni Ribeiro gianni.riberio@unisq.edu.au
- Stephanie Summersby stephanie.summersby@police.vic.gov.au
-
-
-
10.31222/osf.io/atu56
-
Systematic review: The reliability of indicators that may differentiate between suicidal, homicidal, and accidental sharp force wounds
-
-
-
upstream.force11.org
-
In this blog post the author argues that problematic incentive structures have led to a rapid increase in the publication of low-quality research articles and that stakeholders need to work together to reform incentive structures. The blog post has been reviewed by three reviewers. Reviewer 3 considers the blog post to be a ‘great piece’, and Reviewer 1 finds it ‘compellingly written and thought provoking’. According to Reviewer 2, the blog post does not offer significant new insights for readers already familiar with the topic. All three reviewers provide recommendations for clarifications. Reviewers 2 and 3 also suggest the blog post could be more critical toward publishers. Reviewers 1 and 3 suggest taking a broader perspective on incentives, for instance by also considering incentives related to teaching and admin or incentives for funders, libraries, and other organizations.
-
This op-ed addresses the issue of the exponential increase in publications and how this is leading to a lower quality of peer review which, in turn, is resulting in more bad science being published. It is a well-written article that tackles a seemingly eternal topic. The piece focuses more on the positives and potential actions, which is nice to see, as this is a topic that can become stuck in the problems. There are places throughout that would benefit from more clarity, and at times there appears to be a bias towards publishers, almost placing blame on researchers. Very simple word changes or headings could immediately resolve any doubt here, as I don't believe this is the intention of the article at all.
Additionally, this article is very focussed on peer review (a positive) but I think that it would benefit from small additions throughout that zoom out from this and place the discussion in the context of the wider issues - for example, you cannot change peer review incentives without changing the entire incentive structure around "service" activities including teaching, admin etc. This occurs to a degree with the discussion on other outputs, including preprints and data. Moreover, when discussing service-type activities, there is data revealing that certain demographics deliberately avoid this work. Adding this element into the article would provide a much stronger argument for change (and do some good in the current political climate).
Overall, I thought this was a great piece when it was first posted online and does exactly what a good op-ed should - provoke thought and discussion. Below are some specific comments, in reading order. I do not believe that there are any substantial or essential changes required, particularly given that this is an op-ed article.
-----
Quote: "Academia is undergoing a rapid transformation characterized by exponential growth of scholarly outputs."
Comment: There's an excellent paper providing evidence for this: https://direct.mit.edu/qss/article/5/4/823/124269/The-strain-on-scientific-publishing which would be a very positive addition.
Quote: "it’s challenging to keep up with the volume at which research publications are produced"
Comment: Might be nice to add that this complaint dates back almost to the beginning of sharing research via print media, just to reinforce that this is a very old point.
Quote: "submissions of poor-quality manuscripts"
Comment: The use of "poor quality" here is unnecessary. That a submission is not accepted is no reflection on its "quality". As such this does seem to needlessly diminish work rejected by one journal.
Quote: "Maybe there are too many poor quality journals too - responding to an underlying demand to publish low quality papers."
Comment: This misses the flip side - poor quality journals encourage and actively drive low quality & outright fraudulent submissions due to the publisher dominance in the assessment of research and academics.
Quote: "even after accounting for quality,"
Comment: Quality is mentioned here but has yet to be clearly defined. What is "quality" – how many articles a journal publishes? The "prestige" of a journal? How many people are citing the articles?
Quote: "Researchers can – and do – respond to the availability by slicing up their work (and their data) into minimally publishable units"
Comment: I fully agree that some researchers do exactly this. However, again, this seems to be blaming researchers for creating this firehose problem. I think this point could be reworded to not place so much blame or be substantiated with evidence that this is a widespread practice - my experience has been very mixed in that I've worked for people who do this almost to the extreme (and have very high self-citations) and also worked for people who focus on the science and making it as high quality and robust as possible. I agree many respond to the explosion of journals and varied quality in a negative manner but the journals, not researchers are the drivers here.
Quote: "least important aspect of the expected contributions of scholars."
Comment: I think it may be worth highlighting here that sometimes specific demographics (white males) actively avoid these kinds of service activities - there's a good study on this providing data in support of this. It adds an extra dimension into the argument for appropriate incentives and the importance & challenges of addressing this.
Quote: "high quality peer review"
Comment: Just another comment on the use of "quality'. This is not defined and I think when discussing these topics it is vital to be clear what one means by "high quality". For example, a high quality peer review that is designed as quality control would be detecting gross defects and fraud, preventing such work from being published (peer review does not reliably achieve this). In contrast, a high quality peer review designed to help authors improve their work and avoid hyperbole would be very detailed and collegial, not requesting large numbers of additional experiments.
Quote: "conferring public trust in the oversight of science"
Comment: I'm not convinced of this. Conveying peer review as a stamp of approval or QC leads to reduced trust when regular examples of peer review failures emerge - just look at hydroxychloroquine and how peer review was used to justify it during COVID, or the MMR/autism issues that are still ongoing even after the work was retracted. I think this should be much more carefully worded, removed or expanded on to provide this perspective - this occurs slightly in the following sentence but it is very important to be clear on this point.
Quote: "Researchers hold an incredible amount of market power in scholarly publishing"
Comment: I like the next few paragraphs but, again, this seems to be blaming researchers when they in fact hold no/little power. I agree that researchers *could* use market pressure but this is entirely unrealistic when their careers depend on publishing X papers in X journal. An argument as to why science feels increasingly non-collaborative perhaps. Funders can make immediate and significant changes. Institutions adopting reward structures that value teaching, for example, would have significant impacts on researcher behaviour. Researchers are adapting to the demands the publication system creates - more journals, greater quantity and reduced quality whilst maintaining control over the assessment - eLife being removed from WoS/Scopus is a prime example of publishers (via their parent companies) preventing innovation or even rather basic improvements.
Quote: "With preprint review, authors participate in a system that views peer review not as a gatekeeping hurdle to overcome to reach publication but as a participatory exercise to improve scholarship."
Comment: This is framing that I really like; improving scholarship, not quality control.
Quote: "buy"
Comment: typo
Quote: "adoption of preprint review can shift the inaccurate belief that all preprints lack review"
Comment: Is this the right direction for preprints though? If we force all preprints to be reviewed and only value reviewed-preprints, then we effectively dismantle the benefits of preprints and their potential that we've been working so hard to build. A recent op-ed by Alice Fleerackers et al provided an excellent argument to this effect. More a question than a suggestion for anything to change.
Quote: "between all of those stakeholders to work together without polarization"
Comment: I disagree here - publishers have repeatedly shown that their only real interest is money. Working with them risks undermining all of the effort (financial, careers, reputation, time) that advocates for change put in. The OA movement should also highlight perfectly why this is such a bad route to go down (again). Publishers' grip on preprint servers is a great example - those servers are hard to use as a reader, lack APIs and access to data, and are not innovative or interacting with independent services. The community should make the rules and then publishers abide by and within them. Currently the publishers make all of the rules and dominate. Indeed, this is possibly the biggest omission from this article - the total dominance of publishers across the entire ecosystem. You can't talk about change without highlighting that the publishers don't just own journals but the reference managers, the assessment systems, the databases etc. I may be an outlier on this point but for all of the people I interact with (often those at the bottom of the ladder) this is a strong feeling. Again, not a suggestion for anything to change and indeed the point of an op-ed is to stimulate thought and discussion so dissent is positive.
Note that these annotations were made in hypothes.is and are available here, linked in-text for ease - comments are duplicated in this review.
-
Summary of the essay
In this essay, the author seeks to explain the ‘firehose’ problem in academic research, namely the rapid growth in the number of articles but also the seemingly concurrent decline in quality. The explanation, he concludes, lies in the ‘superstructure’ of misaligned incentives and feedback loops that primarily drive publisher and researcher behaviour, with the current publish or perish evaluation system at the core. On the publisher side, these include commercial incentives driving both higher acceptance rates in existing journals and the launch of new journals with higher acceptance rates. At the same time, publishers seek to retain reputational currency by maintaining consistency and therefore brand power of scarcer, legacy-prestige journals. The emergence of journal cascades (automatic referrals from one journal to another journal within the same publisher) and the introduction of APCs (especially for special issues) also contribute to commercial incentives driving article growth. On the researcher side, he argues that there is an apparent demand from researchers for more publishing outlets and simultaneous salami slicing by researchers because authors feel they have to distribute relatively more publications among journals that are perceived to be of lower quality (higher acceptance rates) in order to gain equivalent prestige to that of a higher impact paper. The state of peer review also impacts the firehose. The drain of PhD qualified scientists out of academia, compounded by a lack of recognition for peer review, further contributes to the firehose problem because there are insufficient reviewers in the system, especially for legitimate journals. Moreover, what peer review is done is no guarantee of quality (in highly selective journals as well as ‘predatory’). One of his conclusions is that there is not just a crisis in scholarly publishing but in peer review specifically and it is this crisis that will undermine science the most. Add AI into the mix of this publish or perish culture, and he predicts the firehose will burst.
He suggests that the solution lies in researchers taking back power themselves by writing more but ‘publishing’ less. By writing more he means outputs beyond traditional journal publications such as policy briefs, blogs, preprints, data, code and so on, and that these should count as much as peer-reviewed publications. He places special emphasis on the potential role of preprints and on open and more collegiate preprint review acting as a filter upstream of the publishing firehose. He ends with a call for more collegiality across all stakeholders to align the incentives and thus alleviate the pressure causing the firehose in the first place.
General Comment
I enjoyed reading the essay and think the author does a good job of exposing multiple incentives and competing interests in the system. Although discussion of perverse incentives has been raised in many articles and blog posts, the author specifically focuses on some of the key commercial drivers impacting publishing and the responses of researchers to those drivers. I found the essay compellingly written and thought provoking although it took me a while to work through the various layers of incentives. In general, I agree with the incentives and drivers he has identified and especially his call for stakeholders to avoid polarization and work together to repair the system. Although I appreciate the need to have a focused argument I did miss a more in-depth discussion about the equally complex layers of incentives for institutions, funders and other organisations (such as Clarivate) that also feed the firehose.
I note that my perspective comes from a position of being deeply embedded in publishing for most of my career. This will have also impacted what I took away from the essay and the focus of my comments below.
Main comments
- I especially liked the idea of a ‘superstructure’ of incentives as I think that gives a sense of the size and complexity of the problem. At the same time, by focusing on publisher incentives and researchers’ response to them he has missed out important parts of the superstructure contributing to the firehose, namely the role of institutions and funders in the system. Although this is implicit, I think it would have been worth noting more, in particular:
  - He mentions institutions and the role of tenure and promotion towards the end but not the extent of the immense and immobilizing power this wields across the system (despite initiatives such as DORA and CoARA).
  - Most review panels (researchers) assessing grants for funders are also still using journal publications as a proxy for quality, even if the funder policy states that journal name and rank should not be used.
  - Many institutions/universities still rely on the number and venue of publications. Although some notable institutions are moving away from this, the impact factor/journal rank is still largely relied on. This seems especially the case in China and India, for example, which have shown huge growth in research output. Although the author discusses the firehose, it would have been interesting to see a regional breakdown of this.
  - Libraries also often negotiate with publishers based on the volume of articles – i.e. they want evidence that they are getting more articles as they renegotiate a specific contract (e.g. transformative agreements), rather than, for example, also considering the quality of service.
  - Institutions are also driven by rankings in a parallel way to researchers being assessed based on journal rank (or impact factor). How university rankings are calculated is also often opaque (apart from the Leiden rankings) but publications form a core part. This further incentivises institutions to select researchers/faculty based on the number and venue of their publications in order to promote their own position in the rankings (and obtain funding).
- The essay is also about power dynamics and where power in the system lies. The implication in the essay is that power lies with the publishers and that this can be taken back by researchers. Publishers do have power, especially those in possession of high-prestige journals, and yet publishers are also subject to the power of other parts of the system, such as funder and institutional evaluation policies. Crucially, other infrastructure organisations, such as Clarivate, that provide indexing services and citation metrics also exert a strong controlling force on the system, for example:
  - Only a subset of journals are ever indexed by Clarivate, and funders and institutions also use the indexing status of a journal as a proxy of quality. A huge number of journals are thus excluded from the evaluation system (primarily in the arts and humanities but also many scholar-led journals from low- and middle-income countries and also new journals). This further exacerbates the firehose problem because researchers often target only indexed journals. I’d be interested to see if the firehose problem also exists in journals that are not traditionally indexed (although I appreciate this is also likely to be skewed by discipline).
  - Indexers also take on the role of arbiters of journal quality and can choose to delist or list journals accordingly. Listing or delisting has a huge impact on the submission rates to journals, which can be worth millions of dollars to a publisher, but it is often unclear how quality is assessed and there seems to be a large variance in who gets listed or not.
  - Clarivate is also paid large fees by publishers to use its products, which creates a potential conflict of interest for the indexer, as delisting journals from major publishers could cause a substantial loss of revenue if they withdraw their fees. Clarivate also relies on publishers to create the journals on which its products are based, which may create a conflict if Clarivate wishes to retain the in-principle support of those publishers.
  - The recent delisting of eLife, even though it is an innovator and of established quality, shows the precariousness of journal indexing.
- All the stakeholders in the system seem to be essentially ‘following the money’ in one way or another – it’s just that the currency for researchers, institutions, publishers and others varies. Publishers – both commercial and indeed most not-for-profit – follow the requirements of the majority of their ‘customers’ (and that’s what authors, institutions, subscribers etc. are in this system) in order to ensure both sustainability and revenue growth. This may be a legacy of the commercialisation of research in the 20th century, but we should not be surprised that growth is a key objective for any company. It is likely that commercial players will continue to play an important role in science and science communication; what needs to be changed are the requirements of the customers.
- The root of the problem, as the author notes, is what is valued in the system, which is still largely journal publications. The author’s solution is for researchers to write more – and for value to be placed on this greater range of outputs by all stakeholders. I agree with this sentiment – I am an ardent advocate for Open Science. And yet, I also think the focus on outputs per se and not practice or services is always going to lead to the system being gamed in some way in order to increase the net worth of a specific actor in the system. Preprints and preprint review itself could be subject to such gaming if value is placed on e.g. the preprint server or the preprint-review platform as a proxy of preprint and then researcher quality.
- I think the only way to start to change the system is to start placing much more value on both the practices of researchers (as well as outputs) and on the services provided by publishers. Of course saying this is much easier than implementing it.
Other comments
- A key argument is that higher acceptance rates actually create a perverse incentive for researchers to submit as many manuscripts as possible because they are more likely to get accepted in journals with higher acceptance rates. I disagree that higher acceptance rates per se are the main incentive for researchers to publish more. More powerful is the fact that those responsible for grants and promotion continue to use the quantity of journal articles as a proxy for research quality.
- Higher acceptance rates are not necessarily an indicator of low quality or a bad thing if they mean that null, negative and inconclusive results are also published.
- The author states that Journal Impact Factors might have been an effective measure of quality in the past. I take issue with this because the JIF has, as far as I know, always been driven by relatively few outliers (papers with very high citations) and I don’t know of evidence to show that this wasn’t also true in the past. It also makes the assumption that citations = quality.
- The author asks at one point “Why would field specialization need a lower threshold for publication if the merits of peer review are constant?” I can see a case for lower thresholds, however, when the purpose of peer review is primarily to select for high impact, rather than rigour, of the science conducted. A similar case might be made for multidisciplinary research, where peer reviewers tend to assess an article from their discipline’s perspective and reject it because the part that is relevant to them is not interesting enough… Of course, this all points to the inherent problems with peer review (on which I agree with the author).
- The author puts his essay in appropriate context, drawing on a range of sources to support his argument. I particularly like that he tried to find source material that was openly available.
- He cites 2 papers by Bjoern Brembs to substantiate the claim that there is potentially poorer review in higher prestige journals than in lower ranked journals. These papers were published in 2013 and 2018 and the conclusions relied, in part, on the fact that higher ranked journals had more retractions. Apart from a potential reporting bias, given the flood of retractions across multiple journals in more recent years, I doubt this correlation now exists?
- The author works out submission rates from the published acceptance rates of journals (see the sketch after this list). The author acknowledges this is only approximate and discusses several factors that could inflate or deflate it. I can add a few more variables that could impact the estimate, including: 1) the number of articles a publisher/journal rejects before articles are assigned to any editor (e.g. because of plagiarism, reporting issues or other research integrity issues), 2) the extent to which articles are triaged and rejected by editors before peer review (e.g. because they are out of scope or not sufficiently interesting to peer review), 3) the number of articles rejected after peer review, and 4) the extent to which authors independently withdraw an article at any stage of the process. When publishers publish acceptance rates, they don’t make it clear what goes into the numerator or the denominator and there are no community standards around this. The author rightly notes this process is too opaque.
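As a rough sketch of the arithmetic presumably involved (my own illustration, not drawn from the essay itself): if a journal publishes $p$ articles in a given period and reports an acceptance rate $a$, the implied number of submissions is

$$s \approx \frac{p}{a}$$

so each of the factors listed above shifts the estimate by changing what is counted in the numerator or denominator of the reported acceptance rate.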
Catriona J. MacCallum
As is my practice, I do not wish to remain anonymous. Please also note that I work for a large commercial publisher and am writing this review in an independent capacity, such that this review reflects my own opinions, which are not necessarily those of my employer.
-
-
This is a well-written and clear enough piece that may be helpful for a reader new to the topic. To people familiar with the field there is not so much which is new here. The final recommendation is not well expressed. As currently put, it is, I think, wrong. But it is a provocative idea. I comment section by section below.
The first paragraphs repeat well established facts that there are too many papers. Seppelt et al.’s contribution is missing here. It also reproduces the disingenuous claim, by a publisher’s employee, that publishers ‘only’ respond to demand. I do not think that is true. They create demand. They encourage authors to write and submit papers, as anyone who has been emailed by MDPI recently can testify. Why repeat something which is so inaccurate?
The section on ‘upstream of the nozzle’ is rather confusing. I think the author is trying to establish if more work is being submitted. But this cannot be deduced from the data presented. No trends are given. Rejection rates will be a poor guide if the same paper is being rejected by several journals. I was also confused by the sources used to track growth in papers – why not just use Dimensions data? The final paragraph again repeats well known facts about the proliferation of outlets and salami slicing. Thus far the article has not introduced new arguments.
Minor points in this section:
- There are some unsupported claims, e.g. ‘This is a practice that is often couched within the seemingly innocuous guise of field specialty journals.’
- I also do not understand the logic of this rather long sentence: ‘The expansion of journals with higher acceptance rates alters the rational calculus for researchers - all things being equal higher acceptance rates create a perverse incentive to submit as many manuscripts as possible since the underlying probability of acceptance is simply higher than if those same publications were submitted to a journal with a lower acceptance rate, and hence higher prestige.’ I suggest it be rephrased.
The section on peer review (Who’s testing the water) is mostly a useful review of the issues. But there are some problems which need addressing. Bizarrely, when discussing whether there are enough scientists, it fails to mention Hanson et al’s global study, despite linking to its preprint in the opening lines. Instead the author adopts a parochial North American approach and refers only to PhDs coming from the US. It is not adequate to use trends in one country to explain an international publishing scene. These are not the ‘good data’ the author claims. Likewise the value of data on doctorates not going on to a post-doc hinges on how many post-docs there are. That trend is not supplied. This statement ‘Almost everyone getting a doctorate goes into a non-university position after graduation’ may be true, but no supporting data are supplied to justify it. Nor do we know what country, or countries, the author is referring to.
The section ‘A Sip from the Spring’ makes the mistaken claim that researchers hold market power. This is not true. Researchers’ institutions, their libraries and governments are the main source of publisher income. It is here that the key proposal for improvement is made: researchers can write more and publish less. But if the problem is that there is too much poorly reviewed literature then this cannot be the solution. Removing all peer review would mean there is even more material to read whose appearance is not slowed up by peer review at all. If peer review is becoming inadequate, evading it entirely is hardly a solution.
This does not mean we should not release pre-prints. The author is right to advocate them, but the author is mistaken to think that this will reduce publishing pressures. The clue is in their name ‘pre-print’. Publication is intended.
Missing from the author’s argument is recognition of the important role of the communities that researchers form, and the roles that journals play in providing venues for conversation, disagreement and discussion. They provide a filter. Yes, researchers produce other material than publications, as the author states: ‘grant proposals, editorials, policy briefs, blog posts, teaching curricula and lectures, software code and documentation, dataset curation, and labnotes and codebooks.’ I would add email and WhatsApp messages to that list. But adding all that to our reading lists will not reduce the volume of things to be read. It must increase it. And it would make it harder to marshal and search all those words.
But the idea is provocative nonetheless. Running through this paper, and occasionally made explicit, is the fact that publishers earn billions from their ‘service’ to academia. They have a strong commercial interest in our publishing more, and in competing with each other to capture a larger share of the market. If writing more, and publishing less, means finding ways of directing our thoughts so that they earn less money for publishers, then that could bring real change to the system.
A minor point: the firehose analogy is fully exploited and rather laboured in this paper. It is also a North American term and image that does not travel so easily.
-
-
Authors:
- Christopher Marcum christopher.steven.marcum@gmail.com
-
-
10.54900/r8zwg-62003
-
Drinking from the Firehose? Write More and Publish Less
-
-
-
The author uses 12 previously reported estimates, drawn from studies that focus on different research quality characteristics and that construct samples from different literatures, to estimate that approximately 1 in 7 scientific papers are “fake.” All three reviewers, however, call into question the estimate’s accuracy, and the article itself notes reasons to be skeptical. Even setting aside the intrinsic difficulties given the available evidence, the article does not use a systematic or rigorous method to compute the reported estimate. Thus, the reported estimate could be overstated or understated. The author also argues that the proportion of scientific outputs that are fake is a more relevant statistic than the oft-cited percentage of scientists who admit to faking or plagiarizing (Fanelli, 2009). The author calls for better recognition of the problem and better funding so that metaresearchers can conduct large-scale studies capable of producing more reliable overall estimates.

The reviewers noted some strengths. For example, two reviewers noted that the research question is important and that updated estimates are needed. One reviewer noted the importance of understanding the increase in the percentage of fake scientific outputs given changes in technology that make fraud easier to commit, and found the estimate of 1 in 7 urgently concerning despite its roughness.

The reviewers also point to weaknesses. Reviewer 1 worries that no published estimate tells us much about the overall proportion of fake studies. This reviewer proposes that the author take a different approach by determining which data are needed to accurately estimate the proportion, collecting those data, and using them to compute a reliable estimate. This reviewer also suggests adding references to support claims made throughout the article. The second, co-authored review report notes three concerns. First, the co-reviewers emphasize that the author calls his own claims into question. Second, they argue that the author is incorrect in claiming that his article is “in opposition” to Fanelli (2009), because both articles fail to provide a reliable estimate of the amount of scientific output that is fake. Finally, the co-reviewers draw inferences from a dataset they constructed to argue that the author is incorrect in his characterizations of how others have interpreted Fanelli (2009). Reviewer 3 notes that the author’s focus on articles rather than scientists deemphasizes the important human dimension of fakery. This reviewer suggests emphasizing the reputational harm caused by false positives.

In sum, all three reviewers are unpersuaded by the author’s claim that approximately 1 in 7 scientific papers are fake.
Recommendations from the Editor
The value of the article lies not in its too-roughly calculated estimate but in its attempt to highlight both an important yet unanswered question and the difficulties that hinder our ability to reliably answer it. The article also provides a useful summary of the burgeoning literature and the challenges of drawing broad inferences from it. The author should consider highlighting these points rather than the rough estimate of the rate of falsification and fabrication. The author should also change the article’s title to reflect the skepticism about the estimate that runs throughout the article, so as not to confuse readers about what we can reliably take away from it.
The following are specific suggestions:
-
Adding references or links to Table 1 would help readers find details related to the listed items.
-
p. 6 (“The following (Table 1) is a selection of events which took place after the figure above was established.”): Clarify why 2005 is the first year of interest in the events table (e.g., change sentence to “The following (Table 1) is a selection of events that took place during or after 2005, the final year of publication of the studies Fanelli used to compute the 2% figure.”).
-
p. 7 (“Significantly, all of the above happened after the figure of 2% was collected.”): change to “… after publication of the studies on which the 2% figure is based.”
-
The link in footnote 5 no longer works. The report can be found at https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1262&context=scholcom. Consider changing all links to permalinks.
-
p. 16: list the 29 observations of data sleuth estimates in a footnote.
-
Footnote 6: the link pulls up the Retraction Watch Database for Nature Publishing Group. I accessed the link on Jan 15, 2025, and the database found 1,610 items. It’s not clear how you computed 667 (12,000 / 667 = 18). If “Retracted Article” is chosen for “Article Type(s),” the count is 1.
-
p. 18 (“the presumably higher number of papers containing questionable research practices (which are far more commonly admitted to) is presumably higher still.”): Consider citing published estimates, which are mostly produced using surveys, and note that this literature suffers from the same problems as the literature that estimates FF.
-
p. 18: add citations to articles that address each of the harms caused by false results.
-
Bottom of p. 19 (“In particular, it seems likely that FF rates change by individual field – in doing so, they may present specific rather than general threats to human health and scientific progress.”): Providing some explanation for field-specific rates might help the reader assess the claim. For example, it’s possible that rates are similar across fields because those willing to commit fraud or to fabricate data likely randomly distribute themselves across fields, and journal editors and referees are roughly equally likely to fail to detect falsification and fabrication. Does any evidence call these possibilities into question?
-
-
This manuscript attempts to provide an answer to the proportion of scientific papers that are fake. The presence of fake scientific papers in the literature is a serious problem, as the author outlines. Papers of variable quality and significance will inevitably be published, but most researchers assess manuscripts and papers based on the assumption that the described research took place. Fake papers that disguise their nature can therefore be highly damaging to research efforts, by preventing accurate assessments of research quality and significance, and by encouraging future research that could consume time and other resources. As the manuscript describes, fake papers also damage science by eroding trust in the scientific method and in communities of scientists.
It is therefore clear that knowing the proportion of fake scientific papers is important, that the author is concerned about the problem, and that the author wants to arrive at an answer. However, as the manuscript partly recognises, the question of the overall proportion of fake scientific papers is currently difficult to answer.
The overall proportion of fake papers in science will represent the individual proportions of fake papers in different scientific disciplines. In turn, the proportion of fake papers in any single discipline will reflect many factors, including (i) researcher incentives to produce fake papers, (ii) the ease with which fake papers can be produced and (iii) published, (iv) the ease or likelihood of fake papers being detected before or (v) after publication, and (vi) the consequences for authors if they are found to have published fake papers. Some of these factors are likely to vary between disciplines and between research settings. For example, it has been suggested that in some fields it is as difficult to invent research results as it is to produce genuine data, whereas in other fields it is easier to invent data than to generate it through experiments that remain difficult, expensive and/or slow. It is also likely that factors such as the capacity to invent fake papers, the capacity to detect them, and the incentives and consequences for researchers could vary over time, particularly in response to generative AI.
As someone who studies errors in scientific papers, I don’t believe that we currently have a good understanding of the proportion of fake papers in any individual scientific field, at any time. In some fields we have estimates of individual error types, but these are likely to misestimate the overall proportion of fake papers. Rather than attempting to answer the question of the overall proportion of fake scientific papers in the absence of the necessary data, it seems preferable to describe how we could obtain the data needed to answer this question. While the overall proportion of fake scientific papers is an important statistic, most scientists will be more concerned about how many fake papers exist in their own fields. We could therefore start by trying to obtain reliable estimates of fake papers in individual fields, working out how we need to do this, and then carrying out the necessary research. In the absence of reliable data, it is perhaps most important that researchers are aware that fake papers could exist in their fields, so that all researchers can assess papers more carefully.
Beyond these broad considerations, the following manuscript elements could be reconsidered.
-
Fake science is defined as fabricated or falsified, yet this definition is sometimes expanded to include plagiarism (page 8, Table 2). However, plagiarism doesn’t equate with faking or falsifying data, and some plagiarised articles could describe sound data. Including plagiarised articles as fake articles will inevitably inflate estimates of fake papers, particularly in fields with higher rates of plagiarism.
-
Table 1 was stated to represent “a selection of events that took place after the figure above (i.e. the figure published by Fanelli (2009)) was established”, yet some listed references/events were published or occurred between 2005 and 2008.
-
It is reasonable to expect that increased capacity to autogenerate text and images will increase the numbers of fake papers, but I’m not aware of any evidence to support this. No reference is cited.
-
Table 2; “similar survey results”: it’s not clear how the listed studies are similar.
-
There are many unreferenced statements, e.g. page 9, “most rejected papers are published, just elsewhere”, and page 19.
-
Some estimates of fake papers arise from small sample sizes (eg page 13).
-
The statement “The accumulation of papers assembled here is, frankly, haphazard” doesn’t inspire confidence in the resulting estimate.
-
“…it would be prudent to immediately reproduce the result presented here as a formal systematic review”- any systematic review seems premature without reliable estimates.
-
“The false positive rate (FPR) of detecting fake science is almost certainly quite low”- this seems unlikely to be correct. False positive rates depend on the methods used. Different methods will be required to detect fake papers in different disciplines, and these different methods could have very different false positive rates, particularly when comparing the application of manual versus automated methods that are applied without manual checking.
-
Page 2: I could not see the n=12 studies summarised in a single Table.
-
Page 10: “All relevant studies were included”…. “The list below is comprehensive but not necessarily exhaustive”- these statements contradict each other.
Disclosure: Jennifer Byrne receives NHMRC grant funding to study the integrity of molecular cancer research publications.
-
-
The provocative essay written by James Heathers is a genuine attempt to quantify the current prevalence of two growing research malpractices, namely fabrication and falsification (FF for short), which are universally recognized as gross misconduct. The matter is of interest not only to researchers themselves (including meta-scientists), but also to general audiences, since taxpayers have a natural right to oversee the rewards of Science for society at large. The underlying assumption of the author is that the generally accepted figure of 2% of researchers involved at least once in FF should now be considered a lower bound. This 2% rate appeared in an article authored by Daniele Fanelli in 2009 and made an impact in the scholarly community. However, a lot of water has flowed under the bridge since then, and new actors have shown up: papermills, sophisticated digital tools (intended for both data fabrication and FF tracking), whistleblowers communicating via social networks, generative artificial intelligence, etc. The update proposed by James Heathers is thus certainly welcome.
The other premise of the author is that the assessment of the proportion of faking scientists is not a suitable proxy. Instead, he preferred to address a tangential issue: the estimation of the rate of scholarly papers including fabricated or falsified data. According to the author, such an approach has more benefits than drawbacks, and could, from an idealistic point of view, be fully automated. One could agree, although the fear of building an Orwellian machinery is never far away. At the end of the process, offending papers are retracted (assuming, again, an ideal world), while the authors of the flagged papers are jailed (metaphorically or not).
A survey of more recent studies was thus carried out. Although the author acknowledges that the small sample size of his study (N = 12) and the large dispersion of FF estimates retrieved from this corpus do not allow a proper meta-analysis, an alarming figure of 14.3% for the updated FF rate emerges. Moreover, this figure is consistent with independent data reported by other sleuths engaged in the fight against questionable research practices, which are mentioned in the “discussion” section of the paper. Even if estimated in a rough way, the increase of FF in less than 15 years, if confirmed by other studies, is a real threat to Science and should be addressed urgently.
The main value of this essay is thus to raise concerns about the fast growth of FF, rather than to provide an up-to-date FF rate, which is anyway probably impossible to obtain in a reliable manner. On the other hand, an obvious weakness of the study is the chosen target: by focusing his attention on papers, James Heathers is missing the human dimension of the academic endeavour. Indeed, authors and papers are entangled bodies, and like entangled particles, they are described by a single state involving both entities: a paper does not exist without authors, and authors are invisible if they do not publish on a regular basis.
Nowadays, scientific papers are extremely complex, and almost always impenetrable to researchers outside of the field involved. However, Homo academicus (as coined by Pierre Bourdieu) is also a very complex being. This is why, even though there is an unambiguous definition of FF, the false positive and negative rates of detecting FF are unknown, as recognized by James Heathers. In particular, false positive detections can be detrimental to authors. This point is mentioned en passant in the essay, but should be emphasized: it is more than just a drawback of the methodology used, since it is related to the very human dimension of the scholarly enterprise.
Perhaps a complementary perspective on the work carried out by James Heathers could be based on the following example: James Ibers (1930-2021), an old-school chemist and influential crystallographer, wrote a memoir published by the American Crystallographic Association shortly before his death.1 He describes how, as a freshman at Caltech, he attended a mandatory one-week orientation workshop. In his own words: “The most important message I took away was the Caltech Honor Code for all undergraduates. In its simplest terms: You can’t cheat in Science because you will eventually be found out. I have adhered to that Code as a husband, a father, a scientist, a teacher, a research director, and all others I have dealt with”. How many of us can say, without hesitation, that we stand alongside Ibers? What is the tolerable threshold of cheaters in Science? 2%? 14.3%? More?
James Heathers ends his article with a worrying sentence: “Priorities must change, or science will start to die”. Perhaps, however, Science is already as dead as a dodo.
1 https://chemistry.northwestern.edu/documents/people/james_ibers.aca.memoir.2020.pdf
Declaration of competing interest. The author has no conflicts of interest to disclose.
-
Review written in collaboration with Maha Said (Orcid) and Frederique Bordignon (Orcid)
The title of the article makes a simple, striking claim about the state of the scientific literature, with a numerical estimate of the proportion of “fake” articles. Yet, in contrast to this title, in the text of the article Heathers is highly critical of his own work.
James’ peer review of Heathers’ article
James Heathers often mentions the limitations of his research thus “peer-reviewing” his own article to the extent that he admits that this work is “incomplete”, “unsystematic” and “far flung”.
“This work is too incomplete to support responsible meta-analysis, and research that could more accurately define this figure does not exist yet. ~1 in 7 papers being fake represents an existential threat to the scientific enterprise.”
“While this is highly unsystematic, it produced a substantially higher figure. Correspondents reliably estimated 1-5% of all papers contain fabricated data, and 2-10% contain falsified results.”
“These values are too disparate to meta-analyze responsibly, and support only the briefest form of numerical summary: n=12 papers return n=16 individual estimates; these have a median of 13.95%, and 9 out of 16 of these estimates are between 13.4% and 16.9%. Given this, a rough approximation is that for any given corpus of papers, 1 in 7 (i.e. 14.3%) contain errors consistent with faking in at least one identifiable element.”
“The accumulation of papers collected here is, frankly, haphazard. It does not represent a mature body of literature. The papers use different methods of analyzing figures, data, or other features of scientific publications. They do not distinguish well between papers that have small problematic elements which are fake, or fake in their entirety. They analyze both small and large corpora of papers, which are in different areas of study and in journals of different scientific quality – and this greatly changes base rates;…”
“As a consequence, it would be prudent to immediately reproduce the result presented here as a formal systematic review. It is possible further figures are available after an exhaustive search, and also that pre registered analytical assumptions would modify the estimations presented.”
In an interview published by Retraction Watch (Chawla 2024), Heathers has also acknowledged pitfalls in this article, such as:
“Heathers said he decided to conduct his study as a meta-analysis because his figures are “far flung.””
“They are a little bit from everywhere; it’s wildly nonsystematic as a piece of work,” he said.”
“Heathers acknowledged those limitations but argued that he had to conduct the analysis with the data that exist. “If we waited for the resources necessary to be able to do really big systematic treatments of a problem like this within a specific area, I think we’d be waiting far too long,” he said. “This is crucially underfunded.”
Built in opposition to Fanelli 2009, but it’s illogical
Heathers states in the abstract that his article is “in opposition” to Fanelli’s 2009 PLoS One article (Fanelli 2009), yet that opposition is illogical and artificially constructed, since there is no contradiction between 2% of scientists self-reporting having taken part in fabrication or falsification and a possibly much higher proportion of “fake scientific outputs”. Like much of what is wrong with Heathers’ article, this is in fact acknowledged by the author, who notes that the 2% figure “leaves us with no estimate of how much scientific output is fake” (bias in self-reporting, possibility of prolific authors, etc.).
Fanelli 2009 is not cited in the way JH says it is cited
Whilst the opposition discussed above is illogical, it could be that the 2% figure is mis-cited by others as representing an estimate of fake scientific outputs, thus probably underestimating the extent of fraud. Heathers suggests that this may indeed be the case, but also contradicts himself about how Fanelli (2009), or the 2% figure coming from that publication, is typically used.
In one sentence, he writes that “the figure is overwhelmingly the salient cited fact in its 1513 citations” and that “this generally appears as some variant of ‘about 2% of scientists admitted to have fabricated, falsified or modified data or results at least once’” (Frank et al. 2023),
whilst in another sentence he points to “the typical phraseology used to express it – e.g. ‘the most serious types of misconduct, fabrication and falsification (i.e., data fraud), are relatively rare’” (George 2016).
Those two sentences cited by Heathers are fundamentally different: the first accurately reports that the 2% figure relates to individuals self-reporting, whilst the second appears to relate to the prevalence of misconduct in the literature itself. How Fanelli (2009) is cited in the literature is an empirical question that can be studied by looking at citation contexts beyond the two examples given by Heathers. Given that a central justification for Heathers’ piece appears to be the misuse of this 2% figure, we sought to test whether this was the case.
A first surprise was that whilst the sentence attributed to George (2016) can indeed be found in that publication (in the abstract), it is, first, not in a sentence citing Fanelli (2009) or the 2% figure and, second, quoted selectively, omitting a part of the sentence that nuances it considerably: “The evidence on prevalence is unreliable and fraught with definitional problems and with study design issues. Nevertheless, the evidence taken as a whole seems to suggest that cases of the most serious types of misconduct, fabrication and falsification (i.e., data fraud), are relatively rare but that other types of questionable research practices are quite common.” Fanelli (2009) is in fact discussed extensively by George (2016), and some of the caveats, e.g. on self-reporting, are highlighted.
To go beyond those two examples, we constructed a comprehensive corpus of citation contexts, defined as the textual environment surrounding a paper's citation, including several words or sentences before and after the citation (see Methods section below). 737 citation contexts could be analysed. Of those, the vast majority (533, or 72%) did not cite the 2% figure. Instead, they often referred to the article as a general reference, together with other articles, to make a broad point, or focused on other numbers, in particular those related to questionable research practices (Bordignon, Said, and Levy 2024). The 28% (204) of citation contexts that did mention the 2% figure did so accurately in the majority of cases: 83% (170) of those mentioned that it was self-reporting by scientists, whilst 17% (34) of those, or 5% of the total citation contexts analysed, were either ambiguous or misleading in that they suggested or claimed that the 2% figure related to scientific outputs.
Although the analysis above does not include all citation contexts, it is possible to conclude unambiguously that the 2% figure is not overwhelmingly the salient cited fact in relation to Fanelli 2009, and that when it is cited it is usually cited accurately, i.e. as representing self-reporting by scientists. Whilst an exhaustive analysis is beyond the scope of this peer review, it is not uncommon to find in this corpus citation contexts that have an alarming tone about the seriousness of the problem of FFPs, e.g. “…a meta-analysis (Fanelli 2009) suggest that the few cases that do surface represent only the tip of a large iceberg.” [DOI: 10.1177/0022034510384627]
Thus, the rationale for Heathers’ study appears to be misguided. The supposed lack of attention for the very serious problem of FFPs is not due to a minimisation of the situation fueled by a misinterpretation of Fanelli 2009. Importantly, even if that was the case, an attempt to draw attention by claiming that 1 in 7 papers are fake, a claim which according to the author himself is not grounded in solid facts, is not how the scientific literature should be used.
Methods for the construction of the corpus of citation contexts
We used Semantic Scholar, an academic database encompassing over 200 million scholarly documents from diverse sources including publishers, data providers, and web crawlers. Using the specific paper identifier for Fanelli's 2009 publication (d9db67acc223c9bd9b8c1d4969dc105409c6dfef), we queried the Semantic Scholar API to retrieve available citation contexts. Citation contexts were extracted from the "contexts" field within the JSON response pages (see technical specifications).
The query looks like this: semanticscholar.org
The broad coverage of Semantic Scholar does not imply that citation contexts are always retrieved. The Semantic Scholar API provided citation contexts for only 48% of the 1452 documents citing the paper. To get more, we identified open access papers among the remaining 52% of citing papers, retrieved their PDF locations and downloaded the files. We used the Unpaywall API, a database that can be queried with a DOI in order to get open access information about a document. The query looks like this (see the sketch below).
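Because the original example-query links did not survive in this version, here is a minimal sketch of what such queries might look like, assuming the public Semantic Scholar Graph API (the citations endpoint with the "contexts" field) and the Unpaywall REST API. The field selection, pagination settings, and contact email are illustrative assumptions, not the reviewers' actual code.

```python
import requests

# Paper identifier for Fanelli (2009), as given in the Methods above.
FANELLI_2009 = "d9db67acc223c9bd9b8c1d4969dc105409c6dfef"
S2_CITATIONS_URL = f"https://api.semanticscholar.org/graph/v1/paper/{FANELLI_2009}/citations"


def fetch_citation_contexts(page_size=100):
    """Page through the Semantic Scholar citations endpoint and collect
    the citation-context snippets plus the citing paper's DOI (if any)."""
    contexts, offset = [], 0
    while True:
        resp = requests.get(
            S2_CITATIONS_URL,
            params={"fields": "contexts,externalIds", "limit": page_size, "offset": offset},
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()
        for item in page.get("data", []):
            doi = (item.get("citingPaper", {}).get("externalIds") or {}).get("DOI")
            for snippet in item.get("contexts", []):
                contexts.append({"doi": doi, "context": snippet})
        if "next" not in page:  # no further pages
            break
        offset = page["next"]
    return contexts


def unpaywall_pdf_url(doi, email="reviewer@example.org"):
    """Ask Unpaywall for an open-access PDF location for a given DOI.
    The email parameter is required by Unpaywall; the address here is a placeholder."""
    resp = requests.get(f"https://api.unpaywall.org/v2/{doi}", params={"email": email}, timeout=30)
    resp.raise_for_status()
    best = resp.json().get("best_oa_location") or {}
    return best.get("url_for_pdf")  # None if no OA PDF is known
```

Under these assumptions, the first call corresponds to the contexts retrieved directly from Semantic Scholar (the 48% coverage mentioned above), while the second call would supply PDF locations for the open access subset of the remaining citing papers.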
We downloaded 266 PDF files and converted them to text format using an online bulk PDF-to-text converter. These files were then processed using TXM, a specialized textual analysis tool. We used its concordancer function with the term "Fanelli" as a pivot term and checked that the reference was the correct one (the 2009 paper in PLOS ONE). We did manual cleaning and appended the citation contexts to the previous corpus.
Through this methodology, we ultimately identified 824 citation contexts, representing 54% (784) of all documents citing Fanelli's 2009 paper. This corpus comprised 48% of contexts retrieved from Semantic Scholar and an additional 6% obtained through semi-manual extraction from open access documents. 87 of those contexts were excluded from the analysis for a range of reasons, including: context too short to conclude, language neither English nor French (the shared languages of the authors of this review), duplicate documents (e.g. preprints), etc., leaving us with 737 contexts. They were first classified manually into two categories: those mentioning the 2% figure and those which did not. Then, for the first category, contexts were further classified manually into two categories depending on whether the figure was appropriately attributed to self-reporting by researchers or misleadingly suggested that the 2% applied to research outputs.
The reviewers have no competing interests to declare.
Contributions
Investigation: FB collected the citation contexts.
Data curation and formal analysis: RL and MS.
Writing – review & editing: RL, MS and FB.
References
Bordignon, Frederique, Maha Said, and Raphael Levy. 2024. “Citation Contexts of [How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data, DOI: 10.1371/journal.pone.0005738].” Zenodo. https://doi.org/10.5281/zenodo.14417422.
Chawla, Dalmeet Singh. 2024. “1 in 7 Scientific Papers Is Fake, Suggests Study That Author Calls ‘Wildly Nonsystematic.’” Retraction Watch (blog). September 24, 2024. https://retractionwatch.com/2024/09/24/1-in-7-scientific-papers-is-fake-suggests-study-that-author-calls-wildly-nonsystematic/.
Fanelli, Daniele. 2009. “How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data.” PLOS ONE 4 (5): e5738. https://doi.org/10.1371/journal.pone.0005738.
Frank, Fabrice, Nans Florens, Gideon Meyerowitz-Katz, Jérôme Barriere, Éric Billy, Véronique Saada, Alexander Samuel, Jacques Robert, and Lonni Besançon. 2023. “Raising Concerns on Questionable Ethics Approvals - a Case Study of 456 Trials from the Institut Hospitalo-Universitaire Méditerranée Infection.” Research Integrity and Peer Review 8 (1): 9. https://doi.org/10.1186/s41073-023-00134-4.
George, Stephen L. 2016. “Research Misconduct and Data Fraud in Clinical Trials: Prevalence and Causal Factors.” International Journal of Clinical Oncology 21 (1): 15–21. https://doi.org/10.1007/s10147-015-0887-3.
-
Authors:
- James Heathers jamesheathers@gmail.com
-
2
-
-
10.17605/OSF.IO/5RF2M
-
Approximately 1 in 7 Scientific Papers Are Fake
-
- Feb 2025
-
-
The paper focuses on two major issues, the “credibility crisis” in Psychology and open science practices, and argues that the two could have a synergistic relationship in Africa, with moves to improve reproducibility and integrity in Psychology benefiting from and contributing to developments to make science more accessible and transparent. Three reviewers assessed the article. All reviewers considered the article a well-written overview of different dimensions of the “credibility crisis” in Psychology and the ways it is being addressed, and of the open science movement, particularly in Africa. The analysis providing a taxonomy of open science developments (organised by accessibility, infrastructure, credibility, and community) and the commentary on how the development of open science can be facilitated in African contexts are considered strengths of the article. The reviewers also identified areas for improvement, including that the argument about the mutual benefits that could accrue in African contexts between addressing challenges associated with the “credibility crisis” and encouraging open science could be sharpened and made more specific to Africa. The discussion of practical ways in which researchers could adopt open science practices, and of how such developments might be measured, could also be strengthened, with specific comments made about bringing diamond open access into the analysis.
-
A summary of what the authors were trying to achieve (address the entire article, not just individual points or sections)
This short paper provides an introduction to two issues: the “credibility revolution” of practices in psychology research following the reproducibility crisis, and the state of psychology research in Africa and the factors which are crucial to its development. The paper claims that there are mutual benefits: efforts to support credible and accessible research can benefit psychology in Africa, and African psychology can expand and enhance the credibility of psychology research in the rest of the world.
An account of the major strengths and weaknesses of the conceptual framework, methods and results
The paper serves as a very accessible introduction to the “reproducibility crisis” in psychology, and the subsequent “credibility revolution” of research practices which are often (but not exclusively) focussed on transparency and accessibility, and which are applicable in many fields beyond psychology.
The taxonomy of open science innovations into four categories - Accessibility, Infrastructure, Credibility, Community - is a nice way of organising initiatives.
It may be beyond the scope of the remit the authors set themselves, but from a metascience perspective there is an unanswered question of how progress on the challenges set out by the paper would be measured. What are the indicators we could use to evaluate progress in the different challenge areas or against which to measure benefits?
One paragraph summarising progress on reproducibility (p8 “The result was an explosion of research on practices to improve the credibility of psychology research”) seems to imply that credibility efforts are coextensive with replication studies (which is surely not what the authors mean) and further to imply that credibility practices are limited in applicability to a restricted domain of mostly online studies (which undersells the benefits of the credibility practices developed within psychology and admirably showcased in this paper).
An appraisal of whether the authors achieved their aims, and whether the results support their conclusions
I am not qualified to comment on whether the portrayal of African psychology is fair or comprehensive. I note that six of the nine authors have affiliations with African institutions.
A discussion of the potential likely impact of the article on the field, and the utility of the conceptual framework, methods and empirical materials/data to the community
The contribution of this paper is to signpost the valuable work that is being done on credibility mechanisms and on research development in Africa.
Any additional context that might help readers interpret or understand the significance of the article
None
Any issues the authors need to address about the availability of data, code, research ethics, or other issues pertaining to the adherence of the article to MetaROR’s publishing policies
N/A
Competing interests:
I am on the PsyArXiv Scientific Advisory Board. PsyArXiv and preprint servers are mentioned positively by the article.
I am a member of the UK Reproducibility Network (UKRN) and chair of the UKRN Institutional Leads group. UKRN is a sister organisation to the African Reproducibility Network (AREN), which is mentioned positively by the article.
I am a senior research fellow at the Research on Research Institute (RoRI). RoRI supports the development of MetaRoR, which could be considered an overlay journal. Overlay journals are mentioned positively by the article.
Positionality:
As an experimental psychologist I have been involved in the discussion around credibility since at least 2011. I have no experience or familiarity with African psychology or the human development issues mentioned by the article.
Reviewers are asked to provide specific guidance on the following:
Does the article contribute new insights to the relevant fields?
Yes. Both topics - credibility in research and research in Africa - are huge topics. The brief introductions here are valuable and there is added benefit of bringing the two into explicit dialogue.
Are the key insights clearly communicated in the abstract, introduction, and conclusion?
Yes
Does the introduction section adequately explain necessary background information? Does it set out and justify the motivation for and aim of the study?
Yes
Does the literature review (where applicable) include the relevant research including the most recent research?
Yes, with the caveat that the topics are so large that it is impossible in this amount of space to be comprehensive.
Are any analytical concepts or theoretical frameworks used appropriately introduced and taken up in the empirical analysis (where applicable)?
N/A
Are all research methods clearly described and appropriate? In the case of quantitative submissions, are the methods rigorous and does the study include or point to all materials required to attempt a replication of the results?
N/A
Do the results make sense? Are they clearly formatted and presented? Are graphs easily readable and clearly labeled? Are all figures and tables understandable without reference to the main body of the article? Do they contain proper captions?
N/A
Are the results discussed in the context of previous findings? Are the results similar to previously reported findings? Are differences explained?
There is no mention of previous work on this exact topic (the synergy). Perhaps there isn’t any? Maybe an explicit statement to this effect would be good.
Are limitations of the study and their implications for interpretation of the results clearly described (where applicable)?
On a similar line, maybe readers would benefit from a statement from the authors on their backgrounds and/or how the author team came together to address this topic?
Are interpretations and conclusions consistent with the empirical materials and data?
N/A
Are all references appropriate? Are necessary references present? Are all references cited in the text included in the reference list?
Nosek et al. (2021) is missing or should be Nosek et al. (2022), which additionally appears slightly out of alphabetical order in the bibliography.
If one or more studies in the article were preregistered, are the hypotheses, research methods, and inference criteria in line with the preregistration?
N/A
-
A summary of what the authors were trying to achieve (address the entire article, not just individual points or sections)
The reviewer's area of expertise is open access which, for all intents and purposes, is a subset of open science. The nub of this manuscript is the credibility of psychology research, and the use of open science to grow that credibility and to bridge the psychology knowledge divide between the global north and Africa. Given the reviewer's limited knowledge of the core issue of psychology research, the reviewer will confine his comments to his area of expertise.
An account of the major strengths and weaknesses of the conceptual framework, methods and results
Psychological science is facing a ‘credibility crisis’ because many studies can’t be replicated, prompting reforms for more transparent and rigorous research. While these efforts have progressed in North America and Europe, Africa has seen less impact due to challenges like low funding and poor infrastructure, widening the gap in research capacity. These are the core issues covered in this manuscript. However, what is omitted as a challenge is the brain drain and its impact on research and research production. Many authors talk about knowledge pilgrimage, whereby African researchers have to research issues from a global north perspective to improve their chances of publication. There is a nuanced difference between research colonialism (a concept used by the authors) and knowledge pilgrimage – the former is aligned with helicopter research, while the latter is about African researchers manipulating global north research data to get published, to the neglect of Africa.
The authors make the point about power imbalances and the fact that global north researchers use Africa as a site of data collection without recognising the contribution of African researchers.
The research gap is widened given the reliance on North American and European guidelines and standards. Open science practices offer African researchers tools to improve research quality and join global discussions. Initiatives like the African Reproducibility Network can help build stronger research communities, addressing local issues while contributing to a more inclusive and globally relevant psychology. Strengthening African research could also advance human development across the continent.
It is recommended that the authors do not use illegal entities as sources of information as this, in the opinion of the reviewer, brings into question the credibility of the manuscript.
The perceived weaknesses relate to the oversimplification of open science solutions. While open science is framed as a key solution, the manuscript oversimplifies its implementation in Africa. It only superficially acknowledges the barriers to adopting open science practices, such as the lack of stable internet access, digital tools, and the necessary training in many African institutions. The manuscript suggests open science can bridge gaps without delving into the complexities of infrastructure and access that make it difficult for African researchers to fully engage with these practices.
The manuscript should pay a little more attention to how African researchers can practically adopt open science tools. While it mentions open-access platforms and reproducibility networks, it doesn’t provide details on how these can be integrated into the current research systems in Africa, given the resource constraints. Practical guidance on the funding models, technical support, or training programs needed to implement open science would provide a more grounded solution. Diamond open access is an extremely viable model to grow the production of psychological research.
An appraisal of whether the authors achieved their aims, and whether the results support their conclusions
From an open science perspective, the authors have done relatively well to provide a roadmap for the improved accessibility and credibility of psychology research.
The authors highlight how open science practices, such as PsyArXiv and AfricArXiv, enhance research accessibility by allowing researchers to freely share their findings. This is critical in the context of the paywalls that restrict access to valuable research, particularly for researchers in low-resource settings. They also address the issue of inclusivity or equity. APCs are a new barrier: it is pointed out that many researchers in Africa face barriers due to the high article processing charges associated with open access journals. This situation can perpetuate inequities in knowledge production and dissemination. What the authors have missed is the opportunity to investigate diamond open access as a viable alternative for the dissemination of African scholarship.
Preprints are a viable pathway for researchers to share their findings before formal publication. This can facilitate greater visibility for their work and allow for earlier feedback, although it’s essential to navigate journal policies carefully. The acknowledgement of overlay journals is important given that they are a relatively new concept. Instead of publishing papers directly, they provide peer review for papers that have already been posted on preprint servers. If the paper passes peer review, the overlay journal "publishes" the paper by linking to it on the preprint server. When one ventures into the arena of quality, this process helps build research production capacity. However, it must be noted that some overlay journals charge a fee for the peer review and publication process.
The other significant issue is that of research credibility. The authors discuss innovative tools and processes that enhance credibility. Initiatives such as registered reports and the pre-registration of study protocols enhance the transparency and credibility of research. These methods can mitigate biases related to the selective reporting of positive results, thus improving the overall integrity of psychological research. Community initiatives like the Psychological Science Accelerator underline the importance of collaboration in enhancing research credibility. Engaging a diverse range of researchers can lead to more comprehensive studies and foster an environment of shared knowledge.
One of the major challenges is that of low skill levels. The authors bring into the discussion the issue of building capacity – the train-the-trainer model is critical for an inclusive process. Further, to maximize the benefits of open science practices, there’s a clear need for training and capacity-building initiatives tailored to the African context. This would empower researchers with the skills and knowledge to effectively engage with these practices.
What is of concern to the reviewer is the indistinct definition of predatory publishing. The authors should stay away from aligning predatory publishing with open access. The fact that the mode of delivery is electronic must not be confused with a model for the delivery of predatory scholarship.
In the main, the authors have achieved their goal of developing a roadmap for bridging the divide.
Reviewers are asked to provide specific guidance on the following:
Does the article contribute new insights to the relevant fields?
The manuscript proposes a high-level association between open science practices and driving human development in Africa, through enhancing access to research, improving infrastructure, and boosting credibility. These are established practices, as the authors confirm by highlighting that platforms like AfricArXiv and OSF provide free access to research materials and tools and reduce barriers imposed by paywalls. There are other initiatives, such as Registered Reports and the African Reproducibility Network, that foster collaboration and skill-building.
The new insights are brought forward via the linking of these established tools and practices to the African scenario. Empowerment and bridging the divide are the new insights.
Are the key insights clearly communicated in the abstract, introduction, and conclusion?
The abstract should provide a little more detail on open science and its relationship with bridging the knowledge divide. The introduction gives substantial grounding on what the readers can expect from the rest of the article. However, the conclusion does not pull the golden threads together. A little more could be done with the conclusion to create an association between the introduction and the conclusion.
Do the results make sense? Are they clearly formatted and presented? Are graphs easily readable and clearly labeled? Are all figures and tables understandable without reference to the main body of the article? Do they contain proper captions?
The discussion is substantial. The two tables are clear and easily readable. It is commendable that the authors do not duplicate in the text what is captured in the two tables.
Are any analytical concepts or theoretical frameworks used appropriately introduced and taken up in the empirical analysis (where applicable)?
There is no theoretical framework or research methodology section. This may simply be a practice the reviewer is accustomed to, rather than a necessity. Be that as it may, it would help the manuscript if the authors could articulate their methodology.
Does the literature review (where applicable) include the relevant research including the most recent research?
The review of the literature is substantial. However, there are gaps (e.g. diamond open access) with regard to more recent developments in the dissemination of scholarship.
There are a number of very old references alongside more current references. The old references are important to ground the issue of (de)colonisation, which is central to a major part of the discussion. More recent literature would give the authors an understanding of diamond open access, which would help in sourcing alternatives.
-
The authors outline developments both in North America and Europe and in Africa, and how each party can benefit from the progress and learnings of the other. However, the authors argue that this can only occur if a wide array of stakeholders invest in science in Africa, including resourcing, training, and the development of research tools. I particularly enjoyed reading this article, as the authors outline how science – even in its most basic forms – can and does impact population outcomes and development. It provides a reader unfamiliar with the research situation in Africa, as well as one relatively new to open science, a concise summary of how these two can interact to yield better outcomes for both – suitable for a wide range of stakeholders.
A key strength of this article is that the authorship consists of experts involved in both fields being discussed, i.e. open science and science in Africa, thus providing a relatively balanced view of the situation. However, my one concern is that the case for each side benefiting the other could be made stronger, particularly how Africa could benefit from open science and the credibility revolution occurring in North America and Europe. At the moment, the key emphasis reads as being that Africa could benefit greatly from open science learning if it addresses key structural barriers, and that open science could benefit from Africa by expanding its research community. However, the latter is true for many regions of the world. As Africa is a continent with a diverse and rich history and culture, I encourage the authors to examine further and investigate case studies or findings to strengthen this argument. Examples may include areas where Africa has excelled in transparency, collaboration, or other key tenets of good science. This approach forces us to recognize that shortcomings in North American and European research ‘culture’ may be addressed or remedied by looking to other research cultures globally. This would provide a more balanced argument for how both parties can mutually benefit one another – in line with the core intention of the piece. Thank you for the opportunity to review this article.
I have no competing interests with the authors.
-
-
Adetula, A., Forscher, P. S., Basnight-Brown, D., Azouaghe, S., Ouherrou, N., CHARYATE, A., … IJzerman, H. (2021, June 21). Synergy Between the Credibility Revolution and Human Development in Africa. https://doi.org/10.31730/osf.io/e57bq
-
January 22, 2025
-
January 22, 2025
-
January 22, 2025
-
Authors:
- Adeyemi Adetula (Busara Center for Behavioral Economics, Nairobi, Kenya) adeyemiadetula1@gmail.com
- Patrick Forscher (Department of Psychology, United States International University-Africa, Nairobi, Kenya) patrick.forscher@busara.global
- Dana Basnight-Brown (Department of Psychology, Mohammed V University, Rabat, Morocco) dana.basnightbrown@gmail.com
- Soufian Azouaghe (Université Chouaib Doukkali, Université Paul Valéry Montpellier 3) azouaghe.soufian@gmail.com
- Nihal Ouherrou (Université Ibn Tofail) n.ouherrou@gmail.com
- Abdelilah Charyate (University of Groningen, The Netherlands) abdelilah.charyate@uit.ac.ma
- Nina Hansen (Department of Pure and Applied Psychology, Adekunle Ajasin University, Akungba-Akoko, Ondo State, Nigeria) n.hansen@rug.nl
- Gabriel Adetula (LIP/PC2S, Université Grenoble Alpes, Saint-Martin-d’Heres, Auvergne-Rhône-Alpes, France & Institut Universitaire de France (IUF), France) g1b2gbo3detul4@gmail.com
- Hans IJzerman hans@absl.io
-
2
-
-
10.31730/osf.io/e57bq
-
Synergy Between the Credibility Revolution and Human Development in Africa
-
- Dec 2024
-
-
This article provides a brief history and review of peer review. It evaluates peer review models against the goals of scientific communication, expressing a preference for publish, review, curate (PRC) models. The review and history are useful. However, the article’s progression and arguments, along with what it seeks to contribute to the literature, need refinement and clarification. The argument for PRC is under-developed due to a lack of clarity about what the article means by scientific communication. Clarity here might make the endorsement of PRC seem like less of a foregone conclusion.
As an important corollary, and in the interest of transparency, I declare that I am a founding managing editor of MetaROR, which is a PRC platform. It may be advisable for the author to make a similar declaration because I understand that they are affiliated with one of the universities involved in the founding of MetaROR.
Recommendations from the editor
I strongly endorse the main theme of most of the reviews, which is that the progression and underlying justifications for this article’s arguments need a great deal of work. In my view, this article’s main contribution seems to be the evaluation of the three peer review models against the functions of scientific communication. I say ‘seems to be’ because the article is not very clear on that, and I hope you will consider clarifying what your manuscript seeks to add to the existing work in this field.
In any case, if that assessment of the three models is your main contribution, that part is somewhat underdeveloped. Moreover, I never got the sense that there is clear agreement in the literature about what the tenets of scientific communication are. Note that scientific communication is a field in its own right.
I also agree that the paper is too strongly worded at times, with limitations and assumptions in the analysis minimised or not stated. For example, all of the typologies and categories drawn could easily be reorganised, and there is a high degree of subjectivity in this entire exercise. Subjective choices should be highlighted and made salient for the reader.
Note that greater clarity, rigour, and humility may also help with any alleged or actual bias.
Some more minor points are:
-
I agree with Reviewer 3 that the ‘we’ perspective is distracting.
-
The paragraph starting with ‘Nevertheless’ on page 2 is very long.
-
There are many points where language could be shortened for readability, for example:
-
Page 3: ‘decision on publication’ could be ‘publication decision’.
-
Page 5: ‘efficiency of its utilization’ could be ‘its efficiency’.
-
Page 7: ‘It should be noted…’ could be ‘Note that…’.
-
-
Page 7: ‘It should be noted that..’ – this needs a reference.
-
I’m not sure that registered reports reflect a hypothetico-deductive approach (page 6). For instance, systematic reviews (even non-quantitative ones) are often published as registered reports and Cochrane has required this even before the move towards registered reports in quantitative psychology.
-
I agree that modular publishing sits uneasily as its own chapter.
-
Page 14: ‘The "Publish-Review-Curate" model is universal that we expect to be the future of scientific publishing. The transition will not happen today or tomorrow, but in the next 5-10 years, the number of projects such as eLife, F1000Research, Peer Community in, or MetaROR will rapidly increase’. This seems overly strong (an example of my larger critique and that of the reviewers).
-
-
In "Evolution of Peer Review in Scientific Communication", Kochetkov provides a point-of-view discussion of the current state of play of peer review for scientific literature, focussing on the major models in contemporary use and recent innovations in reform. In particular, they present a typology of three main forms of peer review: traditional pre-publication review; registered reports; and post-publication review, their preferred model. The main contribution it could make would be to help consolidate typologies and terminologies, to consolidate major lines of argument and to present some useful visualisations of these. On the other hand, the overall discussion is not strongly original in character.
The major strength of this article is that the discussion is well-informed by contemporary developments in peer-review reform. The typology presented is modest and, for that, readily comprehensible and intuitive. This is to some extent a weakness as well as a strength; a typology that is too straightforward may not be useful enough. As suggested at the end, it might be worth considering how to complexify the typology, at least at subordinate levels, without sacrificing this strength. The diagrams of workflows are particularly clear.
The primary weakness of this article is that it presents itself as an 'analysis' from which certain results, such as the typology, are 'concluded', when it appears clearly to be an opinion piece. In my view, this results in a false claim of objectivity which detracts from what would otherwise be an interesting and informative, albeit subjective, discussion, and it means the limitations of this approach are not discussed. A secondary weakness is that the discussion is not well structured and there are some imprecisions of expression that have the potential to confuse, at least at first.
This primary weakness is manifested in several ways. The evidence and reasoning for the claims made are patchy or absent. One instance of the former is the discussion of bias in peer review. There are a multitude of studies of such bias, and indeed quite a few meta-analyses of these studies. A systematic search could have been done here, but there is no attempt to discuss the totality of this literature. Instead, only a few specific studies are cited. Why are these ones chosen? We have no idea. To this extent I am not convinced that the references used here are the most appropriate. Instances of the latter are the claim that "The most well-known initiatives at the moment are ResearchEquals and Octopus", for which no evidence is provided, the claim that "we believe that journal-independent peer review is a special case of Model 3", for which no further argument is provided, and the claim that "the function of being the "supreme judge" in deciding what is "good" and "bad" science is taken on by peer review", for which neither is provided.
A particular example of this weakness, which is perhaps of marginal importance to the overall paper but of strong interest to this reviewer, is the rather odd engagement with history within the paper. It is titled "Evolution of Peer Review" but is really focussed on the contemporary state-of-play. Section 2 starts with a short history of peer review in scientific publishing, but that seems intended only to establish what is described as the 'traditional' model of peer review. Given that that short history had just shown how peer review had been continually changing in character over centuries - and indeed Kochetkov goes on to describe further changes - it is a little difficult to work out what 'traditional' might mean here; what was 'traditional' in 2010 was not the same as what was 'traditional' in 1970. It is not clear how seriously this history is being taken. Kochetkov has earlier written that "as early as the beginning of the 21st century, it was argued that the system of peer review is 'broken'" but of course criticisms - including fundamental criticisms - of peer review are much older than this. Overall, this use of history seems designed to privilege the experience of a particular moment in time that coincides with the start of the metascience reform movement.
Section 2 also demonstrates some of the second weakness described, a rather loose structure. Having moved from a discussion of the history of peer review to detail the first model, 'traditional' peer review, it then also goes on to describe the problems of this model. This part of the paper is one of the best, and best-evidenced. Given its importance to the main thrust of the discussion, it should probably have been given more space as a section of its own.
Another example is Section 4 on Modular Publishing, in which Kochetkov notes "Strictly speaking, modular publishing is primarily an innovative approach for the publishing workflow in general rather than specifically for peer review." Kochetkov says "This is why we have placed this innovation in a separate category" but if it is not an innovation in peer review, the bigger question is 'Why was it included in this article at all?'.
One example of the imprecision of language is as follows. The author shifts between the terms 'scientific communication' and 'science communication' but, at least in many contexts familiar to this reviewer, these are not the same thing: the former denotes science-internal dissemination of results through publication (which the author considers), conferences and the like (which the author specifically excludes), while the latter denotes the science-external public dissemination of scientific findings to non-technical audiences, which is entirely out of scope for this article.
A final note is that Section 3, while an interesting discussion, seems largely derived from a typology by Waltman, with the addition of a consideration of whether a reform is 'radical' or 'incremental', based on how 'disruptive' the reform is. Given that this is inherently a subjective decision, I wonder if it might not have been more informative to consider 'disruptiveness' on a scale and plot it accordingly. This would allow for some range to be imagined for each reform as well; surely reforms might be more or less disruptive depending on how they are implemented. Given that each reform is considered against each model, it is somewhat surprising that this is not presented in a tabular or graphical form.
Beyond the specific suggestions in the preceding paragraphs, my suggestions to improve this article are as follows:
-
Reconceptualize this as an opinion piece. Where systematic evidence can be drawn upon to make points, use that, but don't be afraid to just present a discussion from what is clearly a well-informed author.
-
Reconsider the focus on history and 'evolution' if the point is about the current state of play and evaluation of reforms (much as I would always want to see more studies on the history and evolution of peer review).
-
Consider ways in which the typology might be expanded, even if at subordinate level.
I have no competing interests in the compilation of this review, although I do have specific interests as noted above.
-
-
The work ‘Evolution of Peer Review in Scientific Communication’ provides a concise and readable summary of the historical role of peer review in modern science. The paper categorises the peer review practices into three models: (1) traditional pre-publication peer review; (2) registered reports; (3) post-publication peer review. The author compares the three models and draws the conclusion that the “third model offers the best way to implement the main function of scientific communication”.
I would contest this conclusion. In my eyes the three models serve different aims - with more or fewer drawbacks. For example, although Model 3 is less likely to introduce bias in readers, it also weakens the filtering function of the review system. Let’s just think about the dangers of machine-generated articles, paper-mills, p-hacked research reports and so on. Although the editors do some pre-screening of submissions, in a world with only Model 3 peer review the literature could easily get loaded with even more ‘garbage’ than in a model where additional peers help with the screening.
Compared to registered reports, other aspects come into focus that Model 3 cannot cover, namely the efficiency of researchers’ work. In the case of registered reports, Stage 1 review can still help researchers to modify or improve their research design or data collection method. Empirical work can be costly and time-consuming, and post-publication review can only say “you should have done it differently, then it would have made sense”.
Finally, the author puts openness as a strength of Model 3. In my eyes, openness is a separate question. All models can work very openly and transparently in the right circumstances. This dimension is not an inherent part of the models.
In conclusion, I would not pass a verdict on the models, but would instead emphasise the different functions they can play in scientific communication.
A minor comment: I found that a number of statements lack references in the Introduction. I would have found them useful for statements such as “There is a point of view that peer review is included in the implicit contract of the researcher.”
-
In this manuscript, the author provides a historical review of the place of peer review in the scientific ecosystem, including a discussion of the so-called current crisis and a presentation of three important peer review models. I believe this is a non-comprehensive yet useful overview. My main contention is that the structure of the paper could be improved. More specifically, the author could expand on the different goals of peer review and discuss these goals earlier in the paper. This would allow readers to better interpret the different issues plaguing peer review and helps put the costs and benefits of the three models into context. Other than that, I found some claims made in the paper a little too strong. Presenting some empirical evidence or downplaying these claims would improve the manuscript in my opinion. Below, you can find my comments:
-
In my view, the biggest issue with the current peer review system is the low quality of reviews, but the manuscript only mentions this fleetingly. The current system facilitates publication bias, confirmation bias, and is generally very inconsistent. I think this is partly due to reviewers’ lack of accountability in such a closed peer review system, but I would be curious to hear the author’s ideas about this, more elaborately than they provide them as part of issue 2.
-
I’m missing a section in the introduction on what the goals of peer review are or should be. You mention issues with peer review, and these are mostly fair, but their importance is only made salient if you link them to the goals of peer review. The author does mention some functions of peer review later in the paper, but I think it would be good to expand that discussion and move it to a place earlier in the manuscript.
-
Table 1 is intuitive but some background on how the author arrived at these categorizations would be welcome. When is something incremental and when is something radical? Why are some innovations included but not others (e.g., collaborative peer review, see https://content.prereview.org/how-collaborative-peer-review-can-transform-scientific-research/)?
-
“Training of reviewers through seminars and online courses is part of the strategies of many publishers. At the same time, we have not been able to find statistical data or research to assess the effectiveness of such training.” (p. 5) There is some literature on this, although not recent. See work by Sara Schroter, for example (Schroter et al., 2004; Schroter et al., 2008).
-
“It should be noted that most initiatives aimed at improving the quality of peer review simultaneously increase the costs.” (p. 7) This claim needs some support. Please explicate why this typically is the case and how it should impact our evaluations of these initiatives.
-
I would rephrase “Idea of the study” in Figure 2 since the other models start with a tangible output (the manuscript). This is the same for registered reports where they submit a tangible report including hypotheses, study design, and analysis plan. In the same vein, I think study design in the rest of the figure might also not be the best phrasing. Maybe the author could use the terminology used by COS (Stage 1 manuscript, and Stage 2 manuscript, see Details & Workflow tab of https://www.cos.io/initiatives/registered-reports). Relatedly, “Author submits the first version of the manuscript” in the first box after the ‘Manuscript (report)’ node may be a confusing phrase because I think many researchers see the first version of the manuscript as the Stage 1 report sent out for Stage 1 review.
-
One pathway that is not included in Figure 2 is that authors can decide to not conduct the study when improvements are required. Relatedly, in the publish-review-curate model, is revising the manuscripts based on the reviews not optional as well? Especially in the case of 3a, authors can hardly be forced to make changes even though the reviews are posted on the platform.
-
I think the author should discuss the importance of ‘open identities’ more. This factor is now not explicitly included in any of the models, while it has been found to be one of the main characteristics of peer review systems (Ross-Hellauer, 2017). More generally, I was wondering why the author chose these three models and not others. What were the inclusion criteria for inclusion in the manuscript? Some information on the underlying process would be welcome, especially when claims like “However, we believe that journal-independent peer review is a special case of Model 3 (“Publish-Review-Curate”).” are made without substantiation.
-
Maybe it helps to outline the goals of the paper a bit more clearly in the introduction. This helps the reader to know what to expect.
-
The Modular Publishing section is not inherently related to peer review models, as you mention in the first sentence of that paragraph. As such, I think it would be best to omit this section entirely to maintain the flow of the paper. Alternatively, you could shortly discuss it in the discussion section but a separate paragraph seems too much from my point of view.
-
Labeling model 3 as post-publication review might be confusing to some readers. I believe many researchers see post-publication review as researchers making comments on preprints, or submitting commentaries to journals. Those activities are substantially different from the publish-review-curate model so I think it is important to distinguish between these types.
-
I do not think the conclusions drawn below Table 3 logically follow from the earlier text. For example, why are “all functions of scientific communication implemented most quickly and transparently in Model 3”? It could be that the entire process takes longer in Model 3 (e.g. because reviewers need more time), so that Model 1 and Model 2 lead to outputs more quickly. The same holds for the following claim: “The additional costs arising from the independent assessment of information based on open reviews are more than compensated by the emerging opportunities for scientific pluralism.” What is the empirical evidence for this? While I personally do think that Model 3 improves on Model 1, emphatic statements like this require empirical evidence. Maybe the author could provide some suggestions on how we can attain this evidence. Model 2 does have some empirical evidence underpinning its validity (see Scheel, Schijen, & Lakens, 2021; Soderberg et al., 2021; Sarafoglou et al., 2022) but more meta-research inquiries into the effectiveness and cost-benefit ratio of registered reports would still be welcome in general.
-
What is the underlying source for the claim that openness requires three conditions?
-
“If we do not change our approach, science will either stagnate or transition into other forms of communication.” (p. 2) I don’t think this claim is supported sufficiently strongly. While I agree there are important problems in peer review, I think there would need to be a more in-depth and evidence-based analysis before claims like this can be made.
-
On some occasions, the author uses “we” while the study is single authored.
-
Figure 1: The top-left arrow from revision to (re-)submission is hidden
-
“The low level of peer review also contributes to the crisis of reproducibility in scientific research (Stoddart, 2016).” (p. 4) I assume the author means the low quality of peer review.
-
“Although this crisis is due to a multitude of factors, the peer review system bears a significant responsibility for it.” (p. 4) This is also a big claim that is not substantiated.
-
“Software for automatic evaluation of scientific papers based on artificial intelligence (AI) has emerged relatively recently” (p. 5) The author could add RegCheck (https://regcheck.app/) here, even though it is still in development. This tool is especially salient in light of the finding that preregistration-paper checks are rarely done as part of reviews (see Syed, 2023)
-
There is a typo in last box of Figure 1 (“decicion” instead of “decision”). I also found typos in the second box of Figure 2, where “screns” should be “screens”, and the author decision box where “desicion” should be “decision”
-
Maybe it would be good to mention results blinded review in the first paragraph of 3.2. This is a form of peer review where the study is already carried out but reviewers are blinded to the results. See work by Locascio (2017), Grand et al. (2018), and Woznyj et al. (2018).
-
Is “Not considered for peer review” in figure 3b not the same as rejected? I feel that it is rejected in the sense that neither the manuscript nor the reviews will be posted on the platform.
-
“In addition to the projects mentioned, there are other platforms, for example, PREreview12, which departs even more radically from the traditional review format due to the decentralized structure of work.” (p. 11) For completeness, I think it would be helpful to add some more information here, for example why exactly decentralization is a radical departure from the traditional model.
-
“However, anonymity is very conditional - there are still many “keys” left in the manuscript, by which one can determine, if not the identity of the author, then his country, research group, or affiliated organization.” (p.11) I would opt for the neutral “their” here instead of “his”, especially given that this is a paragraph about equity and inclusion.
-
“Thus, “closeness” is not a good way to address biases.” (p. 11) This might be a straw man argument because I don’t believe researchers have argued that it is a good method to combat biases. If they did, it would be good to cite them here. Alternatively, the sentence could be omitted entirely.
-
I would start the Modular Publishing section with the definition as that allows readers to interpret the other statements better.
-
It would be helpful if the Models were labeled (instead of using Model 1, Model 2, and Model 3) so that readers don’t have to think back to what each model involved.
-
Table 2: “Decision making” for the editor’s role is quite broad; I recommend specifying what kinds of decisions need to be made.
-
Table 2: “Aim of review” – I believe the aim of peer review differs also within these models (see the “schools of thought” the author mentions earlier), so maybe a statement on what the review entails would be a better way to phrase this.
-
Table 2: One could argue that the ‘object of the review’ in Registered Reports is also the manuscript as a whole, just at different stages. As such, I would phrase this differently.
Good luck with any revision!
Olmo van den Akker (ovdakker@gmail.com)
References
Grand, J. A., Rogelberg, S. G., Banks, G. C., Landis, R. S., & Tonidandel, S. (2018). From outcome to process focus: Fostering a more robust psychological science through registered reports and results-blind reviewing. Perspectives on Psychological Science, 13(4), 448-456.
Ross-Hellauer, T. (2017). What is open peer review? A systematic review. F1000Research, 6.
Sarafoglou, A., Kovacs, M., Bakos, B., Wagenmakers, E. J., & Aczel, B. (2022). A survey on how preregistration affects the research workflow: Better science but more work. Royal Society Open Science, 9(7), 211997.
Scheel, A. M., Schijen, M. R., & Lakens, D. (2021). An excess of positive results: Comparing the standard psychology literature with registered reports. Advances in Methods and Practices in Psychological Science, 4(2), 25152459211007467.
Schroter, S., Black, N., Evans, S., Carpenter, J., Godlee, F., & Smith, R. (2004). Effects of training on quality of peer review: randomised controlled trial. BMJ, 328(7441), 673.
Schroter, S., Black, N., Evans, S., Godlee, F., Osorio, L., & Smith, R. (2008). What errors do peer reviewers detect, and does training improve their ability to detect them?. Journal of the Royal Society of Medicine, 101(10), 507-514.
Soderberg, C. K., Errington, T. M., Schiavone, S. R., Bottesini, J., Thorn, F. S., Vazire, S., ... & Nosek, B. A. (2021). Initial evidence of research quality of registered reports compared with the standard publishing model. Nature Human Behaviour, 5(8), 990-997.
Syed, M. (2023). Some data indicating that editors and reviewers do not check preregistrations during the review process. PsyArXiv Preprints.
Locascio, J. J. (2017). Results blind science publishing. Basic and Applied Social Psychology, 39(5), 239-246.
Woznyj, H. M., Grenier, K., Ross, R., Banks, G. C., & Rogelberg, S. G. (2018). Results-blind review: A masked crusader for science. European Journal of Work and Organizational Psychology, 27(5), 561-576.
-
-
Overall thoughts: This is an interesting history piece regarding peer review and the development of review over time. Given the author’s conflict of interest and association with the Centre developing MetaROR, I think that this paper might be a better fit for an information page or introduction to the journal and rationale for the creation of MetaROR, rather than being billed as an independent article. Alternatively, more thorough information about advantages to pre-publication review or more downsides/challenges to post-publication review might make the article seem less affiliated. I appreciate seeing the history and current efforts to change peer review, though I am not comfortable broadly encouraging use of these new approaches based on this article alone.
Page 3: It’s hard to get a feel for the timeline given the dates that are described. We have peer review becoming standard after WWII (after 1945), definitively established by the second half of the century, an example of obligatory peer review starting in 1976, and in crisis by the end of the 20th century. I would consider adding examples that better support this timeline – did it become more common in specific journals before 1976? Was the crisis by the end of the 20th century something that happened over time or something that was already intrinsic to the institution? It doesn’t seem like enough time to get established and then enter crisis, but more details/examples could help make the timeline clear.
Consider discussing the benefits of the traditional model of peer review.
Table 1 – Most of these are self-explanatory to me as a reader, but not all. I don’t know what a registered report refers to, and it stands to reason that not all of these innovations are familiar to all readers. You do go through each of these sections, but that’s not clear when I initially look at the table. Consider having a more informative caption. Additionally, the left column is “Course of changes” here but “Directions” in text. I’d pick one and go with it for consistency.
3.2: Considering mentioning your conflict of interest here where MetaROR is mentioned.
With some of these methods, there’s the ability to also submit to a regular journal. Going to a regular journal presumably would instigate a whole new round of review, which may or may not contradict the previous round of post-publication review and would increase the length of time to publication by going through both types. If someone has a goal to publish in a journal, what benefit would they get by going through the post-publication review first, given this extra time?
There’s a section talking about institutional change (page 14). It mentions that openness requires three conditions – people taking responsibility for scientific communication, authors and reviewers, and infrastructure. I would consider adding some discussion of readers and evaluators. Readers have to be willing to accept these papers as reliable, trustworthy, and respectable to read and use the information in them. Evaluators such as tenure committees and potential employers would need to consider papers submitted through these approaches as evidence of scientific scholarship for the effort to be worthwhile for scientists.
Based on this overview, which seems somewhat skewed towards the merits of these methods (conflict of interest, limited perspective on downsides to new methods/upsides to old methods), I am not quite ready to accept this effort as equivalent of a regular journal and pre-publication peer review process. I look forward to learning more about the approach and seeing this review method in action and as it develops.
-
Kochetkov, D. (2024, March 21). Evolution of Peer Review in Scientific Communication. https://doi.org/10.31235/osf.io/b2ra3
-
Jul 26, 2024
-
Nov 20, 2024
-
Nov 20, 2024
-
Authors:
- Dmitry Kochetkov (Leiden University) d.kochetkov@cwts.leidenuniv.nl
-
5
-
10.31235/osf.io/b2ra3
-
Evolution of Peer Review in Scientific Communication
-
-
osf.io
-
In this article the authors use a discrete choice experiment to study how health and medical researchers decide where to publish their research, showing the importance of impact factors in these decisions. The article has been reviewed by two reviewers. The reviewers consider the work to be robust, interesting, and clearly written. The reviewers have some suggestions for improvements. One suggestion is to emphasize more strongly that the study focuses on the health and medical sciences and to reflect on the extent to which the results may generalize to other fields. Another suggestion is to strengthen the embedding of the article in the literature. Reviewer 2 also suggests extending the discussion of the sample selection and addressing in more detail the question of why impact factors still persist.
Competing interest: Ludo Waltman is Editor-in-Chief of MetaROR working with Adrian Barnett, a co-author of the article and a member of the editorial team of MetaROR.
-
In "Researchers Are Willing to Trade Their Results for Journal Prestige: Results from a Discrete Choice Experiment", the authors investigate researchers’ publication preferences using a discrete choice experiment in a cross-sectional survey of international health and medical researchers. The study investigates how researchers negotiate trade-offs amongst various factors, such as journal impact factor, review helpfulness, formatting requirements, and usefulness for promotion, in their decisions on where to publish. The research is timely; as the authors point out, reform of research assessment is currently a very active topic. The design and methods of the study are suitable and robust. The use of focus groups and interviews in developing the attributes for study shows care in the design. The survey instrument itself is generally very well-designed, with important tests of survey fatigue, understanding (dominant choice task) and respondent choice consistency (repeat choice task) included. Respondent performance was good or excellent across all these checks. Analysis methods (pMMNL and latent class analysis) are well-suited to the task. Pre-registration and sharing of data and code show commitment to transparency. Limitations are generally well-described.
In the below, I give suggestions for clarification/improvement. Except for some clarifications on limitations and one narrower point (reporting of qualitative data analysis methods), my suggestions are only that – the preprint could otherwise stand, as is, as a very robust and interesting piece of scientific work.
-
Respondents come from a broad range of countries (63), with 47 of those countries represented by fewer than 10 respondents. Institutional cultures of evaluation can differ greatly across nations. And we can expect variability in exposure to the messages of DORA (seen, for example, in the level of permeation of DORA as measured by signatories in each country, https://sfdora.org/signers/). In addition, some contexts may mandate or incentivise publication in some venues (using measures including IF, but also requiring journals to be in certain databases like WoS or Scopus, or having preferred journal lists). I would suggest the authors include in the Sampling section a rationale for taking this international approach, including any potentially confounding factors it may introduce, and then add the latter also in the limitations.
-
Reporting of qualitative results: In the introduction and methods, the role of the focus groups and interviews seems to have been just to inform the design of the experiment. But then, results from that qualitative work appear as direct quotes within the discussion to contextualise or explain results. In this sense, the qualitative results are being used as new data. Given this, I feel that the methods section should include a description of the methods and tools used for qualitative data analysis (currently it does not). But in addition, to my understanding (and this may be a question of disciplinary norms – I’m not a health/medicine researcher), generally new data should not be introduced in the discussion section of a research paper. Rather, the discussion is meant to interpret, analyse, and provide context for the results that have already been presented. I personally feel that the paper would benefit from the qualitative results being reported separately within the results section.
-
Impact factors – Discussion section: While there is interesting new information on the relative trade-offs amongst other factors, the most emphasised finding, that impact factors still play a prominent role in publication venue decisions, is hardly surprising. More could perhaps be done to compare how the levels of importance reported here differ from previous results from other disciplines or over time (I know a like-for-like comparison is difficult but other studies have investigated these themes, e.g., https://doi.org/10.1177/01655515209585). In addition, beyond the question of whether impact factors are important, a more interesting question in my view is why they still persist. What are they used for and why are they still such important “driver[s] of researchers’ behaviour”? This was not the authors’ question, and they do provide some contextualisation by quoting their participants, but still I think they could do more to contextualise what is known from the literature on that to draw out the implications here. The attribute label in the methods for IF is “ranking”, but ranking according to what and for what? Not just average per-article citations in a journal over a given time frame. Rather, impact factors are used as proxy indicators of less-tangible desirable qualities – certainly prestige (as the title of this article suggests), but also quality, trust (as reported by one quoted focus group member “I would never select a journal without an impact factor as I always publish in journals that I know and can trust that are not predatory”, p.6), journal visibility, importance to the field, or improved chances of downstream citations or uptake in news media/policy/industry etc. Picking apart the interactions of these various factors in researchers’ choices to make use of IFs (which is not in all cases bogus or unjustified) could add valuable context. I’d especially recommend engaging at least briefly with more work from Science and Technology Studies - especially Müller and de Rijcke’s excellent Thinking with Indicators study (doi: 10.1093/reseval/rvx023), but also those authors’ other work, as well as work from Ulrike Felt, Alex Rushforth (esp. https://doi.org/10.1007/s11024-015-9274-5), Björn Hammerfelt and others.
-
Disciplinary coverage: (1) A lot of the STS work I talk about above emphasises epistemic diversity and the ways cultures of indicator use differ across disciplinary traditions. For this reason, I think it should be pointed out in the limitations that this is research in Health/Med only, with questions on generalisability to other fields. (2) Also, although the abstract and body of the article do make clear the disciplinary focus, the title does not. Hence, I believe the title should be slightly amended (e.g., “Health and Medical Researchers Are Willing to Trade …”)
-
-
This manuscript reports the results of an interesting discrete choice experiment designed to probe the values and interests that inform researchers’ decisions on where to publish their work.
Although I am not an expert in the design of discrete choice experiments, the methodology is well explained and the design of the study comes across as well considered, having been developed in a staged way to identify the most appropriate pairings of journal attributes to include.
The principal findings to my mind, well described in the abstract, include the observations that (1) researchers’ strongest preference was for journal impact factor and (2) that they were prepared to remove results from their papers if that would allow publication in a higher impact factor journal. The first of these is hardly surprising – and is consistent with a wide array of literature (and ongoing activism, e.g. through DORA, CoARA). The second is much more striking – and concerning for the research community (and its funders). This is the first time I have seen evidence for such a trade-off.
Overall, the manuscript is very clearly written. I have no major issues with the methods or results. However, I think some minor revisions would enhance the clarity and utility of the paper.
First, although it is made clear in Table 1 that the researchers included in the study are all from the medical and clinical sciences, this is not apparent from the title or the abstract. I think both should be modified to reflect the nature of the sample. In my experience researchers in these fields are among those who feel most intensely the pressure to publish in high IF journals. The authors may want also to reflect in a revised manuscript how well their findings may transfer to other disciplines.
Second, in several places I felt the discussion of the results could be enriched by reference to papers in the recent literature that are missing from the bibliography. These include (1) Muller and De Rijcke’s 2017 paper on Thinking with Indicators, which discusses how the pressure of metrics impacts the conduct of research (https://doi.org/10.1093/reseval/rvx023); (2) Bjorn Brembs’ analysis of the reliability of research published in prestige science journals (https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/fnhum.2018.00376/full); and (3) McKiernan et al.’s examination of the use of the Journal Impact Factor in academic review, promotion, and tenure evaluations (https://pubmed.ncbi.nlm.nih.gov/31364991/).
Third, although the text and figures are nicely laid out, I would recommend using a smaller or different font for the figure legends to more easily distinguish them from body text.
-
Bohorquez, N. G., Weerasuriya, S., Brain, D., Senanayake, S., Kularatna, S., & Barnett, A. (2024, July 31). Researchers are willing to trade their results for journal prestige: results from a discrete choice experiment. https://doi.org/10.31219/osf.io/uwt3b
-
Aug 03, 2024
-
Nov 20, 2024
-
Authors:
- Natalia Gonzalez Bohorquez (Queensland University of Technology) natalia.gonzalezbohorquez@hdr.qut.edu.au
- Sucharitha Weerasuriya (Queensland University of Technology) sucharitha.weerasuriya@qut.edu.au
- David Brain (Queensland University of Technology) david.brain@qut.edu.au
- Sameera Senanayake (Duke-NUS Medical School) sameera.senanayake@duke-nus.edu.sg
- Sanjeewa Kularatna (Duke-NUS Medical School) sanjeewa.kularatna@duke-nus.edu.sg
- Adrian Barnett a.barnett@qut.edu.au
-
-
10.31219/osf.io/uwt3b
-
Researchers are willing to trade their results for journal prestige: results from a discrete choice experiment
-
-
arxiv.org
-
I started reading this paper with great interest, which flagged over time. As someone with extensive experience both publishing peer-reviewed research articles and working with publication data (Web of Science, Scopus, PubMed, PubMedCentral), I understand there are vagaries in the data because of how and when it was collected, and when certain policies and processes were implemented. For example, as an author who started publishing in the late 1980s, I was instructed by the journal “guide to authors” to use only initials. My early papers all used only initials. This changed in the mid-to-late 1990s. Another example: when working with NIH publications data, one knows dates like 1946 (how far back MedLine data go), 1996 (when PubMed was launched), 2000 (when PubMedCentral was launched), and 2008 (when the NIH Open Access policy was enacted). There are also intermediate dates for changes in curation policy… that underlie a transition from initials to full names in the biomedical literature.
I realize that the study covers all research disciplines, but still I am surprised that the authors of this paper don’t start with an examination of the policies underlying publications data, and only get to this at the end of a fairly torturous study.
As a reader, this reviewer felt pulled all over the place in this article and increasingly frustrated that this is a paper that explores the Dimensions database vagaries only and not really the core overall challenges of bibliometric data, irrespective of data source. Dimensions ingests data from multiple sources — so any analysis of its contents needs to examine those sources first.
A few specific comments:
-
The “history of science” portion of the paper focuses on English learned societies in the 17th century. There were many other learned societies across Europe, and also “papers” (books, treatises) from long before the 17th century in Middle-Eastern and Asian countries (e.g., see the history of mathematics, engineering, governance and policy, etc.). These other histories were not acknowledged by the authors. Research didn’t just spring fully formed out of Zeus’ head.
-
It is unclear throughout whether the authors are referring to science or research, and which disciplines are or are not included. The first chart on disciplinary coverage is Fig 13 and goes back to 1940ish. Also, which languages are included in the analysis? For example, Figure 2 says “academic output” but from which academies? What countries? What languages? Disciplines? Also, in Figure 2, this reviewer would have liked to see discussion about the variability in the noisiness of the data over time.
-
The inclusion of gender in the paper misses the mark for this reviewer. When dealing with initials, how can one identify gender? And when working in times/societies where women had to hide their identity to be published…. how can a name-based analysis of gender be applied? If this paper remains a study of the “initial era”, this reviewer recommends removing the gender analysis.
-
Reference needed for “It is just as important to see ourselves reflected in the outputs of the research careers…” (section B).
-
Reference needed for “This period marked the emergence of “Big Science” (Section B). How do we know this is Big Science? What is the relationship with the nature of science careers? Here it would be useful perhaps to mention that postdocs were virtually unheard of before Sputnik.
-
Fig 3. This would be more effective as a % of total papers rather than an absolute #.
-
Gradual Evolution of the Scholarly Record. This reviewer would like to see the proportion of papers without authors. A lot of history of science research is available for this period, and a few references here would be welcome, as well as a by-country analysis (or an acknowledgement that the data are largely from Europe and/or English-speaking countries).
-
Accelerated Changes in Recent Times. Again, this reviewer would like to see reference to scholarship on the history of science. One of the things happening in the post-WW2 timeframe is the increase in government spending (in the US particularly) on R&D and academic research. So, is the academy changing, or is it responding to “market forces”?
-
Reflective richness of data. “Evolution of the research community” is not described in the text, nor are collaborative networks.
-
In the following paragraph, one could argue that evaluation was a driver of change, not a response to it. This reviewer would like to see references here.
-
II. Methodology. (i) 2nd sentence missing “to”: “… and full form to refer to an author name…”. (ii) In the 2nd para the authors talk about epochs, but the data could be (are) discontinuous because of (a) curation policy, (b) curation technology, (c) data sources (e.g., Medline rolled out in the 1960s and back-populated to 1946). (iii) The 4th para refers to Figs 3 and 4 showing a marked change between 1940 and 1950, but Fig 3 goes back only to 1960, and Fig 4 is so compressed it is hard to see anything in that time range. (iv) Para 7: “the active publishing community is a reasonable proxy for the global research population”. We need a reference here and more analysis. Is this Europe? English language? Which disciplines? All academia? Dimensions data? (v) Para 12, “In exploring the issue of gender…”: see comments above. Gender is an important consideration but is out of scope, in this reviewer’s opinion, for this paper focused on use of initials vs. full name.
-
Listing 1. Is there a resolvable URL/DOI for this query?
-
Figs 9-11, 14, 15. This reviewer would like to see a more fulsome examination / discussion of data discontinuities. Particularly around ~1985-2000.
Discussion
-
The country-level discussion suggests the data (publications included) are only those that have been translated into English. Please clarify. Also, please add references in this section. There are a lot of bold statements, such as “A characteristic of these countries was the establishment of strong national academies.” Is this different from other places in the world? How? In the para before this statement, there is a phrase “picking out Slavonic stages” that is not clear to this reviewer.
-
The authors seem to get ahead of themselves talking about “formal” and “informal” in relation to whether initials or full names are used. And then discuss the “Power Distance” and end up arguing that it isn’t formal/informal … but rather publisher policies and curation practices driving the initial era and its end.
-
And then the authors come full circle on research articles being a technology, akin to a contract. Which is neat and useful. But all the intermediate data analysis is focused on the Dimensions data base and this reviewer would argue should be a part of the database documentation rather than a scholarly article.
-
This reviewer would prefer this paper be focused much more tightly on how publishing technology can and has driven the sociology of science. Dig more into the E. Journal Analysis and F. Technological analysis. Stick with what you have deep data for, and provide us readers with a practical and useful paper that maybe, just maybe, publishers will read and be incentivized to up their game with respect to adoption of “new” technologies like ORCID, DOIs for data, etc. Because these papers are not just expositions on a disciplinary discourse, they are also a window into how science (research) works and is done.
-
-
The presented preprint is a well-researched study on a relevant topic that could be of interest to a broad audience. The study's strengths include a well-structured and clearly presented methodology. The code and data used in the research are openly available on Figshare, in line with best practices for transparency. Furthermore, the findings are presented in a clear and organized manner, with visualizations that aid understanding.
At the same time, I would like to draw your attention to a few points that could potentially improve the work.
-
I think it would be beneficial to expand the abstract to approximately 250 words.
-
The introduction starts with a very broad context, but the connection between this context and the object of the research is not immediately clear. There are few references in this section, making it difficult to determine whether the authors are citing others or their own findings.
-
The transition to the main topic of the study is not well-defined, and there is no description of the gap in the literature regarding the object of study. Additionally, "bibliometric archaeology" appears at the end of the introduction but is only mentioned again later in the discussion, which may cause confusion for the reader.
-
It would be helpful to clearly state the purpose and objectives of the study both in the Introduction and in the abstract as well.
-
Besides, it is important to elaborate on the contribution of this study in the introduction section.
-
The same applies to the background - a very broad context, but the connection with the object of the research is not entirely clear.
-
Page 4 - as far as I understand, these are conclusions from a literature review, while point 3 (Reflective Richness of Data) does not follow from the previous analysis.
-
The overall impression of the introduction and background is that it is an interesting text, but it is not well related to the objectives of the study. I would recommend shortening these sections by making the introduction and literature review more pragmatic and structured. At the same time, this text could be published as a standalone contribution.
-
As I mentioned above, the methodology is one of the strengths of the study. However, in this section, it would be helpful to introduce and justify the structure used to present the results.
-
In the methodology section, the authors could also provide a footnote with a link to the code and dataset (currently, it is only given at the end).
-
With regard to the discussion, I would like to encourage the authors to place their results more clearly in the academic context. Ideally, references from the introduction and/or literature review would reappear in this section to help clarify the research contribution.
-
Although Discussion C is an interesting read, it seems more related to the introduction than the results. Again, the text itself is rather interesting, but it would benefit from a more thorough justification.
Remarks on the images:
-
At least the data source for the images should be specified in the background, because it is not obvious to the reader before describing the methodology.
-
The color distinction between China and Russia in Figure 8 is not very clear.
-
The gray lines in Figures 9-11 make the figures difficult to read. Additionally, the meaning of these lines is not clearly indicated in the legends of Figures 10 and 11. These issues should be addressed.
All comments and suggestions are intended to improve the article. Overall, I have a very positive impression of the work.
Sincerely,
Dmitry Kochetkov
-
-
Overview
This manuscript provides an in-depth examination of the use of initials versus full names in academic publications over time, identifying what the authors term the "Initial Era" (1945-1980) as a period during which initials were predominantly used. The authors contextualize this within broader technological, cultural, and societal changes, leveraging a large dataset from the Dimensions database. This study contributes to the understanding of how bibliographic metadata reflects shifts in research culture.
Strengths
+ Novel concept and historical depth
The paper introduces a unique angle on the evolution of scholarly communication by focusing on the use of initials in author names. The concept of the "Initial Era" is original and well-defined, adding a historical dimension to the study of metadata that is often overlooked. The manuscript provides a compelling narrative that connects technological changes with shifts in academic culture.
+ Comprehensive dataset
The use of the Dimensions database, which includes over 144 million publications, lends significant weight to the findings. The authors effectively utilize this resource to provide both anecdotal and statistical analyses, giving the paper a broad scope. The differentiation between the anecdotal and statistical epochs helps clarify the limitations of the dataset and strengthens the authors' conclusions.
+ Cross-disciplinary relevance
The study's insights into the sociology of research, particularly the implications of name usage for gender and cultural representation, are highly relevant across multiple disciplines. The paper touches on issues of diversity, bias, and the visibility of researchers from different backgrounds, making it an important contribution to ongoing discussions about equity in academia.
+ Technological impact
The authors successfully connect the decline of the "Initial Era" to the rise of digital publishing technologies, such as Crossref, PubMed, and ORCID. This link between technological infrastructure and shifts in scholarly norms is a critical insight, showing how the adoption of new tools has real-world implications for academic practices.
Weaknesses
- Lack of clarity and readability
While the manuscript is rich in data and analysis, it can be dense and challenging to follow for readers not familiar with the technical details of bibliometric studies. The text occasionally delves into highly specific discussions that may be difficult for a broader audience to grasp, while other concepts are introduced only cursorily. Consider condensing the introduction section, removing unrelated historical accounts, and leading the audience to the key objectives of this research much earlier.
- Missing empirical case studies
The manuscript remains largely theoretical, relying heavily on data analysis without providing concrete case studies or empirical examples of how the "Initial Era" affected individual disciplines or researchers. A more detailed exploration of specific instances where the use of initials had significant consequences would make the findings more tangible. Incorporating case studies or anecdotes from the history of science that illustrate the real-world impacts of the trends identified in the data would enrich the paper. These examples could help ground the analysis in practical outcomes and demonstrate the relevance of the "Initial Era" to contemporary debates.
- Half-baked comparative analysis
Although the paper presents interesting data about different countries and disciplines, the comparative analysis between these groups could be further developed. For example, the reasons behind the differences in initial use between countries with different writing systems or academic cultures are not fully explored. A more in-depth comparative analysis that explains the cultural, linguistic, or institutional factors driving the observed differences in initial use would add nuance to the findings. This could involve a more detailed discussion of how non-Roman writing systems influence name formatting or how specific national academic policies shape author metadata.
- Limited discussion of alternative explanations
While the authors link the decline of the "Initial Era" to technological advancements, other potential explanations, such as changing editorial policies (“technological harmonisation”), shifts in academic prestige, or the influence of global collaboration, are not fully explored. The paper could benefit from a broader discussion of these factors. Expanding the discussion to include alternative explanations for the decline of initial use, and how these might interact with technological changes, would provide a more comprehensive view. Engaging with literature on academic publishing practices, editorial decisions, and global research trends could help contextualize the findings within a wider framework.
Conclusion
This manuscript offers a novel and insightful analysis of the evolution of name usage in academic publications, providing valuable contributions to the fields of bibliometrics, science studies, and research culture. With improvements in clarity, comparative analysis, and the incorporation of case studies, this paper has the potential to make a significant impact on our understanding of how metadata reflects broader societal and technological changes in academia. The authors are encouraged to refine their discussion and expand on the implications of their findings to make the manuscript more accessible and applicable to a wider audience.
-
Aug 14, 2024
-
Nov 20, 2024
-
Nov 20, 2024
-
Authors:
- Simon Porter (Digital Science) s.porter@digital-science.com
- Daniel Hook (Digital Science) d.hook@digital-science.com
-
2
-
10.48550/arXiv.2404.06500
-
The Rise and Fall of the Initial Era
-
-
-
This manuscript examines preprint review services and their role in the scholarly communications ecosystem. It seems quite thorough to me. In Table 1 they list many peer-review services that I was unaware of e.g. SciRate and Sinai Immunology Review Project.
To help elicit critical & confirmatory responses for this peer review report I am trialling Elsevier’s suggested “structured peer review” core questions, and treating this manuscript as a research article.
Introduction
-
Is the background and literature section up to date and appropriate for the topic?
Yes.
-
Are the primary (and secondary) objectives clearly stated at the end of the introduction?
No. Instead the authors have chosen to put the two research questions on page 6 in the methods section. I wonder if they ought to be moved into the introduction – the research questions are not methods in themselves. Might it be better to state the research questions first and then detail the methods one uses to address those questions afterwards? [As Elsevier’s structured template seems implicitly to prefer.]
Methods
-
Are the study methods (including theory/applicability/modelling) reported in sufficient detail to allow for their replicability or reproducibility?
I note with approval that the version number of the software they used (ATLAS.ti) was given.
I note with approval that the underlying data is publicly archived under CC BY at figshare.
The Atlas.ti report data spreadsheet could do with some small improvement – the column headers are a little cryptic, e.g. “Nº ST” and “ST”, which I eventually deduced mean Number of Schools of Thought and Schools of Thought (?)
Is there a rawer form of the data that could be deposited with which to evidence the work done? The Atlas.ti report spreadsheet seemed like it was downstream output data from Atlas.ti. What was the rawer input data entered into Atlas.ti? Can this be archived somewhere in case researchers want to reanalyse it using other tools and methods?
I note with disapproval that Atlas.ti is proprietary software which may hinder the reproducibility of this work. Nonetheless I acknowledge that Atlas.ti usage is somewhat ‘accepted’ in social sciences despite this issue.
I think the qualitative text analysis is a little vague and/or under-described: “Using ATLAS.ti Windows (version 23.0.8.0), we carried out a qualitative analysis of text from the relevant sites, assigning codes covering what they do and why they have chosen to do it that way.” That’s not enough detail. Perhaps an example or two could be given? Was inter-rater reliability performed when ‘assigning codes’ ? How do we know the ‘codes’ were assigned accurately?
-
Are statistical analyses, controls, sampling mechanism, and statistical reporting (e.g., P-values, CIs, effect sizes) appropriate and well described?
This is a descriptive study (and that’s fine) so there aren’t really any statistics on show here other than simple ‘counts’ (of Schools of Thought) in this manuscript. There are probably some statistical processes going on within the proprietary qualitative analysis of text done in ATLAS.ti but it is under-described and so hard for me to evaluate.
Results
-
Is the results presentation, including the number of tables and figures, appropriate to best present the study findings?
Yes. However, I think a canonical URL to each service should be given. A URL is very useful for disambiguation, to confirm e.g. that the authors mean this Hypothesis (www.hypothes.is) and NOT this Hypothesis (www.hyp.io). I know exactly which Hypothesis is the one the authors are referring to but we cannot assume all readers are experts 😊
Optional suggestion: I wonder if the authors couldn’t present the table data in a slightly more visual and/or compact way? It’s not very visually appealing in its current state. Purely as an optional suggestion, to make the table more compact one could recode the answers given in one or more of the columns 2, 3 and 4 in the table e.g. "all disciplines = ⬤ , biomedical and life sciences = ▲, social sciences = ‡ , engineering and technology = † ". I note this would give more space in the table to print the URLs for each service that both reviewers have requested.
———————————————————————————————
| Service name | Developed by | Scientific disciplines | Types of outputs |
| Episciences | Other | ⬤ | blah blah blah. |
| Faculty Opinions | Individual researcher | ▲ | blah blah blah. |
| Red Team Market | Individual researcher | ‡ | blah blah blah. |
———————————————————————————————
The "Types of outputs" column might even lend themselves to mini-colour-pictograms (?) which could be more concise and more visually appealing? A table just of text, might be scientifically 'correct' but it is incredibly dull for readers, in my opinion.
-
Are additional sub-analyses or statistical measures needed (e.g., reporting of CIs, effect sizes, sensitivity analyses)?
No / Not applicable.
Discussion
-
Is the interpretation of results and study conclusions supported by the data and the study design?
Yes.
-
Have the authors clearly emphasized the limitations of their study/theory/methods/argument?
No. Perhaps a discussion of the linguistic/comprehension biases of the authors would be appropriate for this manuscript. What if there are ‘local’ or regional Chinese, Japanese, Indonesian or Arabic language preprint review services out there? Would this authorship team really be able to find them?
Additional points:
-
Perhaps the points made in this manuscript about financial sustainability (p24) are a little too pessimistic. I get it, there is merit to this argument, but there is also some significant investment going on out there if you know where to look. Perhaps it might be worth citing some recent investments, e.g. Gates -> PREreview (2024) (https://content.prereview.org/prereview-welcomes-funding/) and Arcadia’s $4 million USD to COAR for the Notify Project, which supports a range of preprint review communities including Peer Community In, Episciences, PREreview and Harvard Library (source: https://coar-repositories.org/news-updates/coar-welcomes-significant-funding-for-the-notify-project/).
-
Although I note they are mentioned, I think more needs to be written about the similarity and overlap between ‘overlay journals’ and preprint review services. Are these arguably not just two different terms for much the same thing? If Peer Community In has its overlay component in the form of the Peer Community Journal, why not mention other overlay journals like Discrete Analysis and The Open Journal of Astrophysics? I think Peer Community In (and its PCJ) is the go-to example of the thinness of the line that separates (or doesn’t!) overlay journals and preprint review services. Some more exposition on this would be useful.
-
Thank you very much for the opportunity to review the preprint titled “Preprint review services: Disrupting the scholarly communication landscape?” (https://doi.org/10.31235/osf.io/8c6xm) The authors review services that facilitate peer review of preprints, primarily in the STEM (science, technology, engineering, and math) disciplines. They examine how these services operate and their role within the scholarly publishing ecosystem. Additionally, the authors discuss the potential benefits of these preprint peer review services, placing them in the context of tensions in the broader peer review reform movement. The discussions are organized according to four “schools of thought” in peer review reform, as outlined by Waltman et al. (2023), which provides a useful framework for analyzing the services. In terms of methodology, I believe the authors were thorough in their search for preprint review services, especially given that a systematic search might be impractical.
As I see it, the adoption of preprints and reforming peer review are key components of the move towards improving scholarly communication and open research. This article is a useful step along that journey, taking stock of current progress, with a discussion that illuminates possible paths forward. It is also well-structured and easy for me to follow. I believe it is a valuable contribution to the metaresearch literature.
On a high level, I believe the authors have made a reasonable case that preprint review services might make peer review more transparent and rewarding for all involved. Looking forward, I would like to see metaresearch which gathers further evidence that these benefits are truly being realised.
In this review, I will present some general points which merit further discussion or clarification to aid an uninitiated reader. Additionally, I raise one issue regarding how the authors framed the article and categorised preprint review services and the disciplines they serve. In my view, this problem does not fundamentally undermine the robust search, analyses, and discussion in this paper, but it risks putting off some researchers and constrains how broadly one should derive conclusions.
General comments
Some metaresearchers may be aware of preprints, but not all readers will be familiar with them. I suggest briefly defining what they are, how they work, and which types of research have benefited from preprints, similar to how “preprint review service” is clearly defined in the introduction.
Regarding Waltman et al.’s (2023) “Equity & Inclusion” school of thought, does it specifically aim for “balanced” representation by different groups as stated in this article? There is an important difference between “balanced” versus “equitable” representation, and I would like to see it addressed in this text.
Another analysis I would like to see is whether any of the 23 services reviewed present any evidence that their approach has improved research quality. For instance, the discussion on peer review efficiency and incentives states that there is currently “no hard evidence” that journals want to utilise reviews by Rapid Reviews: COVID-19, and that “not all journals are receptive” to partnerships. Are journals skeptical of whether preprint review services could improve research quality? Or might another dynamic be at work?
The authors cite Nguyen et al. (2015) and Okuzaki et al. (2019), stating that peer review is often “overloaded”. I would like to see a clearer explanation of what “overloaded” means in this context, so that a reader does not have to read the two cited papers.
To the best of my understanding, one of the major sticking points in peer review reform is whether to anonymise reviewers and/or authors. Consequently, I appreciate the comprehensive discussion about this issue by the authors.
However, I am only partially convinced by the statement that double anonymity is “essentially incompatible” with preprint review. For example, there may be as-yet-unexplored ways to publish anonymous preprints with (a) a notice that the preprint has been submitted to, or is undergoing, peer review; and (b) a commitment that the authors will be revealed once peer review has been performed (e.g. at least one review has been published). This would avoid the issue of publishing only after review is concluded, as is the case for Hypothesis and Peer Community In.
Additionally, the authors describe 13 services which aim to “balance transparency and protect reviewers’ interests”. This is a laudable goal, but I am concerned that framing this as a “balance” implies a binary choice, and that to have more of one, we must lose an equal amount of the other. Thinking only in terms of “balance” prevents creative, win-win solutions. Could a case be made for non-anonymity to be complemented by a reputation system for authors and reviewers? For example, major misconduct (e.g. retribution against a critical review) would be recorded in that system and dissuade bad actors. Something similar can already be seen in the reviewer evaluation system of CrowdPeer, which could plausibly be extended or modified to highlight misconduct.
I also note that misconduct and abusive behaviour already occur even in fully or partially anonymised peer review, and they are not limited to the review of preprints. While I am not aware of existing literature on this topic, academics’ fears seem reasonable. For example, there are at least anecdotal testimonies of a reviewer deliberately rejecting a paper to slow the progress of a rival research group, while taking the ideas of that paper and beating their competitors to a grant. Or a junior researcher might refrain from giving a negative review out of fear that the senior researcher whose work they are reviewing might retaliate. These fears, real or not, seem to play a part in the debates about whether and how peer review should be anonymised. I would like to see an exploration of whether de-anonymisation would improve or worsen this behaviour, and in what contexts. If such studies exist, it would be good to discuss them in this paper.
I found it interesting that almost all preprint review services claim to complement, rather than compete with, traditional journal-based peer review. The methodology described in this article cannot definitively explain what is going on, but I suspect there may be a connection between this reluctance to compete with traditional journals and (a) the skepticism of journals towards partnering with preprint review services and (b) the dearth of publisher-run options. I hypothesise that there is a power dynamic at play, where traditional publishers have a vested interest in maintaining the power they hold over scholarly communication, and that preprint review services stress their complementarity (instead of competitiveness) as a survival mechanism. This may be an avenue for further metaresearch.
To understand which fields of research are actually represented on the services categorised under “all disciplines,” I used the Random Integer Set Generator of the Random.org true random number service (https://www.random.org/integer-sets/) to select five services for closer examination: Hypothesis, Peeriodicals, PubPeer, Qeios, and Researchers One. Of those, I observed that Hypothesis is an open source web annotation service that allows commenting on and discussion of any web page on the Internet, regardless of whether it is research or preprints. Hypothesis has a sub-project named TRiP (Transparent Review in Preprints), which is its preprint review service in collaboration with Cold Spring Harbor Laboratory. It is unclear to me why the authors listed Hypothesis as the service name in Table 1 (and elsewhere) instead of TRiP (or other similar sub-projects). In addition, Hypothesis seems to be framed as a generic web annotation service that is used by some as a preprint review tool. This seems fundamentally different from other services that are explicitly set up as preprint review services. This difference seems noteworthy to me.
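For transparency, the random draw described above could also be reproduced in code rather than via Random.org. Below is a minimal sketch, assuming a list of the 12 “all disciplines” services from Table 1; only the five I examined are spelled out, and the remaining entries are placeholders since I do not reproduce the full table here.

```python
# Minimal sketch (reviewer's illustration): a reproducible alternative to the
# Random.org draw. The list should contain the 12 services labelled
# "all disciplines" in Table 1; only the five I examined are spelled out here.
import random

all_disciplines_services = [
    "Hypothesis", "Peeriodicals", "PubPeer", "Qeios", "Researchers One",
    # ... plus the remaining seven services from Table 1 ...
]

random.seed(42)  # fixing and reporting the seed makes the selection repeatable
sample = random.sample(all_disciplines_services, k=5)
print(sample)
```

Fixing and reporting the seed (or the Random.org ticket) would let others repeat the same selection.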
To aid readers, I also suggest including hyperlinks to the 23 services reviewed in this paper. My comments on disciplinary representation in these services are elaborated further below.
One minor point of curiosity is that several services use an “automated tool” to select reviewers. It would be helpful to describe in this paper exactly what those tools are and how they work, or report situations where services do not explain it.
Lastly, what did the authors mean by “software heritage” in section 6? Are they referring to the organisation named Software Heritage (https://www.softwareheritage.org/) or something else? It is not clear to me how preprint reviews would be deposited in this context.
Respecting disciplinary and epistemic diversity
In the abstract and elsewhere in the article, the authors acknowledge that preprints are gaining momentum “in some fields” as a way to share “scientific” findings. After reading this article, I agree that preprint review services may disrupt publishing for research communities where preprints are in the process of being adopted or already normalised. However, I am less convinced that such disruption is occurring, or could occur, for scholarly publishing more generally.
I am particularly concerned about the casual conflation of “research” and “scientific research” in this article. Right from the start, it mentions how preprints allow sharing “new scientific findings” in the abstract, stating they “make scientific work available rapidly.” It also notes that preprints enable “scientific work to be accessed in a timely way not only by scientists, but also…” This framing implies that all “scholarly communication,” as mentioned in the title, is synonymous with “scientific communication.” Such language excludes researchers who do not typically identify their work as “scientific” research. Another example of this conflation appears in the caption for Figure 1, which outlines potential benefits of preprint review services. Here, “users” are defined as “scientists, policymakers, journalists, and citizens in general.” But what about researchers and scholars who do not see themselves as “scientists”?
Similarly, the authors describe the 23 preprint review services using six categories, one of which is “scientific discipline”. One of those disciplines is called “humanities” in the text, and Table 1 lists it as a discipline for Science Open Reviewed. Do the authors consider “humanities” to be a “scientific” discipline? If so, I think that needs to be justified with very strong evidence.
Additionally, Waltman et al.’s four schools of thought for peer review reform work well with the 23 services analysed. However, at least three out of the four are explicitly described as improving “scientific” research.
Related to the above is how the five “scientific disciplines” are described as the “usual organisation” of the scholarly communication landscape. On what basis should they be considered “usual”? In this formulation, research in literature, history, music, philosophy, and many other subjects would all be lumped together into the “humanities”, which sit at the same hierarchical level as “biomedical and life sciences”, arguably a much more specific discipline. My point is not to argue for a specific organisation of research disciplines, but to highlight a key epistemic assumption underlying the whole paper that comes across as very STEM-centric (science, technology, engineering, and math).
How might this part of the methodology affect the categories presented in Table 1? “Biomedical and life sciences” appear to be overrepresented compared to other “disciplines”. I’d like to see a discussion that examines this pattern, and considers why preprint review services (or maybe even preprints more generally) appear to cover mostly the biomedical or physical sciences.
In addition, there are 12 services described as serving “all disciplines”. I believe this paper could be improved by at least a qualitative assessment of the diversity of disciplines actually represented on those services. Because it is reported that many of these services stress improving the “reproducibility” of research, I suspect most of them serve disciplines which rely on experimental science.
I randomly selected five services for closer examination, as mentioned above. Of those, only Qeios has demonstrated an attempt to at least split “arts and humanities” into subfields. The others either lack such categories altogether or have a clear focus on a few disciplines (e.g. life sciences for Hypothesis/TRiP). In all cases I studied, there is a heavy focus on STEM subjects, especially biology or medical research. However, they are all categorised by the authors as serving “all disciplines”.
If preprint review services originate from, or mostly serve, a narrow range of STEM disciplines (especially experiment-based ones), it would be worth examining why that is the case, and whether preprints and reviews of them could (or could not) serve other disciplines and epistemologies.
It is postulated that preprint review services might “disrupt the scholarly communication landscape in a more radical way”. Considering the problematic language I observed, what about fields of research where peer-reviewed journal publications are not the primary form of communication? Would preprint review services disrupt their scholarly communications?
To be clear, my concern is not just the conflation of language in a linguistic sense but rather inequitable epistemic power. I worry that this conflation would (a) exclude, minoritise, and alienate researchers of diverse disciplines from engaging with metaresearch; and (b) blind us to a clear pattern in these 23 services, namely their strong focus on the life sciences and medical research, and forestall a discussion of why that might be the case. Critically, what message are we sending to, for example, a researcher of 18th century French poetry with the language and framing of this paper? I believe the way “disciplines” are currently presented here poses a real risk of devaluing and minoritising certain subject areas and ways of knowing. In its current form, I believe that while this paper is a very valuable contribution, one should not derive from it any conclusions which apply to scholarly publishing as a whole.
The authors have demonstrated inclusive language elsewhere. For example, they have consciously avoided “peer” when discussing preprint review services, clearly contrasting them to “journal-based peer review”. Therefore, I respectfully suggest that similar sensitivity be adopted to avoid treating “scientific research” and “research” as the same thing. A discussion, or reference to existing works, on the disciplinary skew of preprints (and reviews of them) would also add to the intellectual rigour of this already excellent piece.
Overall, I believe this paper is a valuable reflection on the state of preprints and services which review them. Addressing the points I raised, especially the use of more inclusive language with regards to disciplinary diversity, would further elevate its usefulness in the metaresearch discourse. Thank you again for the chance to review.
Signed:
Dr Pen-Yuan Hsing (ORCID ID: 0000-0002-5394-879X)
University of Bristol, United Kingdom
Data availability
I have checked the associated dataset, but still suggest including hyperlinks to the 23 services analysed in the main text of this paper.
Competing interests
No competing interests are declared by me as reviewer.
-
Henriques, S. O., Rzayeva, N., Pinfield, S., & Waltman, L. (2023, October 13). Preprint review services: Disrupting the scholarly communication landscape?. https://doi.org/10.31235/osf.io/8c6xm
-
Aug 11, 2024
-
Nov 20, 2024
-
Nov 20, 2024
-
Authors:
- Susana Henriques (Research on Research Institute (RoRI); Centre for Science and Technology Studies (CWTS), Leiden University, Leiden, the Netherlands; Scientific Research Department, Azerbaijan University of Architecture and Construction, Baku, Azerbaijan) s.oliveira@cwts.leidenuniv.nl
- Narmin Rzayeva (Research on Research Institute (RoRI); Information School, University of Sheffield, Sheffield, UK) n.rzayeva@cwts.leidenuniv.nl
- Stephen Pinfield (Research on Research Institute (RoRI); Centre for Science and Technology Studies (CWTS), Leiden University, Leiden, the Netherlands) s.pinfield@sheffield.ac.uk
- Ludo Waltman waltmanlr@cwts.leidenuniv.nl
-
7
-
10.31235/osf.io/8c6xm
-
Preprint review services: Disrupting the scholarly communication landscape?
-
- Nov 2024
-
arxiv.org
-
Editorial Assessment
This article presents a large-scale data-driven analysis of the use of initials versus full first names in the author lists of scientific publications, focusing on changes over time in the use of initials. The article has been reviewed by three reviewers. The originality of the research and the large-scale data analysis are considered strengths of the article. A weakness is the clarity, readability, and focus of certain parts of the article, in particular the introduction and background sections. In addition, the reviewers point out that the discussion section can be improved and deepened. The reviewers also suggest opportunities for strengthening or extending the article. This includes adding case studies, extending the comparative analysis, and providing more in-depth analyses of changes over time in policies, technologies, and data sources. Finally, while reviewer 2 is critical about the gender analysis, reviewer 3 considers this analysis to be a strength of the article.
-
Editorial Assessment
The authors present a descriptive analysis of preprint review services. The analysis focuses on the services’ relative characteristics and differences in preprint review management. The authors conclude that such services have the potential to improve the traditional peer review process. Two metaresearchers reviewed the article. They note that the background section and literature review are current and appropriate, the methods used to search for preprint servers are generally sound and sufficiently detailed to allow for reproduction, and the discussion related to anonymizing articles and reviews during the review process is useful. The reviewers also offered suggestions for improvement. They point to terminology that could be clarified. They suggest adding URLs for each of the 23 services included in the study. Other suggestions include explaining why overlay journals were excluded, clarifying the limitation related to including only English-language platforms, archiving rawer input data to improve reproducibility, adding details related to the qualitative text analysis, discussing any existing empirical evidence about misconduct as it relates to different models of peer review, and improving field inclusiveness by avoiding conflation of “research” and “scientific research.”
The reviewers and I agree that the article is a valuable contribution to the metaresearch literature related to peer review processes.
Handling Editor: Kathryn Zeiler
Competing interest: I am co-Editor-in-Chief of MetaROR, working with Ludo Waltman, who is a co-author of the article and also co-Editor-in-Chief of MetaROR.
-