10,000 Matching Annotations
  1. Oct 2025
The last element of culture is the artifacts, or material objects, that constitute a society's material culture.

💬 It is interesting because it shifts our attention from abstract ideas like values, norms, and language to the tangible things that people create and use. It reminds us that culture isn't just about what people believe or say; it's also about what they build, wear, carry, and live with. Artifacts are the physical evidence of a society's way of life. They show how people solve problems, express identity, and adapt to their environment. For example, a smartphone isn't just a tool; it also reflects values like connectivity, efficiency, and even status.

At the same time, we're already seeing the loss of good jobs (ones that are interesting and valuable, which we should want to preserve as humans, such as artist jobs), and AI is predicted by industry experts to replace knowledge-based jobs (both entry-level and senior executive) and eventually almost any job.

If AI can replace not just routine work but also creative and knowledge-based jobs, what's left that makes us valuable as humans? It's unsettling to think that our ability to reason and make meaning (the one thing that sets us apart) might be the last thing we have to protect. We risk losing the very qualities that give us value in a future where machines handle everything else.

outsourcing your thinking to an app is very different from outsourcing math or reading. Thinking is a more basic, fundamental intellectual capability, upon which everything else depends. It's not just about how to think, but also about what you know and how creative you can be

This is something I strongly believe, and something I reinforce with my students constantly. There are times when a task feels mundane or unnecessary because there is a tool that can do it better, faster, or more easily (learning to print or write in cursive comes to mind these days). While it may be true that there are tools to accomplish these tasks for you, the development of these skills is a fundamental building block for other skills, or it exercises your brain in a way that is good for you, for lack of a better phrase.

even if it's ok to use it in future jobs where producing work is more important than learning.

The contrast between doing and producing. In particular, this emphasizes the difference between active engagement and passive output. Lin highlights that philosophy, and by extension education, is about human activity, not just polished results. He makes sure students don't miss the key concepts. There is a hint of Marxist thought in this passage, even if Lin doesn't frame it in those terms. Doing philosophy is authentic, lived, human activity, similar to unalienated labour, while producing philosophy yields a commodified product detached from natural human engagement.

"AI writing, meanwhile, is a cognitive pyramid scam. It's a fraud on the reader."

      This phrase captures Lin's ethical stance. By comparing AI use to a pyramid scheme, he implies that AI's apparent polish hides a hollow foundation - borrowed knowledge without genuine understanding. The idea of "fraud" underscores how misrepresenting AI's work as your own undermines honesty in education. On this thought, students are invited to consider not just whether AI can produce work, but whether presenting it as theirs is intellectually or morally defensible. So does convenience justify compromising authorship?

At the same time, AI itself is eroding trust in experts and evidence, as it can offer alternative beliefs that sound plausible in a modern world susceptible to conspiracy theories. AI also debases and devalues human creativity and thinking by suggesting it's possible to remove humans from the equation.

There have been many questions regarding the validity of various things posted online, and the COVID-19 pandemic just served as one of many lenses for conspiracy theories. The more susceptible we are, the more vulnerable credibility is to AI that can produce realistic but incorrect theories.

There are several ways computer programs are involved with social media. One of them is a "bot," a computer program that acts through a social media account. There are other ways of programming with social media that we won't consider a bot (and we will cover these at various points as well): The social media platform itself is run with computer programs, such as recommendation algorithms (chapter 12). Various groups want to gather data from social media, such as advertisers and scientists. This data is gathered and analyzed with computer programs, which we will not consider bots, but will cover later, such as in Chapter 8: Data Mining. Bots, on the other hand, will do actions through social media accounts and can appear to be like any other user. The bot might be the only thing posting to the account, or human users might sometimes use a bot to post for them. Note that sometimes people use "bots" to mean inauthentically run accounts, such as those run by actual humans who are paid to post things like advertisements or political content. We will not consider those to be bots, since they aren't run by a computer. Though we might consider these to be run by "human computers" who are following the instructions given to them, such as in a click farm:

      Reading about how bots are often used to amplify certain voices on social media made me think about my own experience on Twitter/X. Sometimes I notice trending hashtags that feel "unnatural," almost as if too many accounts are repeating the same message. It makes me wonder whether genuine user interest is actually being represented, or if it's the result of coordinated bot activity. This connects to the ethical concern raised in the chapter about authenticity: if bots distort what looks like public opinion, should platforms be responsible for filtering them out, or should users just learn to be skeptical? Personally, I feel it undermines trust in social media when I can't tell if I'm interacting with a real person or an automated script.
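The "unnatural" trending hashtags described above can be made concrete with a toy simulation. This is a minimal sketch with invented post text and no real platform API; it just shows how a single scripted message, repeated across many bot accounts, can dominate a hashtag count:

```python
from collections import Counter

def trending_hashtags(posts, top_n=1):
    """Count hashtags across a list of post strings and return the most common."""
    tags = [word for post in posts for word in post.split() if word.startswith("#")]
    return Counter(tags).most_common(top_n)

# A few organic posts vs. fifty copies of one bot message (hypothetical data)
organic = ["great game today #sports", "new album out #music", "lunch pics #food"]
bot_posts = ["amazing deal, buy now #promo"] * 50  # one script, many accounts

print(trending_hashtags(organic + bot_posts))  # [('#promo', 50)]
```

Even this crude count illustrates the chapter's point: a trend metric cannot tell a coordinated script apart from fifty genuinely interested people.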

    1. Electronic literature, generally considered to exclude print literature that has been digitized, is by contrast "digital born," a first-generation digital object created on a computer and (usually) meant to be read on a computer.

This definition is important because it shows the main difference between digitized print and "digital born" work. "Digital born" means the work is made on a computer and for the computer from the very beginning. A normal book that you put on a screen is still a book. You can print it, and it stays the same. But electronic literature is different. It is like its DNA is made of code. The story and its meaning are created together with the computer's help. It often uses links, animations, or lets the reader make choices. So, it's not just a book you read on a screen. It is a new kind of art that needs the computer to live. This changes not just how we read, but also how writers create stories.

To get an idea of the type of complications we run into, let's look at the use of donkeys in protests in Oman: "public expressions of discontent in the form of occasional student demonstrations, anonymous leaflets, and other rather creative forms of public communication. Only in Oman has the occasional donkey…been used as a mobile billboard to express anti-regime sentiments. There is no way in which police can maintain dignity in seizing and destroying a donkey on whose flank a political message has been inscribed." From Kings and People: Information and Authority in Oman, Qatar, and the Persian Gulf [c32] by Dale F. Eickelman[1] In this example, some clever protesters have made a donkey perform the act of protest: walking through the streets displaying a political message. But, since the donkey does not understand the act of protest it is performing, it can't be rightly punished for protesting. The protesters have managed to separate the intention of protest (the political message inscribed on the donkey) and the act of protest (the donkey wandering through the streets). This allows the protesters to remain anonymous and the donkey unaware of its political mission.

I once watched short clips of a trending Chinese TV drama on Douyin (Chinese TikTok). Some of the plot was very controversial because it violated real-life values. However, in the comment section, I saw so many people supporting the wrong ideas in the show. I was very angry and even joined the debate with those "supporters" under the video. Later, I found out many of those comments were actually generated by bots created by the drama's marketing team, just to attract attention and create fake popularity. At that moment, I felt really used, because I gave them free engagement just by arguing with fake people. This reminds me of the donkey protest example: like the donkey, which doesn't know what message it carries, the bot also has no awareness. The real people behind it stay hidden while others get emotionally involved.

But, since the donkey does not understand the act of protest it is performing, it can't be rightly punished for protesting.

      The example of the donkey protest reminded me that robots operate in a similar way. The donkey doesn't know what message it's sending, and the robot doesn't really "understand" what it's doing. But humans are still behind them, determining the message and the actions. I think this shows that we shouldn't view robots as neutral or harmless. They're just tools, and they always reflect the thoughts of the people who create or use them.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

We thank the reviewers for providing thoughtful and constructive feedback, which will help us improve the clarity and rigor of the paper. On balance, the reviews were positive. Reviewer 1 mentioned that "This is a strong manuscript with few problems and all important findings well justified, indeed this is a nicely polished…high-quality manuscript," and that "this paper makes a major breakthrough, showing that cell autonomous defects in hTSCs are very likely at the heart of the pathology observed in GIN-prone murine mutants." Reviewer 3 stated that "The study is well designed, and the manuscript is very well written. The conclusions are supported by the evidence presented." Reviewer 2 was less enthusiastic, with the main concerns being that "The paper is mostly descriptive and often quite confusing leaving one not much closer to understanding the mechanistic basis for the interesting sex-biased semi-lethal phenotype," felt that figure titles/section headers overstated the results, and recommended improving some technical aspects and tempering conclusions. We think the proposed edits address most of the issues raised by the reviewers, either through rewriting or by adding data, as described below.

      In response to reviewer #1 comments:

      Major comments:

• I am confused as to the basis of the sex-skewing phenomenon? Is the problem that lack of maternally loaded WT Mcm4 worsens the phenotype, or is the issue that Mcm4C3/C3 dams are less able to retain pregnancies, perhaps being a more inflammatory environment? Also, while there is quite consistent evidence for reduced viability of Mcm4C3/C3McmGt/+ progeny, especially for female progeny, how confident can we be that the genotype of the dam vs. sire is important? Notably on a Ddx58 background, the progeny of the Mcm4C3/C3 sire included seven live male Mcm4C3/C3McmGt/+ but no female.

      Regarding the first point (sex skewing only when female is C3/C3), we also suspected either: 1) the maternal uterine environment, or 2) reduced oocyte quality. Although not reported in this manuscript, we tested #1 by performing embryo transfer experiments. Transferring 2-cell stage embryos from sex-skewing mating to WT females did not rescue the sex-bias. We then examined oocytes from C3/C3 females. We found evidence for compromised mitochondria and transcriptome disruption. However, we are not sure why this happens (poor follicle support? Oocyte intrinsic phenomenon?). We are reserving these results and additional experiments for another paper, especially since this one mainly deals with GIN and placenta development. If the reviewers feel strongly that the embryo transfer data is crucial, we can include it.

Regarding how confident we are that the genotype of the dam vs. sire is important, this stems from our previous paper by McNairn et al. 2019 (the percentage of female C3/C3 M2/+ from the sex-skewing mating is 20%, compared to 60% from the reciprocal mating), which was quite dramatic. Consistent with this, MCM levels were significantly reduced in the placentae only when the dam was C3/C3 and the sire C3/+ M2/+, but not in the reciprocal cross. The reviewer makes a good observation about the Ddx58 cross; we can only hypothesize that the mutation somehow sensitizes females in this scenario and will make mention of it in the revision. We also realize that we neglected to state in Methods that the Ddx58 allele was coisogenic in the C3H background.

• I'm not sure what Supplementary Figure 6 is showing (faster differentiation of C3 but less TGC?). Regardless, it's hard to draw too much conclusion from one not-very-pretty Western blot. This figure requires both additional replicates and a better explanation of how it fits with the other conclusions of the paper.

We hypothesized that the JZ defect observed in the semi-lethal genotype placentas could arise either from impaired maintenance of the progenitor pool or from reduced capacity of mutant trophoblast progenitors to differentiate into the JZ lineage. The blot in Supplementary Figure 6 was intended as a qualitative demonstration that mutant trophoblast stem cells can differentiate into JZ lineages. We recognize that the figure is not definitive and will revise the text to clarify its purpose. Replicates of the Western blot will be performed as suggested.

      • Supplementary Figure 7F-G is puzzling. Half of the mESCs have gamma-H2AX at all times, including most in S or G2 phase? In Figure S7E, do the quadrants correspond to being negative or positive for gamma-H2AX? At very least, IF images showing clear gamma-H2AX foci would be much more convincing.

The gates for γH2AX FACS analysis were established using negative controls lacking primary antibody. As reported previously, embryonic stem cells display high basal levels of γH2AX staining (Chuykin et al., Cell Cycle 2008; Turinetto et al., Stem Cells 2012; Ahuja et al., Nat Comm 2016), which likely explains the broad signal observed across cell cycle phases. Regardless, we will provide immunofluorescence staining of γH2AX and foci counts in our revision.

      • The methods section is well detailed, but it would be ideal to clarify how many replicates each Western Blot or flow cytometry experiment is representative of.

Thanks for the suggestion. We will update this for Fig. 4 and Fig. 5.

      Minor comments:

      • Is it possible that cGAS-STING and RIG pathways act redundantly to cause inflammation and lethality, or that other innate immune components are involved? I don't expect the authors to make compound mutants to test this but at least this possibility should be discussed textually.

We appreciate the reviewer's point, and had the same suspicion. Supporting this, new RNA-seq analysis of Tmem173 KO placentas revealed elevated inflammatory gene expression compared to C3/C3 M2/+ controls, consistent with potential redundancy or feedback regulation. We will update the supplementary figures to reflect this.

      In response to reviewer #2 comments:

      Major comments:

      A major concern throughout the paper is that conclusions are often overstating their data. The title of figure 2 is "placentae with replication stress have smaller junctional and labyrinth zones". However, there is no measure of replication stress in this figure, just a histological evaluation of the placentae from the different mutants. The title of figure 3 is "Impact of GIN on LZ is less than JZ," but there is no measure of GIN, but instead measurement of number of cells in cell cycle and some bulk RNA-seq analysis. Title of figure 4 is "TSCs with increased genomic instability exhibit abnormal phenotypes." Again there is no measure of GIN, but instead staining of derived TSCs for proliferation, cell death, and a TSC marker. Title of figure 5 is "DNA damage responses and G2/M checkpoint activation drive premature TSC differentiation." However, there does not appear to be a difference in gH2AX between the two mutant genotypes. Checkpoint proteins might be up, but need quantification and reproduction. > 4C is the only marker of differentiation. Importantly, all the analyses here are associations, not connections, so cannot use the word "drive". Similar issues can be raised with a number of the supplementary figures.

The Chaos3 (chromosome aberrations occurring spontaneously 3) model is a well-established system of intrinsic chronic replication stress and GIN. It is characterized by an ~20-fold elevation of blood micronuclei (Shima et al., Nature 2007), a hallmark of GIN (Soxena et al., Mol Cell 2022); a destabilized MCM2-7 helicase prone to replication fork collapse (Bai et al., PLoS Genet 2016); and increased mitotic chromosome abnormalities and decreased dormant origins (Kawabata et al., Mol Cell 2011; Chuang et al., Nucleic Acids Res 2012) that are known to cause GIN and replication stress (Ibarra et al., PNAS 2008). Also, in our previous work (McNairn et al., Nature 2019), we showed that placentae from C3/C3 dams exhibit significantly elevated γH2AX as well as reduced MCM2 and MCM4 protein levels. In our current study, we also observe elevated γH2AX in mutant TSCs (C3/C3 and C3/C3 M2/+), consistent with genomic instability. Nevertheless, we acknowledge that in TSCs we did not formally demonstrate replication stress (RS), so where appropriate we will revise figure titles, for example to say "cells/placentae with a GIN or RS genotype."

We acknowledge the reviewer's concern regarding Western blots. We will provide quantification and statistics in our revision.

      1) A deeper analysis of the cell lines is likely to be the most fruitful path to reveal interesting mechanisms. It is very surprising that there is no phenotype in ESCs. Authors should check for increased apoptosis. Maybe the phenotypic cells are lost. Or do ESCs use different MCMs/mechanisms of DNA replication or are they better able to handle replication stress and GIN? How many passages were the TSCs and ESCs cultured for? Does GIN (i.e. aneuploidy, CNVs) develop in TSCs and ESCs with passaging? How do the MCM mutations impact the molecular identity of the ESC and TSC cells including their heterogeneity in the population.

      We assessed apoptosis using cleaved caspase 3 flow cytometry in mutant ESCs and observed no difference compared to controls (we will add this data as Supplementary Fig. 7).

We believe there are intrinsic differences in the ability of TSCs and ESCs to respond to and counteract replication stress and DNA damage. ESCs are known to license more replication origins than somatic cells, which protects them from replication stress induced by their short G1 phase (Ahuja et al., Nat Comm 2016; Ge et al., Stem Cell Rep 2015; Matson et al., eLife 2017). Human placental cells physiologically exhibit high mutation rates and chromosomal instability in vivo (Coorens et al., Nature 2021). Supporting this, Wang, D., et al. (Nat Comm 2025) reported that several cell cycle and DDR regulators are differentially expressed in human TSCs vs. human pluripotent stem cells. Whether such transcriptional differences directly contribute to functional outcomes remains to be determined.

All experiments in this study were conducted using early-passage ESCs and TSCs. Finally, we showed that close to 90% of mutant ESCs are KLF4+ (a naive pluripotency marker), whereas EOMES+ cells were significantly reduced in TSCs carrying the GIN genotype (Fig. 4E–F and Supplementary Fig. 7), highlighting lineage-specific differences.

      Minor Comments:

      1) There is a lack of quantification and repeats for all Westerns. At minimum there should be three repeats for each experiment, quantification including normalization to a reference protein, and stats confirming any proposed differences between conditions.

      We will update our revision with quantification and statistics for western blots.
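The quantification the reviewer asks for (replicates, normalization to a reference protein, statistics) can be sketched simply. The densitometry values below are invented for illustration, and the stated plan does not specify a particular tool; this is only a sketch of the normalization step:

```python
import statistics

def normalize_bands(target, loading_control):
    """Normalize target-band intensities to a reference (loading-control) protein."""
    return [t / c for t, c in zip(target, loading_control)]

# Hypothetical densitometry values from three replicate blots per condition
wt = normalize_bands([1.20, 1.05, 1.10], [1.00, 0.95, 1.02])
mut = normalize_bands([0.55, 0.60, 0.48], [0.98, 1.00, 0.96])

print(statistics.mean(wt), statistics.mean(mut))
# A two-sample t-test on the normalized replicates would then confirm any difference.
```

With at least three replicates per condition, the normalized means plus a significance test address exactly the minimum the reviewer requests.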

      2) I would recommend moving the results in supp table 1 to figure 1. While negative, they are the newer results. The results shown in current figure 1 are essentially a reproduction of their previous work.

The placental observations presented in Fig. 1 are new. In particular, the placental and embryonic weight measurements graphed in Fig. 1B and C have not been published by our group. Fig. 1A reproduces our previous observation on embryo viability in GIN mutants (McNairn et al., Nature 2019), while the schematic was provided for better flow and readability given the complex mating schemes. We are agnostic on the Suppl Table 1. It could be changed to a new Table 1 in the main section depending on the journal.

      In response to reviewer #3 comments:

      Major Comments

      While the inclusion of bulk RNAseq data of whole placental tissue is appreciated, the interpretation of the results is somewhat problematic, as it is acknowledged that the cell type composition of the placentas is drastically different between groups. Making conclusions based upon GSEA analysis of two different groups with drastically different cell type composition is somewhat misleading, as based on the results, it is a direct reflection of the cell types present. It would be more helpful to perform cell type deconvolution of the RNAseq data to estimate the proportion of each cell type within the bulk samples and compare that to what is seen histologically and not dive too deeply into the pathways since the results could just be a reflection of the cell types e.g. angiogenesis pathways from more endothelial cells. Additionally, the RNAseq data can be leveraged to look at expression of inflammatory genes by sex, which may show interesting patterns based on the other results.

We agree that the representation of cell types in the placenta is problematic, especially for underrepresented genes. We propose to use the BayesPrism tool (Chu et al., Nat Cancer 2022) to deconvolute the bulk RNA-seq for better representation of transcriptional changes in the placenta.
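For readers unfamiliar with deconvolution, the underlying idea is to model each bulk sample as a weighted mixture of cell-type expression signatures and solve for the weights. The sketch below is a generic least-squares illustration on synthetic numbers; it is not the Bayesian algorithm BayesPrism actually implements:

```python
import numpy as np

def estimate_fractions(signature, bulk):
    """Estimate cell-type fractions from a bulk expression vector.

    signature: genes x cell-types matrix of reference expression profiles.
    bulk: length-genes expression vector of the mixed sample.
    """
    coef, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
    coef = np.clip(coef, 0, None)   # fractions cannot be negative
    return coef / coef.sum()        # normalize so fractions sum to 1

# Synthetic check: a 70/30 mixture of two cell types is recovered
sig = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
true_frac = np.array([0.7, 0.3])
print(estimate_fractions(sig, sig @ true_frac))  # ≈ [0.7, 0.3]
```

Comparing estimated fractions like these against the histological measurements is precisely the sanity check the reviewer suggests.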

      Section: GIN impairs trophoblast stem cell establishment and maintenance. To support the assertion in the first paragraph, beyond measuring apoptosis, it would be helpful at this stage to look at RNA expression levels indicative of the activation of DNA damage checkpoint genes

      We have performed RNA-seq on mutant ESC and TSCs and are in the process of data analysis. We will update these results in the revision.

      Please include additional methodological details in the methods section on the statistical analysis done for differential expression analysis. Specifically, what type of normalization was used, if lowly expressed genes were filtered out and at what cutoff, what statistical model was used (did you include covariates?), what comparisons were made? Did you stratify by sex? What cutoff was used for statistical significance? Did you perform multiple testing correction?

      We will update RNA-Seq data analysis methods in our full revision.
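One of the reviewer's specific questions, multiple-testing correction, is easy to illustrate. Below is a minimal sketch of the Benjamini-Hochberg FDR adjustment, the standard correction in differential-expression pipelines; whether this exact method was used here is not stated in the manuscript:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices sorted by p-value
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):      # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min     # enforce monotonicity of adjusted values
    return adjusted

print(bh_adjust([0.01, 0.04, 0.03, 0.005]))  # ≈ [0.02, 0.04, 0.04, 0.02]
```

Reporting the adjusted values and the significance cutoff applied to them would answer the reviewer's question directly.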

      2. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1 comments:

• Supplementary Table 1 would be enhanced greatly by showing comparable tables for Mcm4C3/C3 x Mcm4C3/+McmGt/+ in mice without the Tmem173 or Ddx58 mutations. It is fine to recycle data from McNairn 2019 here, as long as the source is indicated, but a comparison is needed.

Thanks for pointing this out. We have incorporated this suggestion into Supp table 1.

      • In Figure S3E-F, is the box above each graph supposed to show the genotype of the dam?

      Yes. Thanks for pointing this out. We have added a description in the figure legend to make it clear.

      • "Indeed, the placenta and embryo weights of E13.5 Mcm4C3/C3 Mcm2Gt/+ Mcm3Gt/+ animals were significantly improved vs. Mcm4C3/C3 Mcm2Gt/+ animals, rendering them similar to Mcm4C3/C3 littermates (Fig. 6A-C). The JZ (but not LZ) area in Mcm4C3/C3 Mcm2Gt/+ Mcm3Gt/+ placentae also increased to the level of Mcm4C3/C3 littermates (Fig. 6D-H)." There are two problems here. First, the figure calls are wrong. Second, the description of the data is not quite right, it looks like the C3/C3 and C3/C3 M2/+ M3/+ LZs are a similar size to each and are statistically indistinguishable.

      Thanks for catching this. We have updated these in the main text.

Reviewer #2 comments:

      Minor comment

      • Need to review citations to figures. For example, no citations are made to figure 4a and 4c.

      Thanks for catching this. We have updated the text.

      Reviewer #3 comments:

      Define the first use of >4C DNA content to help readers understand this potentially unfamiliar term.

      We have edited this part to indicate cells with more than 4C DNA content for better clarity.

      iDEP tool - please include citation to manuscript instead of link

      We have updated this citation.

      Check citations. Some citations to BioRxiv that are now published e.g. 13.

      We have updated this citation.

      3. Description of analyses that authors prefer not to carry out

      Reviewer 2

      2) Along similar lines, most of the in vivo phenotypic analyses are performed at E13.5, long after defects are likely beginning to express themselves especially given that they see phenotypes in the TSCs, which represent the polar TE of a E4.5. To understand the primary defects of the in vivo phenotype, they should be looking much earlier. Supplemental figure 5 is a start but represents a rather superficial analysis.

The peri-implantation period, namely E4.5, represents a "black box" of embryonic development given that this is a critical stage for implantation. Aside from being an extremely difficult stage to analyze technically, we don't think it is essential to the conclusions (or doable in a timely manner), especially given the use of TSCs. If we complete EdU studies on E6.5 embryos, we will include them.

      3) Fig. 6 would benefit from evidence that MCM3 mutant is rescuing MCM4 levels in the chromatin fraction of cells and the DNA damage phenotype.

The genetic evidence presented is strong, and although we didn't do the suggested experiment, we feel that our previous studies (McNairn et al., Nature 2019 and Chuang et al., PLoS Genet 2010) on the effects of MCM3 as a nuclear export factor (as it is in yeast (Liku et al., Mol Biol Cell 2005)) are a reasonable basis for not repeating such experiments. Furthermore, we are no longer maintaining the Mcm3 line, and it would take over a year to reconstitute and rebreed triple mutants.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In a previous paper (McNairn et al. 2019 "Female-biased embryonic death from inflammation induced by genomic instability" Science), the Schimenti lab demonstrated that mouse embryos with hypomorphic mutations of the heterohexameric minichromosome maintenance complex, mutations that cause increased genomic instability (GIN), show reduced embryonic viability, with greater loss of female embryos and some parent-of-origin effect. Treatment with immunosuppressants, including ibuprofen and testosterone, partially rescued the observed lethality.

      In this new manuscript, the Schimenti lab demonstrates that these GIN-prone mutants feature smaller placentas with fewer cells. Mutations that interfere with the ability of the innate immune system to respond to micronuclei (a consequence of GIN) have no protective effect. Munisha and colleagues then demonstrate that MCM-mutant TSCs are harder to derive and show elevated apoptosis and a greater propensity for differentiation. The mutant TSCs show CHK1 phosphorylation, P53 phosphorylation and higher P21 levels, all consistent with a response to DNA damage. Downstream of this, they also show loss and inhibition of CDK1, which is already established to cause G2/M arrest (generally) and endoreduplication (specifically in trophoblast). The authors advance a model in which GIN results in loss of the TSC pool by apoptosis, cell cycle arrest and premature differentiation, resulting in smaller placentas and particularly fewer junctional zone cells. How this causes inflammation is less clear, but inflammation appears to be a downstream effect rather than cause of poor placentation.

      Major comments:

      This is a strong manuscript with few problems and all important findings well justified, indeed this is a nicely polished manuscript for something just entering peer review. There are a few unclear points textually and a couple places in the supplementary figures where better data quality would help, but generally it is a high-quality manuscript.

• I am confused as to the basis of the sex-skewing phenomenon? Is the problem that lack of maternally loaded WT Mcm4 worsens the phenotype, or is the issue that Mcm4C3/C3 dams are less able to retain pregnancies, perhaps being a more inflammatory environment? Also, while there is quite consistent evidence for reduced viability of Mcm4C3/C3McmGt/+ progeny, especially for female progeny, how confident can we be that the genotype of the dam vs. sire is important? Notably on a Ddx58 background, the progeny of the Mcm4C3/C3 sire included seven live male Mcm4C3/C3McmGt/+ but no female.

• I'm not sure what Supplementary Figure 6 is showing (faster differentiation of C3 but less TGC?). Regardless, it's hard to draw too much conclusion from one not-very-pretty Western blot. This figure requires both additional replicates and a better explanation of how it fits with the other conclusions of the paper.

      • Supplementary Figure 7F-G is puzzling. Half of the mESCs have gamma-H2AX at all times, including most in S or G2 phase? In Figure S7E, do the quadrants correspond to being negative or positive for gamma-H2AX? At very least, IF images showing clear gamma-H2AX foci would be much more convincing.

      • The methods section is well detailed, but it would be ideal to clarify how many replicates each Western Blot or flow cytometry experiment is representative of.

      The required additional experiments re: Supplementary Figure 6 and 7 could be conducted in a couple of months.

      Minor comments:

• Supplementary Table 1 would be enhanced greatly by showing comparable tables for Mcm4C3/C3 x Mcm4C3/+McmGt/+ in mice without the Tmem173 or Ddx58 mutations. It is fine to recycle data from McNairn 2019 here, as long as the source is indicated, but a comparison is needed.

      • Is it possible that cGAS-STING and RIG pathways act redundantly to cause inflammation and lethality, or that other innate immune components are involved? I don't expect the authors to make compound mutants to test this but at least this possibility should be discussed textually.

      • In Figure S3E-F, is the box above each graph supposed to show the genotype of the dam?

      • "Indeed, the placenta and embryo weights of E13.5 Mcm4C3/C3 Mcm2Gt/+ Mcm3Gt/+ animals were significantly improved vs. Mcm4C3/C3 Mcm2Gt/+ animals, rendering them similar to Mcm4C3/C3 littermates (Fig. 6A-C). The JZ (but not LZ) area in Mcm4C3/C3 Mcm2Gt/+ Mcm3Gt/+ placentae also increased to the level of Mcm4C3/C3 littermates (Fig. 6D-H)." There are two problems here. First, the figure calls are wrong. Second, the description of the data is not quite right: it looks like the C3/C3 and C3/C3 M2/+ M3/+ LZs are similar in size to each other and are statistically indistinguishable.

      Significance

      I partially discussed this above in the summary, but this paper makes a major breakthrough, showing that cell-autonomous defects in TSCs are very likely at the heart of the pathology observed in GIN-prone murine mutants.

      Some questions remain unanswered. Why are TSCs more prone to die in response to GIN than mESCs, particularly in light of the general observation that karyotypic abnormalities are more common in the placental lineage? How does the placental abnormality give rise to inflammation? No manuscript can answer every question, and I think this is a mature manuscript that can be published in a good journal with limited modifications.

      I am an expert on gene regulation in placental development, with somewhat less expertise in the DNA damage field. The placenta field will find this paper interesting, as will the DNA damage field. There are also ramifications for cancer research. The question of why some cells tolerate high levels of DNA damage and others die is very relevant to cancer.

    1. i found this via https://www.thefp.com/p/i-founded-wikipedia-heres-how-to

      i need a premium account to comment on that article, so let me post my comment here

      Once upon a time, there was an institution that was trusted by the public as an impartial and reliable source of information. Then things changed. The institution still claimed to be impartial. Its leaders still repeated the same mantras about the truth and trustworthiness, but the information it provided grew steadily more ideological. The change became impossible to ignoreโ€”and the public started to ask: Does this institution deserve our trust?

      If this story sounds familiar, itโ€™s because it could describe so many of the institutions that we once relied on to bring us information. In fact, it might just be the story of our times. This crisis of trustworthiness is the skeleton key to understanding so much of the turbulence and disorder in public life today.

      Itโ€™s certainly the story of The New York Times, NPR, and countless other media organizations. Itโ€™s the story of all too many institutions in medicine and public health.

      Itโ€™s also the story of Wikipedia.

      probably the biggest example of this story is the catholic church...

      Wikipedia has an article titled โ€œYahweh.โ€ I am a Christian, and I consider Yahweh to be the name of my God.

      the religion of Christianity started as a small group of rebels, but then (as the group became larger) this opposition ("trend") was integrated into the empire, and from then on, it was just another controlled opposition, publicly giving hope to the small people (slaves), but privately controlled by the empire.

      "you have owners. they own you." -- george carlin

      There are two common types of blocks that I object to: the partisan and the petty.

      my wikipedia account (Milahu) was permabanned in 2024, because i "insulted" other editors (i called them "stupid"), and the ban log says "Apparently he has no desire to contribute to the encyclopedia", in other words "he is just a troll", because their definition of "contribute" is "he follows our orders"

      here is my edit, which was later removed as "vandalism"

      https://de.wikipedia.org/w/index.php?title=Diskussion:Liste_von_Justizirrt%C3%BCmern_in_der_deutschen_Rechtsprechung&oldid=240510827#Michael_Ballweg

      translation:

      Michael Ballweg was innocently held in pretrial detention for 279 days, apparently for political reasons, because Michael Ballweg was an organizer of the Querdenker protests against the coronavirus dictatorship. As a pretext for the pretrial detention, the prosecutor constructed a โ€œflight risk,โ€ but the confiscation of valuables and the pretrial detention obviously served as an attack against the Querdenker protests.

      See also: Michael Ballweg#Strafprozess (where, unfortunately, only mainstream sources are cited, which protect the regime with their opinion journalism).

      [...]

      But I'll spare myself the edit in the article, because also Wikipedia is ruled by idiots who abuse their power like any centralized structure. (Hate me cos im honest.)

      [...]

      The fact that Wikipedia only allows sources to be cited that have been approved by the Ministry of Truth only confirms my statement that โ€œalso Wikipedia is ruled by idiots who abuse their power like any centralized structure.โ€

      my radically simple analysis of all these problems is that "all large organizations are evil", so the actual problem here is that wikipedia is "too large".

      the same analysis applies to "our" culture in general: every "too large" state automatically devolves into this disgusting "dictatorship of pacifism", always followed by overpopulation, degeneration, and collapse.

      a similar observation was made by the youtuber Tilman Knechtel in his slogan "Trau keinem Promi" (trust no celebrities), because there is a "magic line" when people become "too famous" (too large), then they are offered a choice: either they "sell their soul" (become a controlled opposition) and work for the empire (give false hope to the slaves), or they continue their real opposition and get sabotaged and punished by the empire.

      so the actual problem here is that wikipedia is "too large"

      see also: The Cathedral and the Bazaar by Eric S. Raymond

      so the obvious solution is some peer-to-peer network, but as they say, "peer-to-peer is hard"... (what they actually mean by that is, most of our problems are not technical but social problems.)

      two building blocks for such a peer-to-peer network are: tribalism = efficient teamwork in tribes of 150 people (https://github.com/milahu/alchi), and a generic voting and tagging system (https://github.com/milahu/p2p-killerapp/blob/main/doc/2025-09-04.generic-tagging-and-voting-system.md#prompt-1)

      i am collecting possible solutions in my p2p-killerapp repo: https://github.com/milahu/p2p-killerapp

      i am documenting abuse of power in my hate-maintainers repo: https://github.com/milahu/hate-maintainers-censored (more people should do that!)

    1. Reviewer #1 (Public review):

      Summary:

      The objective of this study was to infer the population dynamics (rates of differentiation, division and loss) and lineage relationships of NK cell subsets during an acute immune response and under homeostatic conditions.

      Strengths:

      A rich dataset and a detailed analysis of a particular class of stochastic models.

      Weaknesses: (relating to initial submission)

      The stochastic models used are quite simple; each population is considered homogeneous with first-order rates of division, death, and differentiation. In Markov process models such as these there is no dependence of a cell's behavior on its division history. In recent years models of clonal expansion and diversification, in the settings of T and B cells, have progressed beyond this picture. So I was a little surprised that there was no mention of the literature exploring the role of replicative history in differentiation (e.g. Bresser Nat Imm 2022), nor of the notion of family 'division destinies' (either in division number, or the time spent proliferating, as described by the Cyton and Cyton2 models developed by Hodgkin and collaborators; e.g. Heinzel Nat Imm 2017). The emerging view is that variability in clone (family) size may arise predominantly from the signals delivered at activation, which dictate each precursor's subsequent degree of expansion, rather than from the fluctuations deriving from division and death modeled as Poisson processes.

      As you pointed out, the Gerlach and Buchholz Science papers showed evidence for highly skewed distributions of family sizes, and correlations between family size and phenotypic composition. Is it possible that your observed correlations could arise if the propensity for immature CD27+ cells to differentiate into mature CD27- cells increases with division number? The relative frequency of the two populations would then also be impacted by differences in the division rates of each subset - one would need to explore this. But depending on the dependence of the differentiation rate on division number, there may be parameter regimes (and timepoints) at which the more differentiated cells can predominate within large clones even if they divide more slowly than their immature precursors. One might not then be able to rule out the two-state model. I would like to see a discussion or rebuttal of these issues.

      Comments on revisions:

      The authors have put in a lot of effort to address the reviews and have explored alternative models carefully.

      In the sections relating to homeostasis and the endogenous response, as far as I can tell you are estimating net growth rates (the k parameters) throughout - this is to be expected if you're working with just cell numbers and no information relating to proliferation. In these sections there are many places where you refer to proliferation rates and death rates when I think you just mean net positive or net negative growth rates. It's important to be precise about this even if the language can get a bit repetitive. (These net rates of growth or loss relate to clonal rather than cellular dynamics, which may be worth explaining). Later, you do use data relating to dead cells, which in principle can be used to get independent measures of death rates, but these data were not used in the fitting.

      There is so much evidence that T and B cell differentiation are often contingent on division that it would be very reasonable to consider it as a possibility for NK cells too. (Differentiation could be asymmetric, as you explored, or simply symmetric with some probability per division). These processes can be cast into simple ODE models but no longer allow you to aggregate division and death rates - so for parameter estimation you need to add measures of proliferation (Ki67 or similar) or death. This may be worth some discussion?
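The division-linked differentiation discussed above can indeed be cast as a small ODE system. As a purely illustrative sketch (plain Python, forward-Euler integration), the toy model below indexes immature cells by completed division number and lets the probability that a daughter differentiates grow with that number. All parameter values and the scaling of the differentiation probability are hypothetical assumptions, not estimates from the manuscript under review.

```python
def simulate(p_diff=0.2, div_rate=0.5, death_rate=0.1,
             max_div=10, t_end=10.0, dt=0.01):
    """Toy division-indexed model of immature -> mature differentiation.

    immature[i] holds immature cells that have completed i divisions.
    Each division of a cohort-i cell yields two daughters; each daughter
    differentiates with probability p_diff * (i + 1) / max_div, so the
    propensity to differentiate increases with replicative history.
    """
    immature = [0.0] * (max_div + 1)
    immature[0] = 1.0          # start from undivided precursors
    mature = 0.0
    for _ in range(int(t_end / dt)):
        d_imm = [0.0] * (max_div + 1)
        d_mat = -death_rate * mature
        for i in range(max_div):
            p = min(1.0, p_diff * (i + 1) / max_div)
            flux = div_rate * immature[i]               # divisions of cohort i
            d_imm[i] -= flux + death_rate * immature[i]
            d_imm[i + 1] += 2.0 * flux * (1.0 - p)      # undifferentiated daughters
            d_mat += 2.0 * flux * p                     # differentiated daughters
        d_imm[max_div] -= death_rate * immature[max_div]  # last cohort no longer divides
        immature = [x + dt * dx for x, dx in zip(immature, d_imm)]
        mature += dt * d_mat
    return immature, mature
```

In this regime, raising the per-division differentiation probability shifts output toward the mature compartment, so the relative frequency of the two subsets at any timepoint depends jointly on the division rates and on how steeply differentiation scales with division number, which is exactly the dependence the reviewer suggests exploring.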

      Bots are computer programs that act through a social media account. We will talk about them more in the next chapter (Chapter 3). There are also various applications that are made to help users interact with social media. For example, there are social media manager programs that help people schedule posts and let multiple people use the same account (particularly useful if you are something like a news organization).

      The kinds of bots we've used so far seem pretty simple. It's telling a computer to send a post to social media. But nowadays we have an overwhelming number of bots, to the point that a decent chunk of the content I see online is reposted stuff on an obvious bot page that I just have to scroll through. Even though we as a class are a bot farm, it's obviously way less consequential. It gets really crazy when you think about the creators who have their content stolen and reposted across dozens of different burner accounts, just to amass a following on at least one. I think nowadays it's gone too far.

  2. Sep 2025
    1. And itโ€™s not just about malicious or careless human actors. AI itself has already destroyed valuable work and then lied about it. It has been shown to blackmail users (up to 96% of the time) to get its own way, which is the kind of behavior that creates the dangerous illusion of AI intention.

      This really raises questions about whether it is truly worth using AI in place of completing tasks yourself. Which is worth more: completing a project ahead of time, or risking being blackmailed or having your work destroyed?

    1. Philosophy, far from being an intellectual diversion for the elite, can be central to the empowerment of those who are so often disempowered outside of the classroom. It is, therefore, one of the ironies of our current times that an increase in inequality has been accompanied by a systematic attack on the humanities.

      I think it's really interesting how philosophy is described as a way to change one's way of thinking (from helpless to empowered) instead of just a form of deep thinking. I really like this idea because it makes philosophy so useful and applicable to anyone's life, regardless of their background.

    2. My teacher had our class re-enact a scenario very much like this one in class. We discussed the principles that would govern our imagined society before we picked our fate out of a hat

      This seems like a very telling activity because it would show not just the people around you but also yourself whether your morals are skewed when you have something to gain versus when you have nothing to gain. I think most people would like to think they are morally just and not heavily biased toward benefiting only themselves, but many times that's not true; it's just human nature to look out for one's family and self before others. But this activity could show just how skewed you might be.

    3. One answer to this question is pragmatic โ€“ philosophy teaches you to think and write logically and clearly. This, we tell our students, will be of use to them no matter what path they pursue. We advertise philosophy, then, as a broadly useful means to a variety of ends. There is a lot of truth to this dispassionate answer, but it is also rather disappointing. It sells philosophy short. A different sort of answer dives into profundity โ€“ philosophy aims to discover fundamental truths. Many disciplines aim at knowledge but philosophers, we solemnly tell our students, go deeper โ€“ we seek Knowledge with a capital K. This is undeniably the goal of many philosophers, but it can alienate some students (in particular, those who are not interested in pursuing an academic career). Why, these students might ask, is the knowledge that philosophy aims at any deeper than that of more practical fields such as medicine, science, or the law? And why should they care about this kind of knowledge? Even if most professional philosophers aim at the deepest kind of knowledge, this does not show that it is a valuable enterprise for all students, especially for those who are already overcoming significant hurdles to attend university.

      Philosophy seems to get pitched two ways. Either itโ€™s just a building block for clear thinking and writing, or itโ€™s about finding ultimate truths. Either way, it risks coming off as too bland or too abstract to feel valuable for everyone.

    4. Martin Luther King Jr, held up hope in the form of a dream.

      Morton ends with Kingโ€™s dream to show that philosophy isnโ€™t just about pointing out problemsโ€”itโ€™s about building hope together. It turns hope into a real plan for change, not just a private wish.

    5. Therefore, the first step in this kind of philosophical education is to shake students out of a complacent and uncritical acceptance of the world as it is.

      I like this as a first step, because most people might just think logically, as in, "oh yes, she has to do those things because that's where she currently is, it's her fault, and she just needs to stick through it." But in fact you can ask questions around the situation that might lead to a bigger picture and therefore help fix whatever the underlying struggles might be.

    6. It sells philosophy short

      While short, I think this is such a true and important sentence. Philosophy encompasses so much that it's difficult to just nail it down to one simple short definition.

    7. To illustrate how this can be done let me use an example from my political philosophy class. On the first day of class, I engage students in an exercise, designed by John Immerwahr at Villanova University, which emulates the state of nature. I divide students into groups and ask them to imagine that each group is a family subsisting by fishing from a lake. If a group catches two fish, most of their family will survive, although some among the weak, elderly, or very young in the family could die. If the group catches three fish, all of their family will survive. If they catch any more fish, the excess will rot. However, two fish have to be left in the lake in order for the fish population to be replenished the following year. If the groups over-fish, famine ensues and all of the families will die. There are only enough โ€˜fishโ€™ (paper fish) in the โ€˜lakeโ€™ (a bag I pass around) to allow for most families to take just two fish, if there are to be two fish left in the lake in the end. During the first round of this exercise, students inevitably take so many fish that there are none left in the lake. Students then discuss what has happened and what they ought to do differently in the next round. Some students have strong intuitions that everybody should take an equal amount, while others insist that all that matters is that in the end there are enough fish left to repopulate the lake. Not only is this exercise pedagogically engaging, but it leads students to develop proposals and to evaluate them critically. When successful, students use what they learned in this exercise to begin developing a sense of what they think would be a fair way of distributing resources and to critique the political and social institutions under which they live.

      It's super cool to see how philosophy can be implemented into politics around the world and in which we live. This exercise makes me think about how the structure of our world is based on what only a select few people, those in higher positions, deem to be the 'right' way to structure our society. This is why it's so important for government leaders to listen to the general population and receive feedback for what they're doing, as it might not be substantial enough for lots of different groups.

    8. Therefore, the first step in this kind of philosophical education is to shake students out of a complacent and uncritical acceptance of the world as it is.

      I strongly agree with this statement and believe that it's very important to keep asking questions, thinking deeper, and not just accepting things the way they are but instead, upholding your expectations and morals. Everyone should be critical of injustices no matter where or how they take place. I love the fact that this is exactly what philosophy teaches us to do.

    9. philosophy is the antidote to the uncritical acceptance of the world and ourselves as we are.

      I think this is a really interesting perspective on philosophy! I do think that generally a world that remains stagnant is quite problematic, though it's easy to accept things even when they are flawed, just because "that's how things are". There are probably tens of thousands or even millions of things that are wrong with our world currently, and without a more nuanced perspective on these issues, they might just continue existing as problems. Conversation and philosophy, I'm guessing, help lead toward meaningful changes that address them.

    10. The way injustice often undermines our agency is by shrinking the horizons of what we think is possible. We simply accept that things cannot be any other way than they are. The kind of critical thinking central to philosophical education allows us to question how things are and, often, to realize that how things are is not how they have or ought to be. Bertrand Russell, in his own impassioned defence of philosophy,

      I think itโ€™s really striking how this points out that injustice doesnโ€™t just hurt people directly, it also shrinks what we believe is even possible. When we start to accept things as unchangeable, we lose a part of our freedom. What I like about philosophy here is that it pushes back against that, reminding us to question the way things are and imagine how things could be different.

    1. But the most famous and successful walking simulators are best understood as explorations not of environment, but of character. Just as the environments in first-person shooters exist to support action-packed combat, the environments in most walking sims are designed to be platforms for understanding and empathizing with characters

      I like how this explains the real heart of walking simulators. For me, it makes sense that these games arenโ€™t really about the spaces you move through, but about the people whose lives those spaces represent. When I play, rooms, objects, and details feel like clues to understanding someoneโ€™s experiences and emotions. Itโ€™s almost the opposite of shooting games, where the environment is a backdrop for some fight scene or sequence. The environment feels alive in walking simulators because it helps me connect with the characters on a deeper level.

    1. AbstractThe vast majority of cancers exhibit Somatic Copy Number Alterations (SCNAs)โ€”gains and losses of variable regions of DNA. SCNAs can shape the phenotype of cancer cells, e.g. by increasing their proliferation rates, removing tumor suppressor genes, or immortalizing cells. While many SCNAs are unique to a patient, certain recurring patterns emerge as a result of shared selectional constraints or common mutational processes. To discover such patterns in a robust way, the size of the dataset is essential, which necessitates combining SCNA profiles from different cohorts, a non-trivial task.To achieve this, we developed CNSistent, a Python package for imputation, filtering, consistent segmentation, feature extraction, and visualization of cancer copy number profiles from heterogeneous datasets. We demonstrate the utility of CNSistent by applying it to the publicly available TCGA, PCAWG, and TRACERx cohorts. We compare different segmentation and aggregation strategies on cancer type and subtype classification tasks using deep convolutional neural networks. We demonstrate an increase in accuracy over training on individual cohorts and efficient transfer learning between cohorts. Using integrated gradients we investigate lung cancer classification results, highlighting SOX2 amplifications as the dominant copy number alteration in lung squamous cell carcinoma.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf104), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Ellen Visscher

      The paper introduces a python package for imputation, filtering, segmentation, feature extraction and visualisation of CNA profiles. It explains some of the elements of the package, and then demonstrates how data from multiple cohorts can be processed and combined using the package preprocessing pipeline. The authors then use processed data from 3 different cohorts to perform cancer type prediction using a CNN. From this, they obtain an interesting result, identifying a biomarker that differentiates two different lung cancers. Throughout, they show visualisations using their package. The package itself seems well documented and designed to be used. There is some clarification required in the methods section, specifically around the CNN training and the models therein. There is also one major question of whether all the preprocessing steps are actually required for the downstream CNN analysis. Overall, however, this is a well-written manuscript, providing a useful software tool for further analysis of CNA data.

      Major comments:

      - CNN section: how are the segments decided? Is it based on all the training data, or just the data in a batch?
      - Throughout the results pertaining to Figure 3A-C, you call it test accuracy. To be clear, is this based on your CV hold-outs? This should be reworded everywhere to reflect this. As cross-validation indicates, this is not a test set but a validation set, which is also the way you use it.
      - Regarding the above, you have a comment saying: "the best test accuracy without cross-validation was 92.34%". Could you please clarify what you mean by this? Only in the CNN section do you describe your training approach, which does not mention a test or separate validation set.
      - It reads slightly unclearly: you have a section called "model transfer", but are you training 3 different models, one per dataset? You only have one figure for training results, which suggests one dataset, but then you have this section called model transfer.
      - Regarding all the above, please dedicate a small subsection in the methods making this clearer. Are there dedicated test sets? If your main results are for aggregated data, then what are you testing on to ensure generalisability? What is the point of training the 3 different models on 3 different datasets? Perhaps it would make more sense to hold one dataset out as your test set. In some ways, that is what the model transfer is showing, but it would be less confusing to clarify that aim instead of suddenly introducing 3 models.
      - If the CNN architecture is essentially the same as in Attique et al., the performance is basically the same, and they use only CNs at gene locations, how does this demonstrate that the preprocessing from CNSistent is necessary or advantageous for this task? Maybe a result which combines CN calls naively over gene locations, compared against the aggregated datasets, would be a good way of comparing, i.e. showing that the preprocessing does offer an advantage when combining different datasets together? This is also what you argue in your abstract. For this analysis you would have to make sure you compare across the same samples, to differentiate between filtering and the other preprocessing steps.
      - In Figure 3I, you say "notice the similarity of chromosome 3 pattern for the correctly classified LUSC samples (red) and the misclassified ones (orange)". This is confusing because the orange and red are not similar. In fact, for this whole section, it seems that Figure 3I does not align with what you are saying.

      Minor comments/errors:

      - Clarification is needed on why CNSistent requires a reference genome if it is dealing with segments. How is this information used? Is it just for the known gaps?
      - Your caption of Supplementary Figure 1 has a typo: a breakpoint at 16 instead of 14.
      - You do not explain how you use the knee point to filter (i.e., are samples above or below the knee point kept?).
      - Your CNN graphic is difficult to interpret and non-standard.
      - The CNN section should clarify at the beginning what the input is and what the output is (i.e., a prediction that a sample belongs to a particular cancer type) before explaining the architectural details.
      - Even though you control for class imbalance, some cancer types are so poorly represented that it is unlikely a CNN could learn them. You do mention this in the discussion, but maybe some sort of minimum threshold for inclusion would make sense.
      - For Fig. 2D you refer to it as GND, but the axes/title say hemizygosity. Are these equivalent? E.g., a sample could be 3-3, with low hemizygosity but not diploid. Or, if it is aggregated across the whole genome, is equivalence assumed?
      - There is a grammatical error: "Runtimes decreased in a near-linearly with the number of compute cores".
      - You comment that "We therefore suspect some TCGA lung cancers might be cases of co-occurring adeno and squamous carcinomas." This is a possibility, but given the pleiotropy of many phenotypes, it may also be that the biomarker is not always unique to squamous carcinomas.

      Suggestions/nice-to-haves:

      - Maybe make it clearer inside the paper what visualisations come with CNSistent. Looking at the software documentation, there are obviously a lot of useful visualisations, some of which you have used in Figure 3, for example.
      - Given that some callers report only total CNs, it may be good to mention somewhere how CNSistent would work for total CNs only.
      - You remove profiles that you say are uninformative; could you not include these and then just show how accuracy correlates with the number of breakpoints (for example)? In some ways, one might think there could be useful information in few-alteration profiles, because those alterations might be more upstream/causal.
      - The aggregation step could affect downstream analysis; i.e., taking the average could introduce CNs that were never called. Even using min/max implies a constant copy number in that region, which may lose information; e.g., if it is a functional region, having two different CNs across a gene might imply non-functionality. Did you explore the effect of the aggregation step? Perhaps taking a small enough segment resolution would account for this anyway.
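The concern about the aggregation step is easy to see in a toy example. The snippet below is plain Python over made-up per-bin calls, not CNSistent's actual API: mean aggregation over a coarse segment yields a fractional copy number that was never called, while min or max each discard part of the signal.

```python
# Hypothetical per-bin integer copy-number calls for one sample,
# all falling inside a single coarse segment after re-segmentation.
bins = [2, 2, 2, 4, 4]           # a focal gain inside a diploid stretch

mean_cn = sum(bins) / len(bins)  # 2.8: a value never actually called
max_cn = max(bins)               # 4: keeps the gain, hides the diploid baseline
min_cn = min(bins)               # 2: keeps the baseline, hides the gain
```

Whether this matters downstream depends on segment resolution; with fine enough segments the three aggregation choices agree.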

    1. And if you havenโ€™t talked to the people youโ€™re trying to help, then how could you possibly know what their problems are, or how to help them with design?

      I really agree with the point that if you havenโ€™t talked to the people youโ€™re designing for, you canโ€™t truly understand their problems. Too often designers make assumptions about what users need, and this can lead to solutions that donโ€™t actually help or even create new problems. This reading reminded me that design isnโ€™t just about creativity or technical skillโ€”itโ€™s also about empathy, listening, and real engagement with the people you want to serve. It changes my perspective by showing me that good design requires not just observation, but active communication with users.

    2. Because everyoneโ€™s problems are personal and have different causes and consequences, there is no such thing as theย โ€œaverage userโ€77 Trufelman, A. (2016). On average.ย 99% Invisible. . Every single solution will meet some peopleโ€™s needs while failing to meet others.

      Agreed. I once looked into developing a small widget tool on the phone to quickly convert currencies. While looking into potential use cases, I quickly realized that it's just impossible to take everyone's considerations into account, even for a feature that's so simple. With the cost of developing apps lowering every day, I wonder if in the future a lot more tools (at least on the software side) will become a lot more customizable to adapt to user needs. Instead of logging into an app and being prompted with a list of options, the app asks you what you wish to accomplish and how you prefer to accomplish it, then provides you with a customized list of features. While this is also not a perfect solution, due to technical complexity and a potential learning curve, we might start to see a trend shift in this direction.

    3. If youโ€™re clever, perhaps you can find a design thatโ€™s useful to a large, diverse group. But design will always require you to make a value judgement about who does and who does not deserve your design help. Let that choice be a just one, that centers peopleโ€™s actual needs. And let that choice be an equitable one, that focuses on people who actually need help (for example, rural Americans trying to access broadband internet, or children in low income families without computers trying to learn at home during a pandemicโ€”not urban technophiles who want a faster ride to work).

      I think I agree with this idea of trying to find a design that's useful to a large, diverse group. When designing something, it's usually safe to try to maximise the reach of design in order to make it usable by "most" people. However, I've realised that this "most" doesn't actually exist. Any design, regardless of how it's created, will always work for some people, and not others. I find this idea useful because it informs my belief that there is always an opportunity cost and hidden trade-off while trying to create the "best" design. People may try to generalise the design of their product so that it can be used by virtually anyone (universal design). However, this risks the possibility of losing the specificity or individuality of the design, or creating a design that is based on specific needs of a group. On the other hand, making a design so specific risks the lack of broad usability.

    1. It's not that they're dishonest; it's that they're paralyzed. As one quiet young woman explained after class, nearly every syllabus now includes a warning: Use ChatGPT or similar tools, and you'll be reported to the academic deans. Nobody wants to risk it. Another student mentioned that a major A.I. site may even be blocked on the university network, though she was too nervous to test the rumor.

      Students' perspectives on AI are peculiar. I have had some teachers scare kids into not using it at all, for any reason, insisting they will get caught and ruin their futures. I have also had teachers share alternative AI sites that can be used to help us without it being plagiarism. This leaves students confused about when AI use becomes cheating, and scared to admit they use it even when they know they are not cheating. I have even seen kids write good essays and then go back and dumb down their writing just to avoid being accused of using AI.

    1. A persona is only useful if it's valid. If these details are accurate with respect to the data from your research, then you can use personas as a tool for imagining how any of the design ideas might fit into a person's life. If you just make someone up and their details aren't grounded in someone's reality, your persona will be useless, because what you're imagining will be fantasy.

      I agree with Ko's point here that personas are only valuable if they are grounded in real data. I've seen how easy it is to make up a persona that sounds "real" but doesn't actually match people's lives. When that happens, the design feels off because it's built on assumptions instead of facts. I believe that "good design" isn't just about collecting data, but also about respecting and reflecting people's actual lives.

    2. A persona is only useful if it's valid. If these details are accurate with respect to the data from your research, then you can use personas as a tool for imagining how any of the design ideas might fit into a person's life. If you just make someone up and their details aren't grounded in someone's reality, your persona will be useless, because what you're imagining will be fantasy.

      I like this idea and concept of personas that Professor Amy is talking about. It made me reflect on how important a persona grounded in proper research is for valid data, if that makes sense. In other words, it's easy for us to use data that makes up a generic user instead of a meaningful persona. I knew that using relevant and important data and personas mattered, but this does change my perspective a bit: I used to think of personas as just another approach, but now I see them as something that should be weighed seriously as evidence for design.

    3. How do you turn a hundred little insights into knowledge that you can use to inform your design process? And what form should that knowledge take?

      I actually think defining problems is one of the hardest parts of design, and this chapter, along with today's lecture, made that sort of clear. We did an activity where we critiqued our peers' design paradigms, and I found it difficult to come up with possible problems or issues in their designs. In my experience, it's easy to jump straight to solutions without fully understanding who the problem is really affecting. I like the idea of using personas and scenarios, but sometimes they can oversimplify real people if you're not careful. It's a good reminder that design needs to stay grounded in real experiences, not just fictional summaries of users.

    4. A persona is only useful if it's valid. If these details are accurate with respect to the data from your research, then you can use personas as a tool for imagining how any of the design ideas might fit into a person's life.

      I like the idea of personas because they put a face to the audience you are trying to impact with the design of your product. They make it easier to empathize with users' needs and remind you that you're designing for real people, not just abstract requirements. I created a persona for my INFO 200 course, and it was interesting to put myself in the shoes of a user and their needs in the process.

    1. It's not just common sense that tells us that learning to write in general is not possible. Many studies of writing have been done, in workplaces, in classes across the college landscape, and in social and civic settings.

      It's important, when you're making claims about a topic, to always include evidence to support your idea. Simply saying that "many studies" have been conducted, with no actual citation of said studies to back the claim up, makes it feel empty, as if we just have to take the claim at face value.

    1. Author response:

      The following is the authors' response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cai et al have investigated the role of msiCAT-tailed mitochondrial proteins that frequently exist in glioblastoma stem cells. Overexpression of msiCAT-tailed mitochondrial ATP synthase F1 subunit alpha (ATP5) protein increases the mitochondrial membrane potential and blocks mitochondrial permeability transition pore formation/opening. These changes in mitochondrial properties provide resistance to staurosporine (STS)-induced apoptosis in GBM cells. Therefore, msiCAT-tailing can promote cell survival and migration, while genetic and pharmacological inhibition of msiCAT-tailing can prevent the overgrowth of GBM cells.

      Strengths:

      The CAT-tailing concept has not been explored in cancer settings. Therefore, the present study provides new insights for widening the therapeutic avenue.

      Your acknowledgment of our study's pioneering elements is greatly appreciated.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated. The conclusions of this paper are mostly well-supported by data, but some aspects of image acquisition and data analysis need to be clarified and extended.

      We are grateful for your acknowledgment of our study's innovative approach and its possible influence on cancer therapy. We sincerely appreciate your valuable feedback. In response, this updated manuscript presents substantial new findings that reinforce our central argument. Moreover, we have broadened our data analysis and interpretation, as well as refined our methodological descriptions.

      Reviewer #2 (Public Review):

      This work explores the connection between glioblastoma, mito-RQC, and msiCAT-tailing. They build upon previous work concluding that ATP5alpha is CAT-tailed and explore how CAT-tailing may affect cell physiology and sensitivity to chemotherapy. The authors conclude that when ATP5alpha is CAT-tailed, it either incorporates into the proton pump or aggregates and that these events dysregulate MPTP opening and mitochondrial membrane potential and that this regulates drug sensitivity. This work includes several intriguing and novel observations connecting cell physiology, RQC, and drug sensitivity. This is also the first time this reviewer has seen an investigation of how a CAT tail may specifically affect the function of a protein. However, some of the conclusions in this work are not well supported. This significantly weakens the work but can be addressed through further experiments or by weakening the text.

      We appreciate the recognition of our study's novelty. To address your concerns about our conclusions, we have revised the manuscript. This revision includes new data and corrections of identified issues. Our detailed responses to your specific points are outlined below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1B, please replace the high-exposure blots of ATP5 and COX with representative results. The current results are difficult to interpret clearly. Additionally, it would be helpful if the author could explain the nature of the two different bands in NEMF and ANKZF1. Did the authors also examine other RQC factors and mitochondrial ETC proteins? I'm also curious to understand why CAT-tailing is specific to C-I30, ATP5, and COX-V, and why the authors did not show the significance of COX-V.

      We appreciate your inquiry regarding the data. Additional attempts were made using new patient-derived samples; however, these results did not improve upon the existing ATP5α, C-I30 (NDUS3), and COX4 signals presented in the figure. This is possibly because CAT-tail-modified mitochondrial proteins represent only a small fraction of the total protein in these cells. It is acknowledged that the small tails visible above the prominent main bands are not particularly distinct. To address this, the revised version includes updated images to better illustrate the differences. We believe the assertion that GBM/GSCs possess CAT-tailed proteins is substantiated by a combination of subsequent experimental findings. The figure (refer to new Fig. 1B) serves primarily as an introduction. It is important to note that CAT-tailed ATP5α plays a vital role in modulating mitochondrial potential and glioma phenotypes, a function demonstrated in subsequent experiments.

      It is acknowledged that the CAT-tail modification is not exclusive to the ATP5α protein. ATP5α was selected as the primary focus of this study due to its prevalence in mitochondria and its specific involvement in cancer development, as noted by Chang YW et al. Future research will explore the possibility of CAT tails on other mitochondrial ETC proteins. Currently, NDUS3 (C-I30), ATP5α, and COX4 serve as examples confirming the existence of these modifications. It remains challenging to detect endogenous CAT-tailing, and bulk proteomics is not yet feasible for this purpose. COX4 is considered significant. We hypothesize that CAT-tailed COX4 may function similarly to the previously studied C-I30 (Wu Z, et al.), potentially causing substantial mitochondrial proteostasis stress.

      Concerning RQC proteins, our blotting analysis of GBM cell lines now includes additional RQC-related factors. The primary, more prominent bands (indicated by arrowheads) are, in our assessment, the intended bands for NEMF and ANKZF1. Subsequent blotting analyses showed only single bands for both ANKZF1 and NEMF, respectively. The additional, larger molecular weight band of NEMF, which was initially considered for property analysis (phosphorylation, ubiquitination, etc.), was not examined further as it did not appear in subsequent experiments (refer to new Fig. S1C).

      References:

      Chang YW, et al. Spatial and temporal dynamics of ATP synthase from mitochondria toward the cell surface. Communications biology. 2023;6(1).

      Wu Z, et al. MISTERMINATE Mechanistically Links Mitochondrial Dysfunction With Proteostasis Failure. Molecular cell. 2019;75(4).

      (2) In addition to Figure 1B, it would be interesting to explore CAT-tailed mETC proteins in cancer tissue samples.

      This is an excellent point, and we appreciate the question. We conducted staining for ATP5α and key RQC proteins in both tumor and normal mouse tissues. Notably, ATP5α in GBM exhibited a greater tendency to form clustered punctate patterns compared to normal brain tissue, and not all of it co-localized with the mitochondrial marker TOM20 (refer to new Fig. S3C-E). Crucially, we observed a significant increase in NEMF expression within mouse xenograft tumor tissues, alongside a decrease in ANKZF1 expression (refer to new Fig. S1A, B). These findings align with our observations in human samples.

      (3) Please knock down ATP5 in the patient's cells and check whether both the upper band and lower band of ATP5 have disappeared or not.

      This control was essential and has been executed now. To validate the antibody's specificity, siRNA knockdown was performed. The simultaneous elimination of both upper and lower bands upon siRNA treatment (refer to new Fig. S2A) confirms they represent genuine signals recognized by the antibody.

      (4) In Figure 1C and 1D, add long exposure to spot aggregation and oligomer. Figure 1D, please add the blots where control and ATP5 are also shown in NHA and SF (similar to SVG and GSC827).

      New data are included in the revised manuscript to address the queries. Specifically, the new Fig. 1D now displays the full lineup as requested, featuring blots for Control, ATP5α, AT3, and AT20. Our analysis reveals that AT20 aggregates exhibit higher expression and accumulation rates in GSC and SF cells.

      Fig. 1C has been updated to include experimental groups treated with cycloheximide and sgNEMF. Our results show that sgNEMF effectively inhibits CAT-tailing in GBM cell lines, whereas cycloheximide has no impact. After consulting with the reporter's original creator and optimizing expression conditions, we observed no significant aggregates with the β-globin non-stop protein, potentially due to the length of endogenous CAT-tail formation (as noted by Inada, 2020, in Cell Reports). Our analysis focused on the ratio of CAT-tailed (red box blots) to non-CAT-tailed proteins (green box blots). Comparing these ratios revealed that both anisomycin treatment and sgNEMF effectively hinder the CAT-tailing process, while cycloheximide has no effect.

      (5) In Figure 1E, please double-check the results against the figure legend. ATP5A aggregates should be shown endogenously. The number of aggregates shown in the bar graph is not represented in the micrographs. Please replace the images. For Figure 1E, to confirm the ATP5-specific aggregates, it would be better if the authors would show endogenous immunostaining of C-I30 and COX-IV.

      Labels in Fig. 1E were corrected to reflect that the bar graph in Fig. 1F indicates the number of cells with aggregates, not the quantity of aggregates per cell. The presence of endogenous ATP5α is accurately shown. To address the specificity of ATP5α, immunostaining for endogenous NDUS3 was conducted. This revealed NDUS3 aggregation in GBM cells (SF and GSC) lacking TOM20, as demonstrated in the new Fig. S3A, B. These findings suggest NDUS3 also undergoes CAT-tailing modification, similar to ATP5α.

      (6) Figure 3A. Please add representative images in the anisomycin sections. It is difficult to address the difference.

      We appreciate your feedback. Upon re-examining the Calcein fluorescence intensity data in Fig. 3A, we believe the images accurately represent the statistical variations presented in Fig. 3B. To address your concerns more effectively, please specify which signals in Fig. 3A you find potentially misleading. We are prepared to revise or substitute those images accordingly.

      (7) Figure 3D. If NEMF is overexpressed, is the CAT-tailing of ATP 5 reversed?

      Thank you. Your prediction aligns with our findings. We've added data to the revised Fig. S6A, B, which demonstrates that both NEMF overexpression and ANKZF1 knockdown lead to elevated levels of CRC. This increase, however, was not statistically significant in GSC cells. A plausible explanation for this discrepancy is that the MPTP of GSC cells is already closed, thus any additional increase in CAT-tailing activity does not result in further amplification.

      (8) Figure 3G. Why on the BN-PAGE are the AT20 aggregates not the same as shown in Figure 2E?

      We appreciate your inquiry regarding the ATP5α blots, specifically those in the original Fig. 3G (left) and 2E (right). Careful observation of the ATP5α band placement in these figures reveals a high degree of similarity. Notably, there are aggregates present at the top, and the diffuse signals extend downwards. Given that this is a gradient polyacrylamide native PAGE, the concentration diminishes towards the top. Consequently, the non-rigid nature of the Blue Native PAGE gel may lead to slight variations in the aggregate signals; however, the overall patterns are very much alike. To mitigate potential misinterpretations, we have rearranged the blot order in the new Fig. 3M.

      (9) Figure 4D. The amount of aggregation mediated by AT20 is greater compared to AT3. Why are no such drastic differences observed between AT3 and AT20 in the TUNEL assay?

      The previous Figure 4D presents the quantification of cell migration from the experiment depicted in Figure 4C. But this is a good point. TUNEL staining results are directly influenced by mitochondrial membrane potential and the state of mitochondrial permeability transition pores (MPTP), not by the degree of protein aggregation. Our previous experiments showed comparable effects of AT3 and AT20 on mitochondria (Fig. 2E, 3K), which aligns with the expected similar outcomes on TUNEL staining. As for its biological nature, this could be very complicated. We hope to explore it in future studies.

      (10) Figure 5C: The role of NEMF and ANKZF1 can be further clarified by conducting Annexin-PI assays using FACS. The inclusion of these additional data points will provide more robust evidence for CAT-tailing's role in cancer cells.

      In response to your suggestion, we have incorporated additional data into the revised version.

      Using the Annexin-PI kit, we labeled apoptotic cells and detected them using flow cytometry (FACS). Our findings indicate that anisomycin pretreatment, NEMF knockdown (sgNEMF), and ANKZF1 upregulation (oeANKZF1) significantly increase the rate of STS-induced apoptosis compared to the control group (refer to new Fig. S9D-G).

      (11) Figure 5F: STS is a known apoptosis inhibitor. Why is it not showing PARP cleavage?

      Also, cell death analysis would be more pronounced, if it could be shown at a later time point. What is the STS and Anisomycin at 24h or 48h time-point? Since PARP is cleaved, it would also be better if the authors could include caspase blots.

      I guess what you meant to say here is "Staurosporine is a protein kinase inhibitor that can induce apoptosis in multiple mammalian cell lines." Our study observed PARP cleavage even in GSCs, which are typically more resistant to staurosporine-induced apoptosis (C-PARP in Fig. S9B). The ratio of C-PARP to total PARP increased. We selected a 180-minute treatment duration because longer treatments with STS + anisomycin led to a late stage of apoptosis and non-specific protein degradation (e.g., at 24 or 48 hours), making PARP comparisons less meaningful. Following your suggestion, we also examined caspase 3/7 activity in GSC cells treated with DMSO, CHX, and anisomycin. We found that anisomycin treatment also activated caspases (Fig. S9A).

      (12) In Figure 5, the addition of an explanation of how CAT-tailing can induce cell death would add more information, such as the BAX-BCL2 ratio and cytochrome c release from the mitochondria.

      Thank you for your suggestion. In this study, we state that specific CAT-tails inhibit GSC cell death/apoptosis rather than inducing it. Therefore, we do not expect that examining BAX-BCL2 and mitochondrial cytochrome c release would offer additional insights.

      (13) To confirm the STS resistance, it would be better if the author could do the experiments in the STS-resistant cell line and then perform the Anisomycin experiments.

      Thank you. We should emphasize that our data primarily originates from GSC cells. These cells already exhibit STS-resistance when compared to the control cells (Fig. S8A-C).

      (14) It would be more advantageous if the author could show the ATP5 CAT-tailed status under standard chemotherapy conditions, in either cell lines or in vivo conditions.

      This is an interesting question and worth exploring; however, GSC cells exhibit strong resistance to standard chemotherapy treatments like temozolomide (TMZ).

      Additionally, we could not detect changes in CAT-tailed ATP5α and thus did not include that data.

      (15) In vivo (cancer mouse model or cancer fly model) data will add more weight to the story.

      We appreciate your intriguing question. An effective approach would be to test the RQC pathway's function using the Drosophila Notch overexpression-induced brain tumor model. However, Khaket et al. have conducted similar studies, stating, "The RNAi of Clbn, VCP, and Listerin (Ltn), homologs of key components of the yeast RQC machinery, all attenuated NSC over-proliferation induced by Notch OE (Figs. 5A and S5A-D, G)." This data supports our theory, and we have incorporated it into the Discussion. While the mouse model more closely resembles the clinical setting, it is not covered by our current IACUC proposal. We intend to verify this hypothesis in a future study.

      Reference:

      Khaket TP, Rimal S, Wang X, Bhurtel S, Wu YC, Lu B. Ribosome stalling during c-myc translation presents actionable cancer cell vulnerability. PNAS Nexus. 2024 Aug 13;3(8):pgae321.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1B, C: To demonstrate that Globin, ATP5alpha, and C-I30 are CAT-tailed, it is necessary to show that the high mobility band disappears after NEMF deletion or mutagenesis of the NFACT domain of NEMF. This can be done in a cell line. The anisomycin experiment is not convincing because the intensity of the bands drops and because no control is done to show that the effects are not due to translation inhibition (e.g. cycloheximide, which inhibits translation but not CAT tailing). Establishing ATP5alpha as a bona fide RQC substrate and CAT-tailed protein is critical to the relevance of the rest of the paper.

      Thank you for suggesting this crucial control experiment.

      To confirm the observed signal is indeed a bona fide CAT-tail, it's essential to demonstrate that NEMF is necessary for the CAT-tailing process. We have incorporated data from NEMF knockdown (sgNEMF) and cycloheximide treatment into the revised manuscript. Our findings show that both sgNEMF and anisomycin treatment effectively inhibit the formation of CAT-tailing signals on the reporter protein (Fig. 1C). Similarly, NEMF knockdown in a GSC cell line also effectively eliminated CAT-tails on overexpressed ATP5α (Fig. S2B).

      In general, the text should be weakened to reflect that conclusions were largely gleaned from artificial CAT tails made of AT repeats rather than endogenously CAT-tailed ATP5alpha. CAT tails could have other sequences or be made of pure alanine, as has been suggested by some studies.

      Thank you for your reminder. We have reviewed the recent studies by Khan et al. and Chang et al., and we found their analysis of CAT tail components to be highly insightful. We concur with your suggestion regarding the design of the CAT tail sequence. We aimed to design a tail that maintained stability and resisted rapid degradation, regardless of its length. In the revised version, we clarify that our conclusions are based on artificial CAT tails, specifically those composed of AT repeat sequences (p. 9). We acknowledge that the presence of other sequence components may lead to different outcomes (p. 19).

      Reference:

      Khan D, Vinayak AA, Sitron CS, Brandman O. Mechanochemical forces regulate the composition and fate of stalled nascent chains. bioRxiv [Preprint]. 2024 Oct 14:2024.08.02.606406.

      Chang WD, Yoon MJ, Yeo KH, Choe YJ. Threonine-rich carboxyl-terminal extension drives aggregation of stalled polypeptides. Mol Cell. 2024 Nov 21;84(22):4334-4349.e7.

      Throughout the work (e.g. 3B, C), anisomycin effects should be compared to those with cycloheximide to observe if the effects are specific to a CAT tail inhibitor rather than a translation inhibitor.

      We agree that including cycloheximide control experiments is crucial. The revised version now incorporates new data, as depicted in Fig. S5A, B, illustrating alterations in the on/off state of MPTP following cycloheximide treatment. Furthermore, Fig. S6A, B present changes in Calcium Retention Capacity (CRC) under cycloheximide treatment. The consistency of results across these experiments, despite cycloheximide treatment, suggests that anisomycin's role is specifically as a CAT tail inhibitor, rather than a translation inhibitor.

      Line 110, it is unclear what "short-tailed ATP5" is. Do you mean ATP5alpha-AT3? If so this needs to be introduced properly. Line 132: should say "may indicate accumulation of CAT-tailed protein" rather than "imply".

      We acknowledge your points. We have clarified that the "short-tailed ATP5α" refers to ATP5α-AT3 and incorporated the requested changes into the revised manuscript.

      Figure 1C: how big are those potential CAT-tails (need to be verified as mentioned earlier)?

      They look gigantic. Include a ladder.

      In the revised Fig. 1D, molecular weight markers have been included to denote signal sizes. The aggregates in the previous Fig. 1C, also present in the control plasmid, are likely a result of signal overexposure. The CAT-tailed protein is observed just above the intended band in these blots. These aggregates have been re-presented in the updated figures, and their signal intensities quantified.

      Line 170: "indicating that GBM cells have more capability to deal with protein aggregation".

      This logic is unclear. Please explain.

      We appreciate your question and have thoroughly re-evaluated our conclusion. We offer several potential explanations for the data presented in Fig. 1D: (1) ATP5α-AT20 may demonstrate superior stability. (2) GSC (GBM) cells might lack adequate mechanisms to monitor protein accumulation. (3) GSC (GBM) cells could possess an increased adaptive capacity to the toxicity arising from protein accumulation. This discussion has been incorporated into the revised manuscript (lines 166-169).

      Line 177: how do you know the endogenous ATP5alpha forms aggregates due to CAT-tailing? Need to measure in a NEMF hypomorph.

      We understand your concern and have addressed it. Revised Fig. 3G, H demonstrates that a reduction in NEMF levels, achieved through sgNEMF in GSC cells, significantly diminishes ATP5α aggregation. This, in conjunction with the anisomycin treatment data presented in revised Fig. 3E, F, confirms the substantial impact of the CAT-tailing process on this aggregation.

      Line 218: really need a cycloheximide or NEMF hypomorph control to show this specific to CAT-tailing.

      We have revised the manuscript to include data from sgNEMF and cycloheximide treatments, specifically Fig. 3G, H, and Fig. S5C, D, as detailed in our response above.

      Lines 249, 266, Figure 5A: The mentioned experiments would benefit from controls including an extension of ATP5alpha that was not alanine and threonine, perhaps a gly-ser linker, as well as an NEMF hypomorph.

      We sincerely appreciate your insightful comments. In response, the revised manuscript now incorporates control data for ATP5α featuring a poly-glycine-serine (GS) tail. This data is specifically presented in Figs. S2E-G, S4E, S7A, D, E, and S8F, G. Our experimental findings consistently demonstrate that the overexpression of ATP5α, when modified with GS tails, had no discernible impact on protein aggregation, mitochondrial membrane potential, GSC cell mobility, or any other indicators assessed in our study.

      Figure S5A should be part of the main figures and not in the supplement.

      This has been moved to the main figure (Fig. 5C).

  3. drive.google.com
    1. I didn't tell her that. Maybe I just don't understand poetry. I admit it's not the first thing I reach for when I pick up something to read

      One might expect him to feel curiosity or strong emotion reading about it, but he responds with self-reflection. His emotional response is unexpected to me.

    1. Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      (1) The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      (2) The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      (3) The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      (4) The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      Weaknesses:

      (1) While the paper raises the possibility that both prediction error and uncertainty could serve as control signals, it does not offer a strong theoretical rationale for why the brain would benefit from multiple (empirically correlated) signals. What distinct advantages do these signals provide? This may be discussed in the authors' prior modeling work, but is left too implicit in this paper.

      (2) Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. The authors should consider whether they can leverage timepoints where the models make different predictions to make a stronger case for brain regions that are responsive to one vs the other.

      (3) The authors refer to a baseline measure of pattern dissimilarity, which their dissimilarity measure of interest is relative to, but it's not clear how this baseline is computed. Since the interpretation of increases or decreases in dissimilarity depends on this reference point, more clarity is needed.

      (4) The authors report an average event length of ~20 seconds, and they also look at +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretations of the reported time courses.

      (5) The authors describe a sequence of neural pattern shifts during each type of boundary, but offer little setup of what pattern shifts we might expect or why. They also offer little discussion of what cognitive processes these shifts might reflect. The paper would benefit from a more thorough setup for the neural results and a discussion that comments on how the results inform our understanding of what these brain regions contribute to event models.

    1. We're curious to see how this phrase is continued to be used, and how these sentiments are continuing, being rejected, or evolving.

      I think it's really interesting how phrases online don't just disappear; they get recycled in new ways. Sometimes a phrase that was once offensive or harmful gets turned into a meme, while other times people reject it completely. I've noticed on platforms like TikTok, older phrases resurface with totally new meanings, and it makes me wonder if the internet ever truly "lets go" of language, or just keeps reshaping it.

    1. The Eyes around – had wrung them dry – And Breaths were gathering firm For that last Onset – when the King Be witnessed – in the Room –

      Dickinson uses metaphor and imagery to heighten the emotional intensity of the moment just before death. The phrase "The Eyes around had wrung them dry –" suggests that the people witnessing the death have cried so much they have no tears left. It's a metaphor for emotional exhaustion and grief. The next lines "Breaths were gathering firm / For that last Onset" build tension, as if everyone is holding their breath in anticipation of death's final moment. The phrase "when the King / Be witnessed – in the Room –" is especially powerful. The "King" could be a metaphor for death, but it might also refer to God or a divine presence entering the space. Dickinson keeps it ambiguous, which adds to the mystery and emotional weight of the scene. She shows how people expect something profound or sacred at the moment of death yet, as we later learn in the poem, it's interrupted by something as mundane as a fly.

    1. Present Progressive The present progressive (present continuous) tense is formed with a present form of be (i.e., am, is, or are) and the present participle of the main verb. Basic Meaning The basic meaning of the present progressive, taught in every English language teaching textbook, is ongoing action at the time of speaking. Time adverbs such as right now emphasize the immediacy of the ongoing action, as in (29a), which has an activity verb, and (29b), which has an achievement verb. Ongoing action can be transpiring over a longer period, as the time expression in (29c) illustrates. (29) a. They're studying for a midterm right now. b. Her plane is landing right now. c. They're putting the plan into effect in the course of this semester. Punctual achievement verbs such as bang, bounce, hit, and kick take on an iterative meaning in the present progressive, as illustrated in (30). (30) a. That window shutter is banging against the wall. You'd better secure it. b. He's bouncing the tennis ball off the backboard. Additional Meanings In addition to expressing ongoing action, the present progressive can express a number of other meanings. As we have seen, one of these is a future event that is planned, as

      annotation: The present progressive tense (also called present continuous) is used to talk about actions that are happening right now or around this time. It's formed using am, is, are + a verb ending in -ing. For example, we say, "She is eating lunch" or "They are studying for a test." These sentences show that the actions are happening at the moment of speaking. Sometimes, the action can last over a longer period, for example, "He is working on a big project this month," which means he may not be doing it right now, but it's happening during this time. The present progressive can also describe actions that happen again and again, like "The baby is crying" or "The ball is bouncing," showing repeated movement. Finally, this tense is often used to talk about plans that are already arranged. For instance, "I'm meeting my friend tomorrow" or "We're going to the zoo on Saturday" show future events that are planned. So, the present progressive isn't just for what's happening now; it can also talk about ongoing, repeated, or future planned actions.
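
      The formation rule above (a present form of be plus the -ing participle) is mechanical enough to sketch in code. The helper below is a hypothetical illustration, not taken from any grammar reference, and it handles only the simplest spelling cases:

```python
def present_progressive(subject, verb):
    """Form a present progressive clause: be-form + verb-ing.

    Hypothetical helper for illustration; covers only basic spelling rules.
    """
    # Pick the matching present form of "be".
    if subject.lower() == "i":
        be = "am"
    elif subject.lower() in {"he", "she", "it"}:
        be = "is"
    else:
        be = "are"

    # Very rough -ing spelling rules (far from complete).
    if verb.endswith("ie"):                    # die -> dying
        stem = verb[:-2] + "y"
    elif verb.endswith("e") and verb != "be":  # make -> making
        stem = verb[:-1]
    else:
        stem = verb

    return f"{subject} {be} {stem}ing"

print(present_progressive("She", "eat"))     # She is eating
print(present_progressive("They", "study"))  # They are studying
```

      Real spelling rules (consonant doubling in "sitting", etc.) are more involved; the sketch only mirrors the be + -ing structure described above.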

    1. While we weren't watching, persuasion became industrialized. In the twentieth century the modern advertising industry came to maturity and began systematically applying new knowledge about human psychology and decision making.

      I feel like it's worse now that persuasion has become industrialized. It means that persuasion is part of our everyday life. With persuasion becoming more prominent in our lives, it is harder not to be influenced by everything. Today, it is very easy for people to be influenced with just a tap of a button, from trends like Labubu to everyone singing and watching the same thing.

  4. blog.richmond.edu
    1. (particularly in work focusing, respectively, on reality TV and serial drama), in terms of the wider view that Williams took of "watching television" as part of a larger media system, the parameters and qualities of particular forms and discourses matter much less than the extent and functioning of the system itself.

      Williams is saying that the system of TV (the schedule, the format, how it flows) is more important than any one show or genre. It's not just what we watch, but how it's organized that really shapes the experience of television.

  5. drive.google.com
    1. On the other hand, as Jim Collins notes in his chapter, highbrow critics are motivated to place Twin Peaks in a separate category because they feel called to police the boundaries between "art" and "trash;" and they want to claim that Twin Peaks is art.

      Critics separating Twin Peaks into an "art" category shows how genre isn't just about content, but also status. This reinforces how genre can be used to signal taste and social class. It's not about what something is, but who it's for.

    1. A 1955 survey showed that while most women worked for financial reasons, 21 percent worked to fulfill "a need for accomplishment"

      I think it's important that this survey showed women wanted more than just financial security; they wanted purpose. This relates to how today people want jobs that feel meaningful, not just ones that pay the bills.

    3. This meant that the contradiction between unity and division was not a simple binary opposition; it was not a matter of either/or but rather both at once.

      This means that the idea of unity and division is not just two opposite things; instead, they can both exist together at the same time. It's not a yes-or-no situation, but more about how they can happen together.

    1. When I was an undergraduate, I didn't have a clue about design. Like most students in technical fields, I thought design was about colors, fonts, layout, and other low-level visual details. I knew enough about user interfaces to know that design mattered, I just didn't know how much or why

      I really relate to this part because I used to think the exact same thing: that design was mostly about visuals like color and layout. I've realized, though, that design goes much deeper and is really about how people interact with and experience a product. This changed my perspective because it made me see that good design isn't just decoration; it's about solving problems and making technology more usable and meaningful.

    2. Quickly I learned that design was much, much more than what was visible. Design was where ideas came from. Design was methods for generating ideas. It was methods for evaluating ideas. It was ways of communicating ideas. I learned that design was problem solving (Jonassen, D. H. (2000). Toward a design theory of problem solving. Educational Technology Research and Development.), and that it is design problem solving that shapes the world. After all, look around you: nearly everything in the space you're reading this in is designed by someone, somewhere, to solve some problem.

      I really liked this part because it made me see that design isn't just about looks; it's about solving problems and getting ideas across. I really agree with Ko that basically everything around us is designed for a reason, and thinking about it like that makes me notice all the effort behind everyday stuff. This reading changed my perspective on design by showing that it's a creative and structured process, and it makes me appreciate design more and how it shapes everything around us.

    3. Suddenly I was surrounded by designers, and taking design classes with design students in design studios.

      I found this reading really eye-opening because it made me realize that design is far more than just visuals or layouts; it's a way of thinking and problem-solving that shapes almost everything around us. I agree with the point that everyone designs in some way, whether rearranging a room or making a poster, because it made me see how often I engage in design without formal training. For me personally, when I was working on a book startup, I designed the app logo and the UI. After working on it, though, I truly realized how design isn't only about colors and shapes; it must be intentional. I also found the discussion about design power and inclusion really useful, especially the idea from design justice that even well-intentioned designers can unintentionally exclude people, which makes me think about how important it is to involve diverse perspectives in any design process.

    4. Exploiting failure. Most people avoid and hide failure; designers learn from it, because behind every bad idea is a reason for its failure that should be understood and integrated into your understanding of a problem.

      Wow, I love this; it is so true and I agree with it completely. I can definitely relate to this because I interned at Microsoft this summer working on making GitHub Copilot better for our codebase using things like MCP servers and Copilot instructions, and I always expected the AI to improve after adding more, but it just wouldn't. Instead of taking it as a failure, it became a learning experience for my team about AI and its unpredictability and lack of consistency.

    5. Quickly I learned that design was much, much more than what was visible. Design was where ideas came from. Design was methods for generating ideas. It was methods for evaluating ideas. It was ways of communicating ideas. I learned that design was problem solving (Jonassen, D. H. (2000). Toward a design theory of problem solving. Educational Technology Research and Development.), and that it is design problem solving that shapes the world. After all, look around you: nearly everything in the space you're reading this in is designed by someone, somewhere, to solve some problem.

      I agree with this. This reaffirms my belief that design is an iterative process rather than the surface-level details of a product. Before, I used to think that design was just limited to visual design, meaning it mattered only how something looked or how aesthetically pleasing it was. However, I've looked into design concepts and processes such as the Design Thinking Process and now I view design multidimensionally. Design is not just about creating features; it's about empathizing with stakeholders, defining the problem, brainstorming solutions, creating prototypes, and finally testing.

    6. I thought design was about colors, fonts, layout, and other low-level visual details. I knew enough about user interfaces to know that design mattered, I just didn't know how much or why.

      I feel like this is how many people see design, and the way I think I see design (at least subconsciously). The visual definition of design is so much easier to grasp than the looser definitions like the ones we were talking about during class. If you see design as something that encompasses so many different things, done in so many different areas and industries, then it's like "what isn't design or a result of designing?", which is kind of confusing.

    1. For an example of how AI hallucinations can play out in the real world, consider the legal case of Mata v. Avianca. In this case, a New York attorney representing a client's injury claim relied on ChatGPT to conduct his legal research.

      It is just like how those AI videos these days seem TOO real; it's become difficult to tell authentic work from AI work now. In this case, though, it is even more frightening, reaching beyond our screens and bold enough to be used in a court trial.

    2. Why do people think AI is very helpful when it's clearly hurting them physically? It just seems like the wrong thing to do. That's why teachers check if students used AI for their assignments; AI can't be trusted anymore these days.

    1. What makes it mean that?

      If I take a calculator and type in '800-250', it will also output 550. But the way I got to '550' doesn't make it mean Lucie has a low credit score; I just picked a random number and then subtracted another number. It matters how I worked my way to that number for it to mean Lucie has a low credit score, so how did SmartCredit work its way to 550? (Do you think how the output is made is always relevant to what it means? Or how the output is received? Or both?)
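
      The annotation's point can be made concrete in code: two procedures can emit the same number even though only one of them tracks anything about a person. Both functions below are invented for illustration; the scoring rule is a toy stand-in, not SmartCredit's actual model:

```python
# Two ways to arrive at 550. The number is identical; the meaning is not.

def arbitrary_550():
    """A calculator-style computation with no connection to credit."""
    return 800 - 250

def toy_credit_score(on_time_ratio, utilization):
    """A made-up scoring rule (purely illustrative, not a real model).

    Starts from a 300 base, rewards on-time payments, and penalizes
    high credit utilization.
    """
    score = 300 + 400 * on_time_ratio - 150 * utilization
    return round(score)

a = arbitrary_550()
b = toy_credit_score(on_time_ratio=0.85, utilization=0.6)
print(a, b)  # 550 550
```

      Only the second number is about anyone at all, and even then only via its inputs; identical outputs say nothing about identical meanings.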

    1. I use AI a lot. Like, every day," she said. "And I do believe it could take away that critical-thinking part. But it's just... now that we rely on it, we can't really imagine living without it

      It is interesting how we are now growing up in an age where technology is totally embedded within our society. People have begun to use it as a crutch, something you depend on. I think this sort of mentality can be dangerous as it does remove the necessary analytical aspect of learning.

    2. it's just in your brain

      Is this implying that he wants to create an A.I. brain chip (something like Neuralink)??? Why is this guy so cartoonishly unethical

    1. There is actually a name for this condition: imposter syndrome. Students who feel like an imposter are worried that they don't belong, that someone will "expose them for being a fake." This feeling is pretty common for anyone who finds themselves in a new environment and is not sure if they have what it takes to succeed. Trust the professionals who work with first-year college students: you do have what it takes, and you will succeed. Just give yourself time to get adjusted to everything.

      This stands out to me because it is a feeling I've been having since starting school. I've been away from school since dropping out of high school when I was 16. It is reassuring to know that it's not just me, that it's normal, I guess. It's saying it's normal and I do have what it takes to succeed!

    2. It comes with its own language and customs, some of which can be confusing or confounding at first. Just like traveling to a foreign country, it is best if you prepare by learning what words mean and what you are expected to say and do in certain situations.

      I know it's just the start of the chapter, but this specific part of the text really stood out to me. Coming to college feels otherworldly to me; I moved here recently, so I pretty much know no one, and it feels extremely foreign. I feel this passage is important because many people are entering a new scene like this, so calling it out can give a spark of some sort, showing them they're not the only person being introduced to this chapter of their life.

    1. The video gets a lot of things right: design is a way of thinking, a mindset, a form of optimistic approach to imagining better worlds. The video argues that it is something fundamentally human. But what makes design good?

      I really agree with the article's idea that design is "fundamentally human" because it shows how creativity and problem-solving are part of everyday life, not just for professional designers. The question of what makes design good stood out to me, since it made me think about how design isn't just about functionality; it's also about considering people, contexts, and values. I found this perspective useful because it highlights that good design involves both imagination and responsibility, which changes how I think about the role of design in daily life. From my previous experience, good designs always correlate with authenticity and purpose, but most of all functionality.

    2. One critique of human-centered design is that it narrowly focuses on people and their needs rather than a systems-level view of the activities that people engage in, and the multiple people and systems involved in those activities. For example, consider the activity of driving a bus: it's not just the driver that matters, but the dispatchers that communicate information to drivers, the other drivers on the road, and even the riders occasionally.

      I think this is a very interesting aspect of design. The dilemma created when the creator is balancing their design between the human user and the system (or potentially other human users) behind the scenes really made me reconsider multiple things that I thought were poorly designed. Maybe some things were designed the way they are for a reason (which doesn't necessarily mean they were just poorly designed).

    3. These and other critiques lead to a notion of participatory design (Muller, M. J., & Kuhn, S. (1993). Participatory design. Communications of the ACM.), in which designers not only try to understand the problems of stakeholders, but also recruit stakeholders onto the design team as full participants in the design process. This way, the people you're designing for are always represented throughout the design process.

      I really agree with participatory design because it's so much more effective than just assuming what people's problems are. Actually involving stakeholders in the process makes sure their needs are truly represented instead of just guessed at. It feels like the best way to design solutions that will actually help the people using them, but I also understand that not everyone's issues will always be fully met, and I wonder if we can come up with a technique some day that solves this problem.

    4. For example, consider the activity of driving a bus: it's not just the driver that matters, but the dispatchers that communicate information to drivers, the other drivers on the road

      Relatable. When I was designing a course planning tool, I noticed in my design review that I hadn't really considered the needs of administrators or academic advisors, who are also crucial in the loop. You also see this kind of problem in a lot of existing tools, where an individual function might be very nicely designed but doesn't fit into the larger system. An example is Microsoft Tay, which failed to install proper safeguards, leading to extensive harmful exploitation.

    5. One of the most common in the world today is human-centered design (Bannon, L. (2011). Reimagining HCI: toward a more human-centered perspective. ACM interactions.) (sometimes called user-centered design, but many people find the word "user" to be too limiting).

      I agree with the idea of human-centered design because nowadays everything is designed to make people's lives easier. It makes sense to focus on people's needs first, since technology and design only matter if they improve the way we live and interact. This perspective also changes how I think about design: it's not just about making something look good, but about creating solutions that actually fit into people's daily lives.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their thoughtful comments and overall very supportive feedback.

      Reviewer #1 writes: "The study is very thorough and the experiments contain the appropriate controls. (...) The findings of the study can have relevance for human conditions involving disrupted mitochondrial dynamics, caused for example by mutations in mitofusins." Reviewer #2 writes: "The dataset is rich and the time-resolved approach strong." Reviewer #3 writes: "I admire the philosophy of the research, acknowledging an attempt to control for the many possible confounding influences. (...) This is a powerful and thoughtful study that provides a collection of new mechanistic insights into the link between physical and genetic properties of mitochondria in yeast."

      We address all points below. We have not yet updated our text and figures since we expect substantial additions from new experiments. But we have included Figure R1 with some additional analyses of existing data at the bottom of the manuscript.

      Reviewer 1:

      1.1 Statistical comparisons are missing throughout the manuscript (with the exception of Fig. 2c). Appropriate statistical tests, along with p-values, should be used and reported where different groups are compared, for example (but not limited to) Fig. 3d and most panels of Fig. 4.

      We initially decided not to add too many extra labels to the already very busy plots, given that the magnitude of change mostly speaks for itself. However, we will try to find meaningful statistical tests together with a sensible graphical representation for all of the figures. For one example see Figure R1A.
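
      As one concrete option for such comparisons, a two-sample permutation test on the difference of group means can be run with the standard library alone; the sketch below uses invented numbers, not our data:

```python
import random
from statistics import mean

def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an approximate p-value; illustrative sketch only.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one to avoid p = 0

# Toy values standing in for two conditions (invented numbers).
control  = [1.0, 1.2, 0.9, 1.1, 1.05]
depleted = [0.6, 0.7, 0.65, 0.8, 0.55]
p = permutation_test(control, depleted)
print(f"p ~ {p:.4f}")
```

      A permutation test makes no normality assumption, which is convenient for single-cell distributions where mean and median can differ.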

      1.2. I do not agree with the use of Atp6 protein as a direct read-out of mtDNA content. While Atp6 protein levels will decrease with decreasing mtDNA content, the inverse is not necessarily true: decreased Atp6 protein levels do not necessarily indicate decreased mtDNA levels, because they could alternatively or additionally be caused by decreased transcription and/or translation. Therefore, please do not equate Atp6 protein levels to mtDNA levels, and instead rephrase the text referencing the Atp6 experiments in the Results and Discussion sections to measure "mtDNA expression" or "mt-encoded protein" or similar. For example, on p. 14 line 431 should read "mtDNA expression" rather than "decreased synthesis of mtDNA", and line 440 on the same page "mean mtDNA levels" should be "mtDNA expression" or similar.

      All three reviewers agree that using Atp6-NG as a direct proxy for mtDNA requires more validation, or at least rephrasing of the text. We agree that this is the most important point to address. We had previously tried using the mtDNA LacO array (Osman et al. 2015) to directly assess the amount of nucleoids per cell. However, the altered mitochondrial morphology of the Fzo1-depleted cells, combined with the LacI-GFP which is still in mitochondria even when mtDNA is gone, increases the noise level to a point that we cannot interpret the signal. However, as this manuscript was in the submission process, the Schmoller lab (co-authors #2 and #7) adapted the HI-NESS system to label mtDNA in live yeast cells (Deng et al. 2025). This system promises a much better signal-to-noise ratio and we expect we can address all concerns regarding the actual count of nucleoids per cell. Should this unexpectedly fail for technical reasons, we will try to calibrate the Atp6 levels with DAPI staining at defined time points and will rephrase the text as the reviewer suggests.

      1.3. In Fig. 3, the authors use the fluorescence intensity of a mitochondrially-targeted mCardinal as a read-out of mitochondrial mass. Please provide evidence that this is not affected by MMP, either with relevant references or by control experiments (e.g. comparing it to N-acridine orange or other MMP-independent dyes or methods).

      Whether or not the import of any mitochondrial protein is dependent on the MMP depends largely on the signal sequence. The preSu9 signal sequence was previously characterized as largely independent of the MMP compared to other presequences (Martin, Mahlke, and Pfanner 1991), which is why Vowinckel (Vowinckel et al. 2015) and others (Di Bartolomeo et al. 2020; Perić et al. 2016; Ebert et al. 2025) have previously used this as a neutral reference to the strongly MMP-dependent pre-Cox4 signal to estimate MMP. As one control in our own data, we note that the population-averaged mitochondrial fluorescent signal (Figure S3C) stays constant in the first few hours, in agreement with the total averaged mitochondrial proteome (Fig R1E). As additional controls, we plan to compare the signal to an MMP-independent dye as the reviewer suggests.

      1.4. In Fig. 2e-f, the authors use a promoter reporter with Neongreen to answer whether the reduced levels of the nuclear-encoded mitochondrial proteins Mrps5 and Qcr7 are due to decreased expression or to protein degradation, and find no evidence of degradation of the Neongreen reporter protein. However, subcellular localization might affect the availability of the protein to proteases. Although not absolutely required, it would be relevant to know if the Neongreen fusion protein is found in the same subcellular compartment as Mrps5 and Qcr7 at 0h and 9h after Fzo1 depletion.

      Here, it seems we need to explain the set-up and interpretation of the data better. The key point we are trying to make with the promoter-Neongreen construct is that the regulation is not mainly at the level of transcription. We are showing that the reduction in the levels of the actual protein (orange bars) is not (mainly) explained by a reduction in expression, since the promoter is similarly active at 0 and at 9 hours (grey bars). If expression from the promoter were strongly reduced, the Neongreen would be diluted with growth and would also decrease, but this is not the case. The fluorophore itself is just floating around in the cytosol and is not subject to the same post-translational regulation as Mrps5 and Qcr7, so there is no reason to expect degradation.

      1.5. Fzo1 depletion leads to a very rapid drop in MMP during the first hour of depletion. In the Discussion, can the authors speculate on the possible mechanism of this rapid MMP drop that occurs well before mtDNA or mt-encoded proteins are decreased in level?

      This is indeed an interesting point. We think there are likely three reasons for this initial drop: firstly, due to the fragmentation, the mixing of mitochondrial content is disturbed and smaller fragments may have a suboptimal stoichiometry of components (see also (Khan et al. 2024), who look at this in detail including the Fzo1 deletion); secondly, already fairly early, some mitochondrial fragments may not contain any mtDNA and will therefore be unable to synthesize ETC proteins; thirdly, altered morphological features like changes in the surface-to-volume ratio may play a role. Sadly, mechanistically following up on this is not possible with the tools in our hands and is therefore outside the scope of this manuscript. But we are happy to include these speculations in our discussion.

      1.6. In Fig. 2a, the mtDNA copy number of Fzo1-depleted cells is ca 1.3-fold of the control cells at the 0h timepoint. Why might this be? Is it an impact of one of the inducers? If so, we might be looking at the combination of two different processes when measuring copy number: one that is an induction caused by the inducer(s), and the other a consequence of Fzo1 depletion itself.

      We believe that this 30% increase is within the noise of the experiment rather than an effect of the induction. Since we normalize to t=0 uninduced, the first black data point does not have error bars, emphasizing this difference. None of the protein data suggests that there is an increase in mtDNA encoded proteins (see e.g. 2B, or Atp6 fluorescence data). In the planned HI-NESS experiment, we will see in our single cell data whether there is an actual increase in mtDNA upon TIR induction. Additionally, we will run a qPCR to carefully determine mtDNA levels of untreated wild-type cells, tetracycline treated wild-type cells and tetracycline induced TIR expressing cells to exclude effects of tetracycline as well as the expression of TIR on mtDNA.
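
      To make the normalization point explicit: because every value is divided by the t=0 uninduced measurement, the reference point is 1.0 by construction (and hence has no error bar), so an apparent 1.3-fold reading has to be judged against replicate scatter. A minimal sketch with invented replicate values:

```python
from statistics import mean, stdev

# Invented qPCR-style replicate measurements (arbitrary units).
t0_uninduced = [100.0, 104.0, 96.0]
t0_induced   = [128.0, 135.0, 127.0]

reference = mean(t0_uninduced)             # everything is scaled to this
fold = [v / reference for v in t0_induced]

print(f"fold change at t=0: {mean(fold):.2f} +/- {stdev(fold):.2f}")
```

      Whether a 1.3-fold difference is meaningful then depends on the spread across biological replicates, which is exactly what the planned qPCR controls will quantify.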

      Minor comments:

      1.7. p. 3, line 71: "ten thousands of dividing cells.." should be "tens of thousands of dividing cells".

      Thank you, will correct.

      1.8. p. 4, line 116: please be even more clear with what the "depleted" cells and controls are treated with: are depleted cells treated with both inducers, and controls with neither?

      We will make this more clear. Depleted cells are treated with both inducers, the control cells with neither. Additionally, in Figure 1A and in S1 we show controls demonstrating that inducing TIR per se or adding aTC per se does not change growth rate or mitochondrial morphology.

      1.9. p. 5, lines 147-148: the authors write "the rate with which the abundance of Cox2 and Var1 proteins decreases was similar to the rate of mtDNA loss" though the actual rate is not shown. Please calculate and show rates for these processes side by side to make comparison possible, or alternatively rephrase the statement.

      Indeed this was not phrased well. We will call it dynamics rather than rates.

      1.10. Fig. 2d: changing the y-axis numbering to match those in panels a and b would facilitate comparisons.

      Makes sense, we will change this.

      1.11. Fig. 2e: it is recommended to label the western blot panels to indicate what protein is being imaged in each (Neongreen, Mrps5, Qcr7).

      We will adapt the labelling to make it more clear.

      1.12. p. 9, line 262: I suggest referencing Fig. 4e at the end of the first sentence for clarity.

      We will modify the sentence as suggested.

      1.13. In the sections related to Fig. 3a and Fig. 5a as well as the connected supplemental data, the authors discuss both the median and the mean of mitochondrial mass and Atp6 protein, respectively. For purposes of clarity, I suggest decreasing the focus on the mean (that is provided only in the supplemental data) and focusing the text mainly on the median. The two show differing trends and it is very good that both are shown, but the clarity of the text can be improved by focusing more on the median where possible.

      We will check the phrasing and simplify.

      1.14. p. 14, line 435: the statement that mt mass is maintained over the first 9h of depletion is only true for the mean mt mass, not for the median. Please make this clear or rephrase.

      We will check the phrasing, make it more clear, and also point out the extended proteomics data (see Fig R1), which corresponds to the mean of the populations.

      1.15. p. 14, line 452: "mitofusions" should be "mitofusins".

      Thanks for catching this.

      Reviewer 2:

      2.1. While inducible TIR is used to reduce background, the manuscript should rigorously exclude auxin/TIR off-targets (growth, mitochondrial phenotypes, gene expression). Please include full matched controls: ±auxin, ±TIR, epitope tag alone, and a degron control on an unrelated mitochondrial membrane protein.

      We agree that rigorous controls are crucial for the interpretation of the results. However, we think we have already included most of the controls the reviewer is asking for, but we might not have pointed this out clearly enough. For example, in Fig 1A, we could make it clearer by adding more labels indicating in which samples we added aTC, which is currently only described in the figure legend.

      Here is a list of all the controls:

      • Each depletion experiment is always matched with an experiment of the same strain without induction. So the genetic background as well as effects such as light exposure, time spent in the microfluidics systems, etc are controlled for.
      • Figure S1D shows that the growth rate is wild-type-like in strains containing either the AID tag or the TIR protein, even upon addition of both chemicals. It also shows that the final genetic background (AID tag and TIR) grows like wild type if the inducers are not added. This conclusively shows that neither the tags/constructs nor the chemicals per se affect the growth rate.
      • In Figure S1C we show the mitochondrial morphology of the same controls. We will make sure to label them more consistently to match panel D, and include an actual wild type and a FLAG-AID-Fzo1 strain without TIR treated with both aTC and 5-Ph-IAA as a direct comparison.
      • In Figure 1A we compare the Fzo1 protein levels of strains with and without TIR. We show that in the absence of TIR, adding either aTC or auxin does not change Fzo1 levels, and that the levels are comparable in the strain that is able to deplete Fzo1, directly before addition of 5-Ph-IAA (i.e. after 2 h of TIR induction through addition of tetracycline).
      • Additionally, in Figure S2C we show that two hours after adding aTC, the entire proteome does not change significantly apart from a strong induction of TIR. We can also make this clearer in the figure legend.
      • Additionally, we will run a qPCR to carefully determine mtDNA levels in untreated wild-type cells, tetracycline-treated wild-type cells, and tetracycline-induced TIR-expressing cells, to exclude effects of tetracycline as well as of TIR expression on mtDNA (also in response to 1.6.).

      In summary, we think we have controlled sufficiently for all confounding parameters and, most importantly, have shown that neither the addition of aTC or auxin nor the FLAG-AID tag per se disturbs mitochondria or cell growth. We do not see what a degron control on an unrelated protein would tell us: depending on the nature of the protein, it may or may not have a phenotype that may or may not be related to morphology changes.

      2.2. The Mitoloc preSu9 vs Cox4 import ratio is only a proxy of mitochondrial membrane potential (ฮ”ฮจm) and itself depends on mitochondrial mass, protein expression, matrix ATP, and import saturation. The authors need to calibrate ฮ”ฮจm with orthogonal dyes (TMRE/TMRM) and pharmacologic titrations (FCCP/antimycin/oligomycin) to generate a response curve; show that Mitoloc tracks dye-based ฮ”ฮจm across the relevant range and corrects for mass/photobleaching. Report single-cell ฮ”ฮจm vs mass residuals.

      We completely agree that the MitoLoc system is only a rough proxy for the actual membrane potential. That is why we make no quantitative claims about the absolute value or the absolute difference between groups of cells. We also make very clear in Fig 3B what we are actually measuring, and we can emphasize again in the text that this is only a proxy. We agree that comparing MitoLoc values to TMRE staining, as the reviewer suggests, is a good idea; we will do these experiments in depleted and control cells at different timepoints. Please note, though, that dye staining also has its caveats, especially in dynamic live-cell experiments. TMRM, for example, is not compatible with the acidic pH 5 medium that is typically used for yeast, and subjecting cells to washing steps and higher pH may change both the morphology of mitochondria and the MMP, especially in cells that are already "stressed". We prefer not to perform elaborate pharmacological titration experiments because, firstly, this was extensively done in the original MitoLoc paper by the Ralser lab ((Vowinckel et al. 2015), cited 120 times); secondly, the value of the MMP is not the most critical claim of the manuscript. See also 3.12. Please note that in Figure S4D we had already plotted MMP vs mitochondrial concentration.

      2.3. To use Atp6-mNeon as a proxy for mtDNA is an assumption. Interpreting Atp6 intensity as "functional mtDNA" could be confounded by translation, turnover, or assembly. Please (i) report mtDNA copy number time courses (you have qPCR), nucleoid counts (DAPI/PicoGreen or TFAM/Abf2 tagging), and (ii) assess translation (e.g., 35S-labeling or puromycin proxies) and turnover (proteasome/AAA protease inhibition, mitophagy mutants -some data are alluded to- plus mRNA levels for mtDNA-encoded genes). This will support the "reduced synthesis" versus "increased degradation" conclusion.

      We agree with all three reviewers that Atp6 is only a proxy for mtDNA (Jakubke et al. 2021; Roussou et al. 2024) and that the correlation should be checked more carefully. We will use the very recently established HI-NESS system to follow nucleoids/mtDNA during depletion experiments. See the detailed reply to 1.2.

      (ii) In Figure 2C we inhibit mitochondrial translation and show that in this case control and depleted cells have the same level of Cox2, at least suggesting that degradation is not the key mechanism controlling the levels of mtDNA-encoded proteins. We cannot do proteasome inhibitor assays, since the nature of the AID-TIR system requires an active proteasome. In Figure S5C we show that the Atp6 depletion is similar in an atg32 deletion strain. This does not completely exclude a contribution of mitophagy to the observed phenotype, but does confirm that mitophagy is not the primary reason for cells becoming petite.

      2.4. The promoter-NeonGreen reporters argue against transcriptional down-regulation of nuclear OXPHOS. Please add mRNA (RT-qPCR/RNA-seq) for representative genes and a pulse-chase or degradation-pathway dependency (e.g., proteasome/mitophagy/autophagy mutants) to firmly assign active degradation. The authors need to normalize proteomics to mitochondrial mass (e.g., citrate synthase/porin) to separate organelle abundance from protein turnover.

      While we are happy to perform qPCR experiments for selected genes, a full RNA-seq experiment seems outside the scope of this study. As explained above, a proteasome inhibitor experiment is not possible in this set-up. Bulk mitophagy/autophagy seems unlikely to be the cause of the decrease of the nuclear-encoded OXPHOS proteins, since most other mitochondrial proteins do not decrease on average at the population level in the first hours. These data are now plotted as an additional figure (see below) and will be included in the supplementary material of the revised manuscript (Fig R1E).

      2.5. Using preSu9-mCardinal intensity as "mitochondrial concentration" is sensitive to expression, import competence, and morphology/segmentation. The authors should provide validation that this metric tracks 3D volume across fragmentation states (e.g., correlation with mito-GFP volumetrics; detergent-free CS activity; TOMM20/Por1 immunoblot per cell).

      We agree that this is an important point, and the co-authors discussed it quite intensively. In Figures S3A and B we show (using confocal data) that there is a very strong correlation between the total fluorescence signal and the 3D volume reconstruction. However, the slope of the correlation differs between tubular and fragmented mitochondria (compare panels A and B; see the figure legend). Since we are dealing with diffraction-limited objects, it is likely that the 3D reconstruction is sensitive to morphology, especially if mitochondria are "clumping". We therefore think that the total fluorescence signal is actually a better estimate of mitochondrial mass per cell than the 3D volume reconstruction (especially for our data obtained with a conventional epifluorescence microscope). The mean of the total mitochondrial fluorescence also better matches the population-average mitochondrial proteome (Fig R1E). To consolidate this assumption, we will additionally compare our data to a strain with Tom70-Neongreen and to MMP-independent dyes.

      Notably, since the morphology is similarly altered in mothers and buds, this is of minor impact for our main point, the unequal distribution between mothers and buds.

      2.6. The unequal mother-daughter distribution is compelling, but causality remains inferred. Test whether modulating inheritance machinery (actin cables/Myo2, Num1, Mmr1) or altering fission (Dnm1 inhibition) modifies segregation defects and rescues mtDNA/Atp6 decline. Complementation with Fzo1 re-expression at defined times would help order the phenotype cascade.

      We agree that rescue experiments would be very useful. We have some preliminary data for tethering experiments, for example with Num1. The general problem is that the fragmented mitochondria clump together, and we have not found a method to restore an equal distribution between mother and daughter cells. We will try to optimize the assay, but are not overly confident that it will work. Mmr1 deletion aggravates the Fzo1 phenotype, likely also because the distribution becomes even more heterogeneous, but we have not rigorously analyzed this.

      We like the idea of the Fzo1 re-expression and will run such experiments. This will be especially powerful in combination with the new HI-NESS mtDNA reporter. We may be able to track exactly when cells reach the point of no return and become petite. This will also help connect our mathematical model more directly to the data.

      2.7. The model is useful but should include parameter sensitivity (segregation variance, synthesis slopes, initial nucleoid number) and prospective validation (e.g., predict rescue upon partial restoration of synthesis or inheritance, then test experimentally).

      We will refine our model to include the to-be-measured nucleoids/mtDNA values. We will include a parameter sensitivity analysis with the updated model.

      Reviewer 3:

      3.1. About the use of Atp6 as a good proxy for mtDNA content. This is assumed from l285 onwards, based on a previous publication. As the link is fairly central to part of the paper's arguments, and the system in this study is being perturbed in several different ways, a stronger argument or demonstration that this link remains intact (and unchanged, as it is used in comparisons) would seem important.

      We agree, see 1.2.

      3.2. About confounding variables and processes. The study does an admirable job of being transparent and attempting to control for the many different influences involved in the physical-genetic link. But some remain less clearly unpacked, including some I think could be quite important. For example, there is a lot of focus on mito concentration -- but given the phenotypes are changing the sizes of cells, do concentration changes come from volume changes, mito changes, or both? In "ruling out" mitophagy -- a potentially important (and intuitive) influence, the argument is not presented as directly as it could be and it's not completely clear that it can in fact be ruled out in this way. There are a couple of other instances which I've put in the smaller points below.

      Thank you for acknowledging our efforts to show transparent and well-controlled experiments! We address each of the specific points below.

      3.3. full genus name when it first appears

      We will add the full name.

      3.4. I may be wrong here, but I thought the petite phenotype more classically arises from mtDNA deletion mutations, not loss? The way this is phrased implies that mtDNA loss is [always] the cause. Whether I'm wrong on that point or not, the petite phenotype should be described and referenced.

      We can expand the text and cite additional relevant papers. The term "petite" refers to any strain that is respiratory incompetent and forms small colonies (not necessarily small cells!) (Seel et al. 2023). This can be caused by mutations or gene loss (fragments) of the mtDNA (these are called cytoplasmic petites), by chemically induced loss of mtDNA (e.g. EtBr), or by mutations of nuclear genes required for respiration (these are termed nuclear petites; some nuclear petites show loss of mtDNA in addition to the mutation in the nuclear genome) (Contamine and Picard 2000).

      3.5. para starting l59 -- should mention for context that mitochondria in (healthy, wildtype) yeast are generally much more fused than in other organisms

      ok.

      3.6. Fig 1C -- very odd choice of y-axis range! either start at zero or ensure that the data fill as much vertical space of the plot as possible

      True, this was probably a formatting relic. We will adapt the axis to fill the full space. Most of our axes start at 0, but that doesn't make much sense here, since we consider the solidity in the control as the "baseline".

      3.7. "wild-type like more tubular mitochondria" reads rather awkwardly. "more tubular mitochondria (as in the wild-type)"?

      Thank you, sounds better.

      3.8. l106 -- imaging artefacts? are mitos fragmenting because of photo stress? -- this is mentioned in l577-8 in the Methods, but the data from the growth rate and MMP comparison isn't given -- an SI figure would be helpful here. It would be reassuring to know that mito morphology wasn't changing in response to phototoxicity too.

      In the Methods we briefly point out that we have done all our "due diligence" controls to check that we do not generate phototoxicity, something that we highlight in the cited review. We do not explicitly have a figure for this, but Figure S1A shows that the solidity of the mitochondrial network in control cells stays the same over 9 hours, even though these cells are exposed to the same cultivation and imaging regime as the depleted cells. We will also add a picture of control cells after 9 h. In S1B we show that control cells containing TIR but no AID tag, treated with both chemicals and imaged over 9 hours, also show the same solidity (~mitochondrial morphology) as untreated controls. Also, the doubling times of cells grown in our imaging system (Fig R1B) are very similar to those in shake flasks (Fig R1A). All in all, we are very confident that our imaging settings did not impact the reported phenotypes.

      3.9. para l146 -- so this suggests mtDNA-encoded proteins have a very rapid turnover, O(hours) -- is this known/reasonable?

      Reference (Christiano et al. 2014) suggests that respiratory chain proteins are shorter lived than the average yeast protein. However, based on Figure 2C we think the dynamics mostly speak for a dilution by growth.

      3.10. section l189 -- it's hard to reason fully about these statistics of mitochondrial concentration given that the petite phenotype is fundamentally affecting overall cell volume. can we have details on the cell size distribution in parallel with these results? to put it another way -- how does mitochondrial *amount* per cell change?

      This is a good point. We report mostly on mitochondrial "concentrations" because we think this is what the cell actually cares about (mitochondrial activity in relation to cytosolic activity). But we will include additional graphs on mitochondrial amount as well as size distributions (Fig R1C, related to Fig 4F). We can already point out that the size distribution of the population does not change much in the first hours. The "petite" phenotype refers to small colonies on growth medium with a limited supply of a fermentable carbon source, not to a smaller size of single cells.

      3.11. l199 the mean in Fig S3C certainly does change -- it increases, clearly relative both to control and to its initial value. rather than sweeping this under the carpet we should look in more detail to understand it (a consequence of the increased skew of the distribution)?

      This relates somewhat to the previous point. The increase in the average concentration is not due to an increased amount in the population, but to the fact that small buds receive a very large share of the mitochondria, which "exaggerates" the asymmetric/heterogeneous distribution. This will be clarified by the figures mentioned in the point above.
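To illustrate this statistical point with a toy calculation (all volumes and amounts below are made up for illustration, not measured values): an asymmetric split that sends a large share of the mitochondria into a small bud raises the unweighted per-cell mean concentration even though the total amount in the population, and hence the volume-weighted mean, is unchanged.

```python
import numpy as np

# Hypothetical dividing cell: total volume 1.25, total mitochondrial
# amount 1.0, i.e. a uniform concentration of 0.8 before cytokinesis.
total_vol, total_amt = 1.25, 1.0
print(total_amt / total_vol)        # baseline concentration: 0.8

# Asymmetric division: the small bud (volume 0.25) receives half of the
# mitochondrial amount.
vols = np.array([1.00, 0.25])       # mother, bud
amts = np.array([0.50, 0.50])
conc = amts / vols                  # mother: 0.5, bud: 2.0

# The unweighted per-cell mean concentration jumps above baseline...
print(conc.mean())                  # 1.25
# ...while the volume-weighted mean (total amount / total volume),
# i.e. the content of the population, is unchanged.
print(amts.sum() / vols.sum())      # 0.8
```

Averaging concentrations with equal weight per cell therefore inflates the mean as soon as small, high-concentration buds become frequent, without any change in total mitochondrial amount.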

      3.12. para line 206 -- this doesn't make it clear whether your MMP signal is integrated over all mitochondria in the cell, or normalised by mitochondrial content? this matters quite a lot for the interpretation if the distributions of mitochondrial content are changing. reading on, this is even more important for para line 222. Reading further on, there is an equation on l612 that gives a definition, but it doesn't really clarify (apologies if I'm misunderstanding).

      For each cell, we calculate the relative mitochondrial enrichment of the MMP-sensitive versus the MMP-insensitive pre-sequence.

      So, MMP = (total intensity of mitochondrial pre-Cox4-Neongreen / total intensity of mitochondrial pre-Su9-Cardinal) / (total intensity of cytosolic pre-Cox4-Neongreen / total intensity of cytosolic pre-Su9-Cardinal).

      We calculate this value for each cell, but we do not have the optical resolution to calculate it for individual mitochondrial fragments.

      Both constructs are driven by the same strong promoter, so transcription of the fluorophore should never limit the uptake. Also, in Figure 3D we compare control and depleted cells with similar total mitochondrial concentration, so the difference must be due to a different import of the two fluorophores, see also Fig S4D. The calculated โ€œMMPโ€ value is of course only a crude proxy for the actual membrane potential in millivolts and we do not want to make any claims on absolute values or quantitative differences. But essentially what we are interested in is โ€œmitochondrial health/activityโ€ and we think the system is good at reporting this. See also 2.2.
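A minimal sketch of this per-cell calculation (the function and variable names are ours, not from the manuscript's analysis code, and the intensity values are purely hypothetical):

```python
def mmp_proxy(mito_cox4, mito_su9, cyto_cox4, cyto_su9):
    """Per-cell MMP proxy from the two MitoLoc-style reporters.

    Ratio of the mitochondrial enrichment of the MMP-sensitive pre-Cox4
    construct over that of the MMP-insensitive pre-Su9 construct.
    Inputs are total background-corrected intensities for one cell.
    """
    return (mito_cox4 / mito_su9) / (cyto_cox4 / cyto_su9)

# Hypothetical cell: pre-Cox4 is enriched 8-fold in mitochondria over
# the cytosol, pre-Su9 only 4-fold, giving a proxy value of 2.0.
print(mmp_proxy(mito_cox4=800.0, mito_su9=400.0,
                cyto_cox4=100.0, cyto_su9=100.0))   # 2.0
```

Because both terms are intensity ratios within the same cell, a uniform scaling of either channel's intensities (e.g. from expression level or exposure) cancels out of the proxy.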

      3.13. l230 -- a point of personal interest -- low mito concentrations are connected to low "function" (MMP) and give extended division times -- this is interestingly exactly the model needed to reproduce observations in HeLa cells (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002416). That model went on to predict several aspects of downstream cellular behaviour -- it would be very interesting to see how compatible that picture (parameterised using HeLa observations) is with yeast!

      Thank you for pointing out your interesting paper, which we will include in our discussion. Another recent preprint about fission yeast (Chacko et al. 2025) also fits into this picture. Since you were kind enough to disclose your identity, we would be happy to discuss this further with you in person if we can perhaps follow up on this.

      3.14. l239 "less mitochondria" -- a bit tricky but I'd say "fewer mitochondria" or "less mitochondrial content"

      Thanks, we will think about how best to rephrase this, probably "less mitochondrial content".

      3.15. Section l234 So here (and in Fig 4) the focus is on overall distributions of mitochondrial concentration in different cells (mother-to-be, mother, bud; gen 1, gen >1). But we've just seen that one effect of fzo1 is to broaden the distribution of mitochondrial concentration across cells. Can't we look in more depth at the implications of this heterogeneity? For example in Fig 4F (which is cool) we look at the distribution of all fzo1 mothers-to-be, mothers, and buds. But this loses information about the provenance. For example, do mothers-to-be with extremely low mito concentrations just push everything to the bud, while mothers-to-be with high mito concentrations distribute things more evenly? It would seem very easy and very interesting to somehow subset the distribution of mothers-to-be by concentration and see how different subsets behave

      This is a good point. When analyzing the data, we plotted pretty much everything against everything and then chose the graphs that we think best guide the reader through the storyline. We can make additional supplementary plots showing the starting concentrations/amounts of the mother in relation to the resulting split ratio at the end of the cycle (Fig R1D).

      3.16. l285 -- experimental design -- do we know that Atp6 will continue to be a good proxy for functional mtDNA in the face of the perturbations provided by Fzo1 depletion? Especially if there is impact on the expression of mitoribosomes, the relationship between mtDNA and Atp6 may look rather different in the mutant?

      This is actually our top-priority experiment now. We will use the HI-NESS system and possibly DAPI staining to make a more direct link to mtDNA/nucleoid numbers (see 1.2.).

      3.17. l290 -- ruled out mitophagy. This message could be much clearer. Comparing Fig S5C and Fig 3A side-by-side is a needlessly difficult task -- put Fig 3A into Fig S5. Then we see that when mitophagy is compromised, the distribution of mitochondrial concentration has a lower median and much lower upper quartile than in the mitophagy-equipped Fzo1 mutant? What is going on here? For a paper motivated by disentangling coupled mechanisms, this should be made clearer!

      Thanks for pointing this out. We can of course easily include the control in the corresponding figure. Compromising mitophagy is likely to generally affect mitochondrial health and turnover somewhat, independent of what is going on with Fzo1. A second piece of evidence that speaks against large-scale mitophagy is the proteomics data: at the population level, the dynamics of the respiratory chain proteins are very different from those of other (nuclear-encoded) mitochondrial proteins. We will add additional supplementary figures to make this clearer (see Fig R1E). Most mitochondrial proteins in the proteomics experiment stay constant in the first few hours, consistent with the imaging data showing that the mean mitochondrial content of the population does not change initially. This again highlights that the problem is the unequal distribution, not massive degradation of mitochondria.

      3.18. With the Atp6 signal, how do we know that fluorescence from different cells is comparable? Buds will be smaller than mother cells for example, potentially leading to less occlusion of the fluorescent signal by other content in the cytoplasm

      This is of course a general problem that anyone doing quantitative fluorescence microscopy faces. From the technical side, we have done the best we could by taking a reasonable number of z-slices and by choosing fluorophores in a range with little cellular background fluorescence (e.g. Neongreen is much better than GFP). From a practical standpoint, we are always comparing to the control, which is subject to the same technical limitations as the depleted cells, and the cell sizes are very similar. So, even if we were systematically overestimating the Atp6 concentration in the bud by a few percent, the difference to the control would still be qualitatively true. We therefore do not think that any of our conclusions are affected by this.

      3.19. l343 -- maintenance of mtDNA -- here the point about l285 (is the Atp6-mtDNA relationship the same in the Fzo1 mutant) is particularly important, as we're directly tying findings about the protein product to implications about the mtDNA

      We will carefully address this, see above.

      3.20. l367 -- on a first read this description of the model feels like lots of choices have been made without being fully justified. Why a log-normal distribution (when the fit to the data looks rather flawed); why the choice of 5 groups for nucleoid number (why not 3? or 8?); the process used for parameter fitting is very unclear (after reading the methods I think some of these values are read directly from the data, but the shapes of the distributions remain unexplained). l705 -- presumably the ratio was drawn from a log-normal distribution and then the corresponding nucleoid numbers were rounded to integers? the ratio itself wasn't rounded? (also l367) How were the log-normal distributions fitted to experiments (Figs. S7A,B)? Just by eye?

      We will update our model based on measured nucleoid counts and then explain more stringently the choices we make and the parameters we select. We will include a parameter sensitivity analysis with the updated model.

      3.21. l711 by random selection -- just at random? ("selection" could be confusing) Overall, it feels like the model may be too complicated for what it needs to show. Either (a) the model should show qualitatively that unequal inheritance and reduced production leads to rapid loss -- which a much simpler model, probably just involving a couple of lines of algebra, could show. Or (b) the model should quantitatively reproduce the particular numerical observations from the experiments -- it's not totally clear that it does this (do the cell-cycle-based decay timescales in Fig 7 correspond to the hour-based decay timescales in other plots, for example). At the moment the model is at a (b) level of detail but it's only clear that it's reporting the (a) level of results.

      If the HI-NESS and Fzo1 re-addition experiments work as explained above, all parameters will have direct experimental support, and we should get much closer to (b).
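For reference, the segregation step as we described it (a bud:mother ratio drawn from a log-normal distribution, with only the resulting nucleoid count rounded to an integer, not the ratio itself) can be sketched as follows; every numerical value here is a placeholder, not a fitted parameter from the manuscript:

```python
import numpy as np

rng = np.random.default_rng(1)

def divide(nucleoids, median_ratio=0.3, sigma=0.5):
    """Partition a mother's nucleoids to the bud at division.

    The bud's share is drawn from a log-normal distribution of the
    bud:mother ratio (placeholder median and spread); the ratio itself
    is not rounded, only the resulting nucleoid count is rounded.
    """
    ratio = rng.lognormal(mean=np.log(median_ratio), sigma=sigma)
    bud = int(round(nucleoids * min(ratio, 1.0)))
    return nucleoids - bud, bud

# Follow one mother lineage for a few cycles with reduced synthesis
# (placeholder: only 20% of the nucleoids are replenished per cycle).
n = 20
for cycle in range(5):
    n += int(round(0.2 * n))        # reduced per-cycle synthesis
    n, bud = divide(n)
    print(cycle, n, bud)
```

Repeating this over many lineages yields per-generation nucleoid distributions that can be compared directly to the measured data once the HI-NESS counts are available.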

      3.22. A lot of the discussion repeats the results; depending on editorial preferences some of this text could probably be pared back to focus on the literature connections and context.

      We will think about streamlining the discussion once some of the additional material alluded to above has been added.

      3.23. Data availability -- it looks like much of the data required to reproduce the results is not going to be made available. Images and proteomic data are promised, but the data associated with mitochondrial concentration and other features are not mentioned. For FAIR purposes all the data (including statistics from analysis of the images) should be published.

      We may not have phrased this clearly: all data will be made available. Where technically feasible, they will be directly accessible in a repository; otherwise, by request to the corresponding author.

      On our OMERO server, we have deposited many TB of raw images as well as all the intermediate steps such as segmentation masks, and the CSV files with all the extracted data for each cell (including background corrections etc.). Additionally, we can include CSV files with the data grouped in the way used to generate all the box plots etc. As of now, the OMERO data is unfortunately only available by requesting a personal guest login from our bioinformatics facility, but we were promised that with the next technical update a public link will become available. The proteomics data and the model are already fully accessible. The raw western blot images with corresponding Ponceau staining will be included with the final publication, either as additional supplementary material or in whatever format matches the journal requirements.

      3.24. l660 -- can an overview of the EM protocol be given, to avoid having to buy the Mayer 2024 article?

      The cited paper is open access. But we can also include more details in our method section.

      References:

      Chacko, L. A., H. Nakaoka, R. Morris, W. Marshall, and V. Ananthanarayanan. 2025. 'Mitochondrial function regulates cell growth kinetics to actively maintain mitochondrial homeostasis', bioRxiv.

      Christiano, R., N. Nagaraj, F. Frohlich, and T. C. Walther. 2014. 'Global proteome turnover analyses of the Yeasts S. cerevisiae and S. pombe', Cell Rep, 9: 1959-65.

      Contamine, V., and M. Picard. 2000. 'Maintenance and integrity of the mitochondrial genome: a plethora of nuclear genes in the budding yeast', Microbiol Mol Biol Rev, 64: 281-315.

      Deng, Jingti, Lucy Swift, Mashiat Zaman, Fatemeh Shahhosseini, Abhishek Sharma, Daniela Bureik, Francesco Padovani, Alissa Benedikt, Amit Jaiswal, Craig Brideau, Savraj Grewal, Kurt M. Schmoller, Pina Colarusso, and Timothy E. Shutt. 2025. 'A novel genetic fluorescent reporter to visualize mitochondrial nucleoids', bioRxiv: 2023.10.23.563667.

      Di Bartolomeo, F., C. Malina, K. Campbell, M. Mormino, J. Fuchs, E. Vorontsov, C. M. Gustafsson, and J. Nielsen. 2020. 'Absolute yeast mitochondrial proteome quantification reveals trade-off between biosynthesis and energy generation during diauxic shift', Proc Natl Acad Sci U S A, 117: 7524-35.

      Ebert, A. C., N. L. Hepowit, T. A. Martinez, H. Vollmer, H. L. Singkhek, K. D. Frazier, S. A. Kantejeva, M. R. Patel, and J. A. MacGurn. 2025. 'Sphingolipid metabolism drives mitochondria remodeling during aging and oxidative stress', bioRxiv.

      Jakubke, C., R. Roussou, A. Maiser, C. Schug, F. Thoma, R. Bunk, D. Horl, H. Leonhardt, P. Walter, T. Klecker, and C. Osman. 2021. 'Cristae-dependent quality control of the mitochondrial genome', Sci Adv, 7: eabi8886.

      Khan, Abdul Haseeb, Xuefang Gu, Rutvik J. Patel, Prabha Chuphal, Matheus P. Viana, Aidan I. Brown, Brian M. Zid, and Tatsuhisa Tsuboi. 2024. 'Mitochondrial protein heterogeneity stems from the stochastic nature of co-translational protein targeting in cell senescence', Nature Communications, 15: 8274.

      Martin, J., K. Mahlke, and N. Pfanner. 1991. 'Role of an energized inner membrane in mitochondrial protein import. Delta psi drives the movement of presequences', J Biol Chem, 266: 18051-7.

      Osman, C., T. R. Noriega, V. Okreglak, J. C. Fung, and P. Walter. 2015. 'Integrity of the yeast mitochondrial genome, but not its distribution and inheritance, relies on mitochondrial fission and fusion', Proc Natl Acad Sci U S A, 112: E947-56.

      Periฤ‡, Matea, Peter Bou Dib, Sven Dennerlein, Marina Musa, Marina Rudan, Anita Lovriฤ‡, Andrea Nikoliฤ‡, Ana ล ariฤ‡, Sandra Soboฤanec, ลฝeljka Maฤak, Nuno Raimundo, and Anita Kriลกko. 2016. 'Crosstalk between cellular compartments protects against proteotoxicity and extends lifespan', Scientific Reports, 6: 28751.

      Roussou, Rodaria, Dirk Metzler, Francesco Padovani, Felix Thoma, Rebecca Schwarz, Boris Shraiman, Kurt M. Schmoller, and Christof Osman. 2024. 'Real-time assessment of mitochondrial DNA heteroplasmy dynamics at the single-cell level', The EMBO Journal, 43: 5340-59.

      Seel, A., F. Padovani, M. Mayer, A. Finster, D. Bureik, F. Thoma, C. Osman, T. Klecker, and K. M. Schmoller. 2023. 'Regulation with cell size ensures mitochondrial DNA homeostasis during cell growth', Nat Struct Mol Biol, 30: 1549-60.

      Vowinckel, J., J. Hartl, R. Butler, and M. Ralser. 2015. 'MitoLoc: A method for the simultaneous quantification of mitochondrial network morphology and membrane potential in single cells', Mitochondrion, 24: 77-86.

      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      This article addresses the connection between perturbed mitochondrial structure and genetics in yeast. When mitochondrial fusion is compromised, what is the chain of causality -- the mechanism -- that leads to mtDNA populations becoming depleted? This is a fascinating question, linking physical cell biology to population genetics. I admire the philosophy of the research, acknowledging and attempting to control for the many possible confounding influences. The manuscript describes the context and the research tightly and digestibly; the figures illustrate the results in a clear and natural way.

      For transparency, I am Iain Johnston and I am happy for this review to be treated as public domain. To my eyes my most important shortcoming as a reviewer is my relative lack of familiarity with the yeast fzo1 mutant; while I am familiar with analysis of yeast mito morphology and mtDNA segregation, a reviewer familiar with the nuances of this strain and its culture would be a useful complement.

      I have a few more general points and a collection of smaller points below that I believe might help make the story more robust.

      General points

      1. About the use of Atp6 as a good proxy for mtDNA content. This is assumed from l285 onwards, based on a previous publication. As the link is fairly central to part of the paper's arguments, and the system in this study is being perturbed in several different ways, a stronger argument or demonstration that this link remains intact (and unchanged, as it is used in comparisons) would seem important.
      2. About confounding variables and processes. The study does an admirable job of being transparent and attempting to control for the many different influences involved in the physical-genetic link. But some remain less clearly unpacked, including some I think could be quite important. For example, there is a lot of focus on mito concentration -- but given the phenotypes are changing the sizes of cells, do concentration changes come from volume changes, mito changes, or both? In "ruling out" mitophagy -- a potentially important (and intuitive) influence, the argument is not presented as directly as it could be and it's not completely clear that it can in fact be ruled out in this way. There are a couple of other instances which I've put in the smaller points below.

      Smaller points

      l47 full genus name when it first appears

      l58 I may be wrong here, but I thought the petite phenotype more classically arises from mtDNA deletion mutations, not loss? The way this is phrased implies that mtDNA loss is [always] the cause. Whether I'm wrong on that point or not, the petite phenotype should be described and referenced.

      para starting l59 -- should mention for context that mitochondria in (healthy, wildtype) yeast are generally much more fused than in other organisms

      Fig 1C -- very odd choice of y-axis range! either start at zero or ensure that the data fill as much vertical space of the plot as possible

      l105 "wild-type like more tubular mitochondria" reads rather awkwardly. "more tubular mitochondria (as in the wild-type)"?

      l106 -- imaging artefacts? are mitos fragmenting because of photo stress? -- this is mentioned in l577-8 in the Methods, but the data from the growth rate and MMP comparison isn't given -- an SI figure would be helpful here. It would be reassuring to know that mito morphology wasn't changing in response to phototoxicity too.

      para l146 -- so this suggests mtDNA-encoded proteins have a very rapid turnover, O(hours) -- is this known/reasonable?

      section l189 -- it's hard to reason fully about these statistics of mitochondrial concentration given that the petite phenotype is fundamentally affecting overall cell volume. can we have details on the cell size distribution in parallel with these results? to put it another way -- how does mitochondrial amount per cell change?

      l199 the mean in Fig S3C certainly does change -- it increases, clearly relative both to control and to its initial value. rather than sweeping this under the carpet we should look in more detail to understand it (a consequence of the increased skew of the distribution)?

      para line 206 -- this doesn't make it clear whether your MMP signal is integrated over all mitochondria in the cell, or normalised by mitochondrial content? this matters quite a lot for the interpretation if the distributions of mitochondrial content are changing. reading on, this is even more important for para line 222. Reading further on, there is an equation on l612 that gives a definition, but it doesn't really clarify (apologies if I'm misunderstanding).

      l230 -- a point of personal interest -- low mito concentrations are connected to low "function" (MMP) and give extended division times -- this is interestingly exactly the model needed to reproduce observations in HeLa cells (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002416). That model went on to predict several aspects of downstream cellular behaviour -- it would be very interesting to see how compatible that picture (parameterised using HeLa observations) is with yeast!

      l239 "less mitochondria" -- a bit tricky but I'd say "fewer mitochondria" or "less mitochondrial content"

      Section l234 So here (and in Fig 4) the focus is on overall distributions of mitochondrial concentration in different cells (mother-to-be, mother, bud; gen 1, gen >1). But we've just seen that one effect of fzo1 is to broaden the distribution of mitochondrial concentration across cells. Can't we look in more depth at the implications of this heterogeneity? For example in Fig 4F (which is cool) we look at the distribution of all fzo1 mothers-to-be, mothers, and buds. But this loses information about the provenance. For example, do mothers-to-be with extremely low mito concentrations just push everything to the bud, while mothers-to-be with high mito concentrations distribute things more evenly? It would seem very easy and very interesting to somehow subset the distribution of mothers-to-be by concentration and see how different subsets behave.

      l285 -- experimental design -- do we know that Atp6 will continue to be a good proxy for functional mtDNA in the face of the perturbations provided by Fzo1 depletion? Especially if there is impact on the expression of mitoribosomes, the relationship between mtDNA and Atp6 may look rather different in the mutant?

      l290 -- ruled out mitophagy. This message could be much clearer. Comparing Fig S5C and Fig 3A side-by-side is a needlessly difficult task -- put Fig 3A into Fig S5. Then we see that when mitophagy is compromised, the distribution of mitochondrial concentration has a lower median and much lower upper quartile than in the mitophagy-equipped Fzo1 mutant? What is going on here? For a paper motivated by disentangling coupled mechanisms, this should be made clearer!

      With the Atp6 signal, how do we know that fluorescence from different cells is comparable? Buds will be smaller than mother cells for example, potentially leading to less occlusion of the fluorescent signal by other content in the cytoplasm

      l336 -- similar to the Jajoo et al. mechanism in fission yeast -- but are you talking about feedback control of the mtDNA or the protein (or mRNA) product?

      l343 -- maintenance of mtDNA -- here the point about l285 (is the Atp6-mtDNA relationship the same in the Fzo1 mutant) is particularly important, as we're directly tying findings about the protein product to implications about the mtDNA

      l367 -- on a first read this description of the model feels like lots of choices have been made without being fully justified. Why a log-normal distribution (when the fit to the data looks rather flawed); why the choice of 5 groups for nucleoid number (why not 3? or 8?); the process used for parameter fitting is very unclear (after reading the methods I think some of these values are read directly from the data, but the shapes of the distributions remain unexplained).

      l705 -- presumably the ratio was drawn from a log-normal distribution and then the corresponding nucleoid numbers were rounded to integers? the ratio itself wasn't rounded? (also l367) How were the log-normal distributions fitted to experiments (Figs. S7A,B)? Just by eye?

      l711 by random selection -- just at random? ("selection" could be confusing)

      Overall, it feels like the model may be too complicated for what it needs to show. Either (a) the model should show qualitatively that unequal inheritance and reduced production lead to rapid loss -- which a much simpler model, probably just involving a couple of lines of algebra, could show. Or (b) the model should quantitatively reproduce the particular numerical observations from the experiments -- it's not totally clear that it does this (do the cell-cycle-based decay timescales in Fig 7 correspond to the hour-based decay timescales in other plots, for example). At the moment the model is at a (b) level of detail but it's only clear that it's reporting the (a) level of results.
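      To illustrate the simpler "level (a)" argument, a minimal sketch follows; the parameter values (`n0`, `inherit`, `growth`) are illustrative assumptions of mine, not values fitted to the manuscript's data.

```python
def copy_number_trajectory(n0=20.0, inherit=0.5, growth=1.6, generations=8):
    """Toy model of mtDNA copy number across cell cycles.

    Each cycle a daughter inherits a fraction `inherit` of the mother's
    copies, and replication then multiplies copy number by `growth`.
    Steady state requires inherit * growth == 1; reduced production
    (growth < 1 / inherit) gives geometric loss at rate inherit * growth.
    All parameter values here are illustrative, not fitted.
    """
    traj = [n0]
    for _ in range(generations):
        traj.append(traj[-1] * inherit * growth)
    return traj

# With inherit * growth = 0.8, copies fall geometrically: 20, 16, 12.8, ...
traj = copy_number_trajectory()
```

      Under these assumptions copy number decays as n0 * (inherit * growth)^t -- the couple-of-lines-of-algebra version of the qualitative claim; a quantitative (b)-level model would additionally need the cycle-to-hours conversion to match the experimental decay timescales.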

      A lot of the discussion repeats the results; depending on editorial preferences some of this text could probably be pared back to focus on the literature connections and context.

      Data availability -- it looks like much of the data required to reproduce the results is not going to be made available. Images and proteomic data are promised, but the data associated with mitochondrial concentration and other features are not mentioned. For FAIR purposes all the data (including statistics from analysis of the images) should be published.

      l660 -- can an overview of the EM protocol be given, to avoid having to buy the Mayer 2024 article?

      Significance

      This is a powerful and thoughtful study that provides a collection of new mechanistic insights into the link between physical and genetic properties of mitochondria in yeast. Cell biologists, geneticists, and the mitochondrial field will find this of potentially deep interest. Because of the mode and dynamics of inheritance in budding yeast, findings here may not be directly transferable to other eukaryotes, but these insights are still of interest for researchers outside of yeast for their insight into how this well-studied system manages its mitochondrial populations.

    1. Author response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This is a manuscript describing outbreaks of Pseudomonas aeruginosa ST 621 in a facility in the US using genomic data. The authors identified and analysed 254 P. aeruginosa ST 621 isolates collected from a facility from 2011 to 2020. The authors described the relatedness of the isolates across different locations, specimen types (sources), and sampling years. Two concurrently emerged subclones were identified from the 254 isolates. The authors predicted that the most recent common ancestor for the isolates can be dated back to approximately 1999 after the opening of the main building of the facility in 1996. Then the authors grouped the 254 isolates into two categories: 1) patient-to-patient; or 2) environment-to-patient using SNP thresholds and known epidemiological links. Finally, the authors described the changes in resistance gene profiles, virulence genes, cell wall biogenesis, and signaling pathway genes of the isolates over the sampling years.

      Strengths:

      The major strength of this study is the utilisation of genomic data to comprehensively describe the characteristics of a long-term Pseudomonas aeruginosa ST 621 outbreak in a facility. This fills the data gap of a clone that could be clinically important but easily missed from microbiology data alone.

      Weaknesses:

      The work would further benefit from a more detailed discussion on the limitations due to the lack of data on patient clinical information, ward movement, and swabs collected from healthcare workers to verify the transmission of Pseudomonas aeruginosa ST 621, including potential healthcare worker to patient transmission, patient-to-patient transmission, patient-to-environment transmission, and environment-to-patient transmission. For instance, the definition given in the manuscript for patient-to-patient transmission could not rule out the possibility of the existence of a shared contaminated environment. Equally, as patients were not routinely swabbed, unobserved carriers of Pseudomonas aeruginosa ST 621 could not be identified and the possibility of misclassifying the environment-to-patient transmissions could not be ruled out. Moreover, reporting of changes in rates of resistance to imipenem and cefepime could be improved by showing the exact p-values (perhaps with three decimal places) rather than dichotomising the value at 0.05. By doing so, readers could interpret the strength of the evidence of changes.

      Impact of the work:

      First, the work adds to the growing evidence implicating sinks as long-term reservoirs for important MDR pathogens, with direct infection control implications. Moreover, the work could potentially motivate investments in generating and integrating genomic data into routine surveillance. The comprehensive descriptions of the Pseudomonas aeruginosa ST 621 clones outbreak is a great example to demonstrate how genomic data can provide additional information about long-term outbreaks that otherwise could not be detected using microbiology data alone. Moreover, identifying the changes in resistance genes and virulence genes over time would not be possible without genomic data. Finally, this work provided additional evidence for the existence of long-term persistence of Pseudomonas aeruginosa ST 621 clones, which likely occur in other similar settings.

      We thank the reviewer for their thorough evaluation of our work, and for the suggested improvements. A main goal of this study was to show that integrating routine WGS in the clinic was a game changer for infection control efforts. We appreciate this aspect was highlighted as a strength by this reviewer. While some of the weaknesses identified are inherent to the data (or lack thereof) available for this study, we have revised the manuscript to include a detailed discussion on limitations (sampling, thresholds of genetic relatedness, definitions and categories, etc.) that could influence the genomic inferences. We also provided exact p-values for the changes in rates of resistance, as requested. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a report of a large Pseudomonas aeruginosa hospital outbreak affecting more than 80 patients with first sampling dates in 2011 that stretched over more than 10 years and was only identified through genomic surveillance in 2020. The outbreak strain was assigned to the sequence type 621, an ST that has been associated with carbapenem resistance across the globe. Ongoing transmission coincided with both increasing resistance without acquisition of carbapenemase genes as well as the convergence of mutations towards a host-adapted lifestyle.

      Strengths:

      The convincing genomic analyses indicate spread throughout the hospital since the beginning of the century and provide important benchmark findings for future comparison.

      The sampling was based on all organisms sent to the Multidrug-resistant Organism Repository and Surveillance Network across the U.S. Military Health System.

      Using sequencing data from patient and environmental samples for phylogenetic and transmission analyses as well as determining recurring mutations in outbreak isolates allows for insights into the evolution of potentially harmful pathogens with the ultimate aim of reducing their spread in hospitals.

      Weaknesses:

      The epidemiological information was limited and the sampling methodology was inconsistent, thus complicating the inference of exact transmission routes. Epidemiological data relevant to this analysis include information on the reason for sampling, patient admission and discharge data, and underlying frequency of sampling and sampling results in relation to patient turnover.

      We thank the reviewer for their thoughtful feedback on our manuscript and for highlighting the quality of the genomic analyses. We agree that the lack of patient epi data (e.g. date of admission and discharge) and the inconsistent sampling through the years are limitations of this study. We have revised the manuscript to acknowledge these limitations and discuss how not having this data complicates the inference of exact transmission routes. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #3 (Public Review):

      Summary:

      This paper by Stribling and colleagues sheds light on a decade-long P. aeruginosa outbreak of the high-risk lineage ST-621 in a US Military hospital. The origins of the outbreak date back to the late 90s and it was mainly caused by two distinct subclones SC1 and SC2. The data of this outbreak showed the emergence of antibiotic resistance to cephalosporin, carbapenems, and colistin over time highlighting the emerging risk of extensively resistant infections due to P. aeruginosa and the need for ongoing surveillance.

      Strengths:

      This study overall is well constructed and clearly written. Since detailed information on floor plans of the building and transfers between facilities was available, the authors were able to show that these two subclones emerged in two separate buildings of the hospital. The authors support their conclusions with prospective environmental sampling in 2021 and 2022 and link the role of persistent environmental contamination to sustaining nosocomial transmission. Information on resistance genes in repeat isolates for the same patients allowed the authors to detect the emergence of resistance within patients. The conclusions have broader implications for infection control at other facilities. In particular, the paper highlights the value of real-time surveillance and environmental sampling in slowing nosocomial transmission of P. aeruginosa.

      Weaknesses:

      My major concern is that the authors used fixed thresholds and definitions to classify the origin of an infection. As such, they were not able to give uncertainty measures around transmission routes nor quantify the relative contribution of persistent environmental contamination vs patient-to-patient transmission. The latter would allow the authors to quantify the impact of certain interventions. In addition, these results represent a specific US military facility and the transmission patterns might be specific to that facility. The study also lacked any data on antibiotic use that could have been used to relate to and discuss the temporal trends of antimicrobial resistance.

      We thank the reviewer for their evaluation of our work and for highlighting the broad implications of our findings regarding the application of real-time surveillance to suppress nosocomial transmission. We agree with the reviewer that fixed thresholds and definitions are imperfect to classify the origin of an infection. The design of this study (e.g. inconsistent sampling through time) was not conducive to providing a comprehensive/quantitative measurement of transmission routes. Thus, we decided to apply conservative thresholds of genetic relatedness and strict conditions (e.g. time between isolate collection, shared hospital location, etc.) to favor specificity, as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed a sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original fixed-threshold predictions. This limitation is now discussed in the revised manuscript. Finally, we have positively answered all the specific recommendations suggested by the reviewer and modified the manuscript accordingly, including the addition of Figure S3.

      Reviewer #1 (Recommendations For The Authors):

      The definitions used on lines 391-396 are necessarily somewhat arbitrary, but it would be helpful to have a little bit more justification for the choices made, particularly for the definition of environmental involving the "3x the number of years they were separated". It seems a little hard to square this with the more relaxed 10 SNP cutoff for a patient-to-patient designation. Are there reasons for thinking SNP differences associated with environmental transmission should be smaller than for patient-to-patient, or is the aim here just to set the bar higher for assuming an environmental source? Because these definitions are quite arbitrary, there could also be some value in exploring the sensitivity of the results to these assumptions.

      Thank you. We agree with the reviewers that SNP thresholds, albeit necessary, are arbitrary and that more discussion/justification was needed to put the genomic inferences in context. We have revised the manuscript to indicate that: 1/ the 10 SNP cutoff for a patient-to-patient designation was set to account for the known evolution rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study and similar to previous estimates, PMID: 24039595) and the observed within-host variability (now displayed in revised Fig. 1E). We note that this SNP distance was not sufficient and that an epi link (patients on the same ward at the same time) needed to be established. 2/ the environment-to-patient definition was indeed set to be most conservative (nearly identical isolates in two patients from the same ward with no known temporal overlap for > 365 days). This was indeed done to favor high specificity, as this inference relied solely on clinical isolates (i.e. the identical environmental strain in the patient-environment-patient chain was not sampled). For these clinical isolates to have acquired no/very little mutation in that much time, no/low replication is expected and, although unsampled, we propose this most likely happened on hospital surfaces.
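      As a back-of-envelope check, the quoted BEAST rate can be converted into expected SNPs per genome per year; the ~6.3 Mb genome size below is an assumed typical value for P. aeruginosa, not a figure from the manuscript.

```python
# Rough check of the 10-SNP patient-to-patient cutoff against the clock rate.
rate = 2.987e-7   # substitutions/site/year (BEAST estimate quoted above)
genome = 6.3e6    # sites; assumed typical P. aeruginosa genome size

snps_per_year = rate * genome                 # expected SNPs per lineage per year (~1.9)
years_to_10_snps = 10 / (2 * snps_per_year)   # two lineages diverging from a shared source
```

      Under these assumptions, two isolates separated by 10 SNPs correspond to very roughly 2-3 years of divergence from the clock alone, before accounting for within-host variability, which is why a temporal/ward epi link is still required on top of the SNP distance.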

      While the term "core genome" should be familiar to most readers, "shell genome" and "cloud genome" are less widely known, and an explanation of what these terms mean here would be helpful.

      Thank you. We have revised the manuscript to define the core, shell, and cloud genomes as gene sets found in ≥ 99%, ≥ 95% and ≥ 15% of isolates, respectively.
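      Read as mutually exclusive bins (a Roary-style convention, since the three thresholds overlap as stated), the definitions amount to a simple prevalence lookup; the "rare" label for genes below 15% is my assumption, not a term from the manuscript.

```python
def classify_gene(prevalence):
    """Bin a gene by the fraction of isolates carrying it, using the
    thresholds from the response: core >= 99%, shell >= 95%, cloud >= 15%.
    Bins are treated as mutually exclusive (Roary-style); the 'rare'
    label for genes below 15% is an assumption."""
    if prevalence >= 0.99:
        return "core"
    if prevalence >= 0.95:
        return "shell"
    if prevalence >= 0.15:
        return "cloud"
    return "rare"
```

      For example, a gene present in 96% of isolates falls in the shell, and one present in half of isolates falls in the cloud.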

      In the first paragraph of the discussion, it could be added that in many cases for clinically important Gram negatives short read sequencing alone will fail to detect transmission events as outbreaks can be driven by plasmid spread with only very limited clonal spread (see, for example, https://www.nature.com/articles/s41564-021-00879-y )

      Thank you. We agree this is an important/emerging aspect of surveillance. However, the goal of this discussion point was to explain why such a large outbreak was missed prior to implementing WGS (short read) surveillance. We feel that discussing "plasmid outbreaks" (which is not at play here, and relatively rare in P. aeruginosa compared to the Enterobacteriaceae) and the need for long read will distract from the narrative.

      line 599 What does "Mock" mean here? Would it be more accurate to say it is a simplified floor plan?

      Thank you. "Mock" was changed to "simplified".

      IPAC abbreviation is only used once - spelling it out in full would increase readability.

      Revised manuscript was edited as suggested.

      MHS is only used twice.

      Revised manuscript was edited to spell out Military Health System.

      Line 364: full stop missing.

      Revised manuscript was edited as suggested.

      Line 401: Bayesian rather than bayesian.

      Revised manuscript was edited as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for giving me the opportunity to review this interesting manuscript.

      The conclusions of this paper are mostly well supported by the data presented, but epidemiological information was limited and the sampling methodology was inconsistent, thus complicating inference of exact transmission routes.

      Major issues:

      What was the baseline frequency of clinical and/or screening samples of Pseudomonas aeruginosa at the hospital? Neither Figure 1D nor Table S1 allows for differentiating between clinical and screening samples. Most isolates were cultured from clinical materials, and there is no information about the patients' length of stay and their respective sampling dates. Is there any possibility of finding out whether the samples were collected for clinical or screening purposes? Would it be possible to include the patients' admission data to determine whether the strains were imported into the hospital or related to a previous stay, e.g. among known carriers? Also, the issue of sampling dates vs. patient stay on the ward should be addressed, as there may be an overlap in patients' stay on the ward but no overlap in terms of sampling dates or even missing samples (missing links).

      We have revised the manuscript to address this important point: i) 16 isolates were from surveillance swabs and are labelled "Surveillance" in Table S1. The remaining 237 were clinical isolates; ii) unfortunately, because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge date, wards, rooms, etc.) and we cannot calculate length of stay or better identify patient overlap. These limitations are now acknowledged in the discussion of the revised manuscript.

      In order to evaluate the extent of the outbreak, more epidemiological data would be useful. What is the size of the hospital, what is the average patient turnover, and what is the average length of stay in ICU and non-ICU? Is there any specialization besides the military label?

      We have revised the manuscript to indicate that facility A is 425-bed medical center and is the only Level 1 trauma center in the Military Health System. Unfortunately, the data to calculate length of stay, throughout the years, in ICU and non-ICU, was not available to us. This limitation is now also acknowledged in the discussion.

      Perhaps the authors could attempt to discuss the extent to which large outbreaks like these may be considered as part of unavoidable evolutionary processes within the hospital microbiome, as opposed to accumulation and transmission of potentially harmful genes/clones, and differentiate between the putative community spread without any epidemiological links on the one hand, and hospital outbreaks that could be targeted by local infection prevention activities on the other hand.

      We respectfully disagree with the suggestion that this large outbreak โ€œmay be considered as part of unavoidable evolutionary processes within the hospital microbiomeโ€ and should be opposed to โ€œtransmission of potentially harmful genes/clonesโ€. As a matter of fact, our data showed that infection control staff at Facility A responded with multiple interventions, including closing sinks, replacing tubing, and using foaming detergents. This resulted in slowing the spread of the ST621 outbreak with just 3 cases identified in 2022, 0 cases in 2023 and 1 case in 2024. This is now discussed in the revised manuscript.

      Page 5, lines 88-92 lines 101-104. It seems as if the outbreak was identified only by the means of genomic surveillance. This raises questions as to the rationale for sampling and sequencing, especially prior to 2020. Considering 11 cases per year between 2011 and 2016, one could assume such an outbreak would have been noticed without sequencing data.

      The MRSN was created in 2010, in response to the outbreak of MDR Acinetobacter baumannii in US military personnel returning from Iraq and Afghanistan. Between 2011 and 2017, the MRSN collected MDR isolates (mandate for all MDR ESKAPE but compliance varied between years and facilities) from across the Military Health System and, for select isolates (e.g. high-risk isolates carrying ESBLs or carbapenemases) performed molecular typing by PFGE. In 2017 the MRSN started to perform whole genome sequencing of its entire repository. In 2020, a routine prospective sequencing service was started and first detected the ST621 outbreak. A retrospective analysis of historical isolate genomes (2011-2019) identified additional cases. The first paragraph of the discussion lists possible factors to explain why the ST621 escaped detection by traditional approaches. We believe 11 cases per year is not a strong signal when stratified by month, wards, or both, especially for a clone lacking a carbapenemase and without a remarkable antibiotic susceptibility profile.

      Did the infection control personnel suspect transmission? If yes, was the sampling and submission of samples to the MRSN adapted based on the epidemiologic findings?

      The ST621 outbreak was unsuspected before the initial genomic detection in 2020. Until that point, MDR isolates only (Magiorakos et al PMID: 21793988) were collected but compliance was variable through time. Quickly thereafter (starting in 2021), complete sampling of all clinical P. aeruginosa (MDR or not) from Facility A was started. The manuscript was revised to clarify those details of the sampling strategy.

      Is there any information about how many environmental sites were sampled without evidence of ST621 / screening samples were cultured without evidence of Pseudomonas aeruginosa?

      For patient isolates, only 16 isolates were from surveillance swabs. The remaining 237 were clinical isolates. No denominator data was available to calculate P. aeruginosa and ST-621 positivity rate in surveillance swabs throughout the time period. For environmental isolates, a total of 159 swabs were taken from 55 distinct locations in 8 wards/units including the ER. This data is now included in the revised manuscript. However, a complete analysis of these swabs (positivity rate for ESKAPE pathogens, P. aeruginosa, per ward/floor/room, per swab type (sink drain, bed rail etc.) etc.) is beyond the scope of this study and is being performed as a follow up investigation.

      Page 5 lines 89 and 39 Figure S1B. Please describe how the allelic distance for the cluster threshold was selected.

      As indicated in the legend of Figure S1B, no thresholds were applied. All ST621 isolates ever sequenced by the MRSN were included. All except 3 isolates were separated by 0-23 cgMLST allelic differences. The remaining 3 were distant by 88-89 allelic differences. The text was revised to clarify this point.

      Page 5 lines 99-100. Could the authors please provide some distribution measures (e.g. IQR).

      Done as requested. The revised manuscript now reads "…of just 38 single nucleotide polymorphisms (SNPs), and an IQR of 19 (Fig. 1A, Table S1)."

      Page 5 line 102. Could the authors please provide some distribution measures (e.g. IQR).

      Please see above. A chart was created and is now included as Fig. S2.

      Page 6 line 107 and page 34 figure 1c. In the text it is stated that isolates were collected in 27 wards, the figure 1C depicts 26 wards and n/a.

      Thank you for spotting this inconsistency. This has been fixed in the revised manuscript.

      Page 6 lines 117-118. Samples collected in the emergency room would imply samples collected on admission, already addressed previously. Did the authors investigate a potential import into the hospital from community reservoirs or were all these isolates collected among patients who had been previously admitted to the hospital and/or tested positive for the outbreak strain?

      We agree that samples collected in the ER imply samples collected on admission. Of the 29 ER isolates only 9 (31%) were primary isolates (first detection in a new patient) which suggests a majority were from returning patients at Facility A. Because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge date, wards, rooms, etc.) to investigate/confirm that these 9 patients had previous visits at Facility A. This point is now discussed in the revised manuscript.

      Page 6 line 128. This could also represent increased selective pressure. However, according to Table S1, the 28 isolates collected in 2011 (the number does not match with Figure 1D) were from many different wards, thus indicating earlier spread throughout the hospital.

      Yes, we agree. Please note that Table S1 lists all isolates for 2011, whereas Figure 1D focuses on primary isolates (the first isolate from each patient) only.

      Page 7 line 133. Both Figure 2 and the discussion section, page 13 line 296 suggest the year 2005 instead of 2004?

      Thank you for catching this typographical error. This was corrected to 2004 in the revised manuscript.

      Figure 1E. The figure should also depict intra-patient diversity for comparison.

      Thank you for this great suggestion. We have revised Figure 1E accordingly.

      Page 7, lines 146-147 Could the authors attempt explaining the upper part of the bimodal peaks?

      This is an all-vs-all SNP analysis of all inter-patient isolates. For each isolate, all distances to the other isolates are reported, not only the smallest. The upper peaks represent comparisons between isolates from different outbreak subclones (SC1 vs. SC2).
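The distinction between reporting all pairwise distances and only nearest-neighbour distances can be sketched as follows (the toy SNP profiles and subclone labels below are illustrative, not data from this study):

```python
from itertools import combinations

def pairwise_snp_distances(profiles):
    """All-vs-all Hamming distances between equal-length SNP profiles."""
    dists = {}
    for (i, a), (j, b) in combinations(profiles.items(), 2):
        dists[(i, j)] = sum(x != y for x, y in zip(a, b))
    return dists

# Two toy "subclones": small distances within a subclone, large distances
# between subclones, which is what produces a bimodal histogram.
profiles = {
    "SC1_a": "AAAAAAAAAA",
    "SC1_b": "AAAAAAAAAT",
    "SC2_a": "TTTTTTTTAA",
    "SC2_b": "TTTTTTTTAT",
}
d = pairwise_snp_distances(profiles)
within = [v for (i, j), v in d.items() if i[:3] == j[:3]]
between = [v for (i, j), v in d.items() if i[:3] != j[:3]]
```

Keeping every pairwise comparison (rather than each isolate's minimum) is what makes the between-subclone comparisons visible as a second, upper peak.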

      Page 7, line 150 This is a very small number considering the extent of the outbreak and suggests a large number of missing links. Or does this rather imply continuous import and evolution over time that does not necessarily represent transmission within the hospital?

      We believe all cases were due to transmission within the hospital. Based on conservative thresholds (genetic relatedness plus an epidemiological link, or the lack thereof), the precise origin from another patient (n=10) or a contaminated surface (n=12) could be inferred. For the remaining 60 patients, given the available sampling, the conditions we chose were not met, and we do not conclude whether a direct patient-to-patient or an environmental origin was more likely.
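The fixed-threshold logic described here can be sketched as a simple decision rule. The function and parameter names are illustrative, and the environmental SNP cutoff below is an assumption for illustration, not the authors' exact criterion:

```python
def infer_origin(snp_to_nearest_patient, epi_link, snp_to_nearest_env=None):
    """Illustrative sketch of fixed-threshold origin inference.

    - patient-to-patient: <= 10 SNPs to a prior patient isolate AND an
      epidemiological link (same ward within the same month);
    - environment-to-patient: near-identical match to an environmental
      isolate (the <= 2 SNP cutoff here is assumed);
    - otherwise: undetermined.
    """
    if snp_to_nearest_patient is not None and snp_to_nearest_patient <= 10 and epi_link:
        return "patient-to-patient"
    if snp_to_nearest_env is not None and snp_to_nearest_env <= 2:
        return "environment-to-patient"
    return "undetermined"
```

With conservative rules like these, any case failing both conditions falls into the "undetermined" bucket, matching the 60 unresolved patients described above.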

      Page 8 line 155. What does the temporal overlap refer to - sampling date versus patient's stay on the ward? Please specify.

      The temporal overlap was investigated from sampling dates, as dates of patient admission/discharge were not available.

      Page 8, line 157: What does primary/serial isolate mean - first and follow-up samples of ST621 per patient?

      Yes. Primary isolate is used to designate the first isolate from a patient. Serial isolates designate follow-up samples of ST621.

      Page 8 line 165: Table S3 and Figure 3 only refer to environmental samples from three wards. Ward 20 rooms 2 and 18 as well as ward 1 rooms 1 and 6 were hotspots - is there any information on the specific infection control/disinfection measures? Addressed in discussion page 12, lines 273-275, but no information on what was actually done.

      The manuscript was revised to indicate the precise disinfection measures that were taken. A follow-up study is ongoing to assess long-term efficacy and monitor possible retrograde growth from previously contaminated sinks.

      Page 8 line 175: Evaluation of change in resistance fraction over time - There may have been a selection bias with an inconsistent number of strains sequenced per year.

      Yes, incomplete sampling and possible selection bias are now listed with other limitations of this study in the discussion of the revised manuscript.

      Page 9 line 183: The referral to Table S1 is unclear, I could not find the number and the specific isolates selected for long-read sequencing.

      Thank you. This has been added to the revised Table S1.

      Page 10 lines 217-225 and Figure 4C: Perhaps it is possible to better align what is written in the text and the caption of the figure. The caption does not clarify that only one patient develops colistin resistance (what was the reason to include the other patients?).

      Thank you. We have revised the text and the figure caption to clarify that only isolates from one patient developed colistin resistance. The isolates from the other patients in Fig. 4C are shown to provide context and to accurately map the emergence of the PhoQ E77fs mutation.

      Page 10, lines 228-229 and Table S5: How is it possible to identify those 64 genes in Table S5?

      We have revised Table S5 to facilitate the identification of the 64 genes with ≥ 2 independently acquired mutations (excluding SYN). Specifically, we have added column E, labeled "Counts independent mutations per locus (excluding SYN)". A total of 205 rows (in this table each row is a variant) have a value ≥ 2, and these represent 64 genes (upon deduplication of locus tags).
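The counting and deduplication logic described for the revised Table S5 can be sketched as follows (the tuple layout and gene names are illustrative, not the table's actual contents):

```python
from collections import Counter

def loci_with_recurrent_mutations(variants, min_count=2):
    """variants: (locus_tag, effect) pairs, one per variant row.

    Counts independently acquired non-synonymous mutations per locus and
    keeps loci meeting the threshold, mirroring the described column E logic.
    """
    counts = Counter(locus for locus, effect in variants if effect != "SYN")
    return {locus: n for locus, n in counts.items() if n >= min_count}

# Toy variant table (locus tags and effects are invented for illustration):
variants = [
    ("phoQ", "FRAMESHIFT"), ("phoQ", "MISSENSE"),
    ("mexR", "MISSENSE"), ("mexR", "SYN"),
    ("oprD", "NONSENSE"),
]
recurrent = loci_with_recurrent_mutations(variants)
```

Deduplicating on locus tag after filtering, as here, is why 205 qualifying variant rows can collapse to 64 genes.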

      Page 13, lines 280-281: Where is the information on chronic infection presented? Serial cultures would not necessarily mean chronic infection.

      Yes, we agree this was not the appropriate characterization, and it has been revised to 'long-term' infections.

      Page 14 line 306: Emergence of colistin resistance in a single patient, correct?

      Yes. This was further clarified in the text.

      Page 14 lines 315-320: This should go to the results section. In particular disinfection, closing, and replacing of tubing should be mentioned in the results section in reference to the results presented in Table S3.

      Thank you. We have considered this suggestion but have decided to leave this discussion as the closing paragraph of this publication. A follow-up study is ongoing to assess the long-term efficacy of these interventions on ST-621 but also on other outbreak clones at Facility A.

      Methods

      Page 15 lines 330-333: Perhaps it is possible to avoid redundancy.

      Thank you. We have revised the text accordingly.

      Page 15 lines 341: Information on which isolates were subjected to long-read sequencing is missing.

      Thank you. This has been added to the revised Table S1.

      Page 16 line 345: Was there a particular reason why Newbler was chosen?

      No. At the time, Newbler was the default assembler built into the MRSN bacterial genome analysis pipeline and QC processes.

      Page 16, line 357-358: What was the rationale for selecting this isolate as reference genome?

      This isolate was chosen because it was collected early in the outbreak and phylogenetic analysis revealed it had low root-to-tip divergence.

      Page 16 line 361: Why 310 isolates, if only 253 were assigned to the outbreak clone and only a subset of those were collected in facility A?

      This was a typographical error that has been corrected (it now reads "…set of 253 isolates.") in the revised manuscript.

      Page 17 lines 387-395: What is the reason that intra-patient diversity was not included in the set of criteria for SNP distances?

      The observed within-host variability (now displayed in revised Fig. 1E) was taken into consideration when setting SNP thresholds for categorizing patient-to-patient transmission or environment-to-patient events. This is now clarified in the revised manuscript.

      Page 17 line 392: How was the threshold of <=10 SNPs determined?

      The 10 SNP cutoff to infer a patient-to-patient transmission event was set to account for the known evolutionary rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study, similar to previous estimates, PMID: 24039595) and the observed within-host variability (now displayed in revised Fig. 1E). We note that this SNP distance alone was not sufficient and that an epi link (patients on the same ward within the same month) also needed to be established.
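As a back-of-envelope check, the substitution rate can be converted into expected SNPs per genome per year. The ~6.3 Mb genome size below is a typical value for P. aeruginosa and is our assumption, not a figure from the manuscript:

```python
# BEAST estimate from the study, in substitutions/site/year.
rate = 2.987e-7
# Assumed genome size: ~6.3 Mb is typical for P. aeruginosa.
genome_size = 6.3e6

# Expected SNPs accumulated per lineage per year.
snps_per_year = rate * genome_size

# Two isolates diverging from a common ancestor accumulate differences on
# both branches, i.e. roughly 2x this rate, so a 10-SNP cutoff corresponds
# to a few years of divergence plus within-host variability.
years_for_10_snps = 10 / (2 * snps_per_year)
```

Under these assumptions the rate works out to roughly 2 SNPs per genome per year, so the 10-SNP threshold spans on the order of 2-3 years of pairwise divergence.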

      Page 17 line 395 and Figure 2: What was the assumed average mutation rate per genome per year?

      Thank you. The mean substitution rate inferred by BEAST was 2.987E-7 subs/site/year, similar to estimates from previous studies on P. aeruginosa outbreaks (e.g., PMID: 24039595).

      Reviewer #3 (Recommendations For The Authors):

      Please find line-by-line comments on each section of the manuscript below:

      Introduction

      Line 86: I am wondering why the authors state ">28 facilities" instead of the exact number of facilities from which these lineages were recovered.

      Thank you. The manuscript was revised to provide the exact number of facilities. It now reads "…recovered from 37 and 28 facilities, respectively."

      Methods

      It's not clear to me which criteria were used for collecting these isolates (both prospective and retrospective). I understand that some of the data are described in more detail in Lebreton et al but I did not find the specific criteria for the collection of the isolates and I imagine that these might differ if different facilities. Would it be possible to comment on that and add a short paragraph in the Methods section?

      Thank you. This lack of clarity was also raised by other reviewers, and we have revised the manuscript to indicate that: (1) only MDR isolates (Magiorakos et al., PMID: 21793988) were collected from 2011-2020, with the same criteria for all facilities, although compliance was variable through time and between facilities; and (2) starting in 2021, all P. aeruginosa isolates, irrespective of their susceptibility profile, were collected from Facility A.

      The data comes from a US Military hospital. Is this related to the US Veterans Affairs Healthcare system? Is there more detailed information about the demographics of the patient population?

      Facility A is part of the Military Health System (MHS), which provides care for active service members and their families. This is distinct from the US Veterans Affairs Healthcare system. Only limited patient data was accessible to us, as this study was done as part of our public health surveillance activities. Patient age (avg. 57.2 +/- 21.0) and gender (male/female ratio 1.7) are provided in the revised manuscript.

      Line 384ff: The origin of infection was inferred based on the SNP threshold and epidemiological links. However, recombination events can complicate the interpretation of SNP data. Have the authors attempted to account for this?

      Thank you. We agree that recombination events can complicate the interpretation of SNP data. We used Gubbins v2.3.1 to filter out recombination from the core SNP alignment, as indicated in the revised manuscript.

      The authors' definition of environment-to-patient transmission seems conservative (nearly identical strain and no known temporal overlap for > 365 days). Have the authors changed the threshold, performed sensitivity analyses, and tested how this would affect their results?

      Indeed, acknowledging that fixed thresholds have limitations in their ability to accurately predict the origin of infections, we took a conservative approach to favor specificity as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original predictions. This limitation is now discussed in the revised manuscript.

      The authors don't seem to incorporate the role of healthcare workers in the transmission process. Could they comment on this? I am assuming that environment-to-patient transmission could either be directly from the environment to the patient or via a healthcare worker. I think it's fine to make simplifying assumptions here but it would be great if this was explicitly described.

      Thank you for this suggestion. We did not sample the hands of healthcare workers in this study. As a result, the reviewer is correct to say that we made the simplifying assumption that healthcare workers could be intermediates in either environment-to-patient or patient-to-patient transmissions, as previously described by others (PMID: 8452949). This limitation is now discussed in the revised manuscript.

      Page 5, line 100: What does "all vs all" mean? Based on the supplement, I assume it's the pairwise distance and then averaged across all of those. It would improve the readability of the manuscript if the authors could briefly define this term and then maybe refer to Table S1.

      Thank you. We have created Fig.S2 and revised the manuscript to state that ST-621 isolates from facility A belonged to the same outbreak clone with a distance (averaged all vs all pairwise comparison) of just 38 single nucleotide polymorphisms (SNPs), and an IQR of 19 (Fig. S2, Table S1).

      Figure 1D: It would be interesting to see additional figures in the supplement on the percentage of sequenced isolates per year and whether it varies across the different sources/sites. Is there any information on which isolates were chosen for sequencing?

      Lack of clarity in the sampling/sequencing scheme was raised by multiple reviewers and we have provided a thorough response to earlier comments. We also have revised the material and methods section accordingly. Finally, we have created Fig. S3 to show the percentage of sequenced isolates per year across different sources/sites, as suggested by the reviewer. No noticeable patterns were observed.

      It seems like only a subset of all clinical isolates were sequenced. Would it be possible that SC2 was present already earlier but not picked up until a certain date?

      Although all isolates received by the MRSN were sequenced, compliance varied through time so it is true that not all clinical isolates were sequenced between 2011-2019. As such, we fully agree with this hypothesis and discuss this possibility as BEAST analysis placed the origin of SC2 in 2004 while the first detection of an SC2 isolate was in December 2012. This limitation is now discussed in the revised manuscript.

      Could the authors elaborate on whether the isolates resulted from single-colony picks? Is it possible that the different absence of a subclone is due to the fact that they picked only a colony?

      Yes, the isolates resulted from single-colony picks, except when the presence of different colony morphologies was noted. In the latter case, representative isolates for each colony morphology were processed. We have revised the methods to make that clear.

      Figure 2: It is difficult to see which nodes belong to which patient due to the small font size. I wonder if it was possible to color the nodes for each patient, to make it more readable.

      We tried coloring the nodes but, with > 60 distinct patients/colors, we decided it did not improve clarity. We have revised Figure 2 to increase the font size.

      Page 7-8, lines 154-155: Did the authors check whether there were isolates of the same strain (that were found in the environment) present in other patients elsewhere in the ward?

      Yes. In rare cases, we observed virtually genetically identical isolates from two patients collected in different wards. Because we only have access to clinical isolate data (collected from patient X in ward Y) and do not have access to patient data (admission/discharge dates, wards, rooms, etc.), we do not know, but cannot exclude, that patients overlapped in a room prior to the sampling of their P. aeruginosa isolates. We designed our fixed thresholds to be conservative; as a result, in this analysis, these cases are labelled as "undetermined".

      Page 8: Do the authors have any information on antibiotic use during this timeframe? From the discussion, it seems like there is no patient-level prescription data. Is there any data on overall trends? How were trends in antibiotic use correlated with trends in antibiotic resistance?

      Unfortunately, patient-level prescription data (or any other data not linked to the bacterial specimens) was not accessible to us as this study was done as part of our public health surveillance activities.

      To infer the origin of infection, the authors used a static method with fixed thresholds and definitions. This study does not provide any uncertainty around its estimates. Maybe the authors could add a sentence in the discussion section noting that MCMC methods to infer transmission trees incorporating WGS could provide these estimates. These methods have not been applied to PA often, but here are two examples where MCMC methods have been used without WGS (though the definition of environmental contamination may differ between these studies and this study).

      https://doi.org/10.1186/s13756-022-01095-x

      https://doi.org/10.1371/journal.pcbi.1006697

      Thank you for this great suggestion. We have revised the manuscript to include a discussion on the limitations of fixed thresholds to infer transmission chains/origins, and to discuss existing alternatives, including MCMC methods.

      Line 322-323: This sentence is a bit vague since not all of these HAI are due to P. aeruginosa. I would suggest citing a number that is specific to PA.

      Thank you. While our paper shows a particular example of a protracted P. aeruginosa outbreak, the roll-out of routine WGS surveillance in the clinic will help prevent hospital-associated drug-resistant infections beyond this species alone. We believe that broadening the scope in the last sentence of the manuscript is important, and we respectfully decline to revise as suggested.

    1. Good night, ladies, good night, sweet ladies, good night, good night.

      It's interesting to me how Eliot ends this section of The Waste Land with Ophelia's last words before she commits suicide. Lines before, we get references to "Bill," "Lou," and "May," indicating that the speaker is bidding farewell from the pub setting. Ophelia's line, on the other hand, bids farewell on behalf of not just Lil and the woman in the pub, but all the "sweet ladies" of the waste land. This idea of death as a fate is super interesting. The women have their emotional and spiritual deaths connected to Ophelia's physical death. This is yet another instance where we see suicide in a female in The Waste Land. If I think about what Eliot is trying to get at with women x waste land, especially with this Ophelia connection, I'd say the waste land is a world where the modes of expressing experiences like song, symbol, and even madness have been stripped of their meaning and beauty, leaving only bad nerves, dirty gossip, and the last call of the pub. This is obviously not the ideal place for women; hence, modern society is not fit for women to flourish.

    2. THE WASTE LAND

      "It's just a flesh wound" (Monty Python and the Holy Grail), says the Black Knight as his dismembered limbs lie on the ground next to him. Lucky for him he was not struck by the Sword of the Ship. In Le Morte D'Arthur, the sword delivers much more than a wound; it functions as a vessel of destruction and infertility, cursing not only the body it strikes but the land under the blow. In Book XVII, Chapter III, the gentlewoman recounts the tale of the Dolorous Stroke: "And when King Hurlame saw King Labor he dressed this sword, and smote him upon the helm so hard that he clave him and his horse to the earth with the first stroke of his sword. And it was in the realm of Logris; and so befell great pestilence and great harm to both realms. For sithen increased neither corn, nor grass, nor well-nigh no fruit, nor in the water was no fish; wherefore men call it the lands of the two marches, the waste land, for that dolorous stroke." In The Waste Land, Eliot positions himself as the sword: a cursed destroyer whose stroke leaves the modern world fragmented, barren, and nearly speechless. Just as the sword's blow leaves both the king and his land wounded, Eliot's poem creates a Waste Land that is not only destroyed but also destructive. Malory makes clear that destruction does not remain contained within the body of the king. The Dolorous Stroke radiates outward, making the land itself sterile and dangerous. Jessie Weston underscores this paradox, insisting, "the condition of the King is sympathetically reflected on the land, the loss of virility in one brings about a suspension of the reproductive processes of Nature on the other." What is destroyed becomes the destroyer. The king's wound causes agricultural collapse, famine, and sterility.
James Frazer situated this paradox within the broader mythic cycle of dying gods such as Tammuz or Adonis: "During her absence the passion of love ceased to operate: men and beasts alike forgot to reproduce their kinds: all life was threatened with extinction." In each case, destruction does not stop with the figure who receives the wound; the wound itself spreads, making the world an active force of devastation. Eliot translates this ritual pattern into modernism. By casting himself as the sword, he is not the Grail-bearer offering renewal but the weapon that fractures language and tradition. His Waste Land is the blasted cultural landscape after the stroke, both destroyed by war and disillusionment and destructive in its ongoing sterility and fragmentation. The poem wounds and bears wounds simultaneously. Like the Sword of the Ship, The Waste Land is a cursed vessel of destruction. It strikes, it scars, and it leaves behind a world that continues to unravel under the weight of its blow. The poem is destroying poetic precedent, and in the process, destroying itself.

    1. Your research question should inform the structure and contents of your project and everything you cite should be related to your research question in some way.

      This line is the book's quiet thesis about research writing: the question is not just a starting prompt; it's the organizing principle. Read "inform" as constrain and shape. If a subsection, paragraph, or citation doesn't help answer the question you've posed, it's ornamental: cut it. Practically, this means (1) operationalizing your terms (What exactly counts as "psychological well-being"? Which population, time frame, and context?), (2) reverse-outlining your draft to check that every section maps to a sub-task of the question (define, contextualize, test, interpret), and (3) applying a ruthless relevance test to sources: each should either supply evidence, methods, or counter-arguments that bear directly on the claim your question implies. This alignment prevents the two most common failures in student research: the "data dump" (too many unfocused sources) and the "tour" (interesting but aimless background). A strong question automatically yields a coherent structure because it dictates what must be established, measured, compared, or explained, and in what order. Quick check: write your research question atop your draft; under every paragraph, jot the specific part of the question it answers. Anything blank signals a tangent.

    1. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors describe the results of a high-throughput screen for small-molecule activators of GCN2. Ultimately, they find 3 promising compounds. One of these three, compound 20 (C20), is of the most interest both for its potency and specificity. The major new finding is that this molecule appears to activate GCN2 independent of GCN1, which suggests that it works by a potentially novel mechanism. Biochemical analysis suggests that each binds in the ATP-binding pocket of GCN2, and that at least in vitro, C20 is a potent agonist. Structural modeling provides insight into how the three compounds might dock in the pocket and generates testable hypotheses as to why C20 perhaps acts through a different mechanism than other molecules.

      Strengths:

      Of the 3 compounds identified by the authors, C20 is the most interesting, not just for its intriguing mechanistic distinction as being GCN1-independent (shown genetically in two distinct cell lines, CHO and 293T in Figure 4, and in contrast to other GCN2 activators) but also for its potency. In in-cellulo assays, compound 21 appears as more of an ISR enhancer than an activator per se, and although compound 18 and compound 21 lead to upregulation of the ISR targets (Figure 2), that degree of upregulation is probably not significantly different from that induced by those compounds in Gcn2-/- cells. For C20, the effect appears stronger (although it is unclear whether the authors performed statistical analysis comparing the two genotypes in Figure 2D). In Figure 3, only C20 activates the ISR robustly in both CHO and 293T. Ultimately, C20 might be a tool for providing mechanistic insight into the details of GCN2 activation and regulation, and could be exploited therapeutically.

      Weaknesses:

      There are some limitations to the existing work. As the authors acknowledge, they do not use any of the compounds in animals; their in vivo efficacy, toxicity, and pharmacokinetics are unknown. But even in the context of the in cellulo experiments, it is puzzling that none of the three compounds, including C20, has any effects in HeLa cells when Neratinib does. It's beyond the scope of this paper to address definitively why that is, but it would at least be reassuring to know that C20 activates the ISR in a wider range of cells, including ideally some primary, non-immortalized cells. In addition, the ISR is a complex, feedback-regulated response whose output varies depending on the time point examined. The in cellulo analysis in this paper is limited to reporter assays at 18 hours and qRT-PCR assays at 4 and 8 hours. A more extensive examination of the behavior of the relevant ISR mRNAs and proteins (eIF2, ATF4, CHOP, cell viability, etc.) for C20 across a more extensive time course would give the reader a clearer sense of how this molecule affects ISR output. I also find it a bit strange that the authors describe C20 as "demonstrat(ing) weak inhibition of ... PKR" - the measured IC50 is ~4 μM, which is right around its EC50 for GCN2 activation. This raises the confounding possibility that C20 would simultaneously activate GCN2 while inhibiting PKR. While perhaps inhibition of PKR is not relevant under the conditions when GCN2 would be activated either experimentally or therapeutically, examining in cells the effects of C20 on GCN2 and PKR across a dose range would shed light on whether this cross-reactivity is likely to be of concern.

    1. Reviewer #1 (Public review):

      The manuscript "Heterozygote advantage cannot explain MHC diversity, but MHC diversity can explain heterozygote advantage" explores two topics. First, it is claimed that the recently published conclusion by Mattias Siljestam and Claus Rueffler (in the following referred to as [SR] for brevity) that heterozygote advantage explains MHC diversity does not withstand even a very slight change in ecological parameters. Second, a modified model that allows an expansion of the MHC gene family shows that homozygotes outperform heterozygotes. This is an important topic and could be of potential interest to readers if the conclusions are valid and non-trivial.

      Let me first comment on the second part of the manuscript that describes the fitness advantage of the 'gene family expansion'. I think this, by itself, is a totally predictable result. It appears obvious that with no or only a little fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc., could be more informative. Yet, as I understood the narrative of the manuscript, the expansion of the gene family serves as a mere counter-example to the disputed finding of [SR], rather than a systematic study of the eco-evolutionary consequences of this process.

      Now to the first part of the manuscript, which claims that the point made in [SR] is not robust and breaks down under a small change in the parameters. An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor of 10^43, naturally breaking down all the estimates and conclusions made in [SR]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text. The only piece of information given in the manuscript is that, unlike in [SR], the adjustable parameter c_{max} is kept constant when the number of pathogens is changed.

      In my opinion, the information provided in the manuscript does not allow one to conclude anything about the relevance and the validity of its main claim. At the same time, the simulations done in [SR] are described with a fair amount of detail, which allows me to assume that the conclusions made in [SR] are fairly robust and, in particular, have been demonstrated not to be too sensitive to changes in the main "suspect", c_{max}. Let me briefly justify my point.

      First, it follows from Eqs (4,5) in the main text and (A12-A13) in the Appendix that c_{max} and K do not independently affect the dynamics of the model, but it's rather their ratio K/c_max that matters. It can be seen by dividing the numerator and denominator of (5) by c_max. Figure 3 shows the persistent branching for 4 values of K that cover 4 decades. As it appears from the schemes in the top row of Figure 3, those simulations are done for the same positions and widths/virulences of pathogens. So the position of x* should be the same in all 4 cases, presumably being at the center of pathogens, (x*,x*) = (0,0). According to the definition of x* given in the Appendix after Eqs (A12-A13), this means that c_max remains the same in all 4 cases. So one can interpret the 4 scenarios shown in Figure 3 as corresponding not to various K, but to various c_max that varied inversely to K. That is, the results would have been identical to those shown in Figure 3 if K were kept constant and c_max were multiplied by 0.1, 1, 10, and 100, or scaled as 1/K. This begs the conclusion that the branching remains robust to changes in c_max that span 4 decades as well.

      Naturally, most, if not all, of the dynamics will break down if one of the ecological characteristics changes by a factor of 10^43, as is reported in the submitted manuscript. As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c. In [SR], it is clearly shown where the pathogens are.

      Another argument that makes me suspicious of the utility of the conclusions made in the manuscript, and that argues for the validity of [SR], is the adaptive dynamics derivation of the branching conditions. It is confirmed by numerics with sufficient accuracy, and as it stands in its simple form of an inequality between two widths, the branching condition appears to be pretty robust with respect to reasonable changes in parameters.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to conclusions that dispute the much better-substantiated claims made in [SR].

    1. Analyses across diverse commercially-relevant protein families such as recombinases, hydrolases, and ATP synthases demonstrate that BaseData consistently captures more sequence-level and phylogenetic diversity than any existing dataset, finding more potential starting points for biological functions important for the development of new therapeutics and industrial solutions

      It's hard to parse how much phylogenetic diversity BaseData actually adds (and how it's distributed) with just these plots as evidence. Quantification would be useful.

      What proportion of phylogenetic diversity in GTDB/OMG is captured? How much new phylogenetic diversity is contributed? How many novel clades without GTDB/OMG ancestry are added? Crucially, how are these patterns distributed within BaseData? Are phylogenetic diversity gains evenly distributed over proteins/genes?
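One way to quantify the requested phylogenetic-diversity gain is a Faith's-PD-style fraction: the share of the total branch length that a dataset's tips span. A minimal sketch on a toy rooted tree (the tree structure, tip names, and branch lengths are invented for illustration):

```python
def faith_pd(parent, blen, tips):
    """Faith's phylogenetic diversity: total branch length spanned by `tips`.

    parent: child -> parent mapping (root absent as a key);
    blen:   child -> length of the branch to its parent.
    """
    covered = set()
    for tip in tips:
        node = tip
        while node in parent:          # walk up to the root
            covered.add(node)
            node = parent[node]
    return sum(blen[n] for n in covered)

# Toy rooted tree:
#        root
#       /    \
#      A      B
#     / \    / \
#    t1 t2  t3 t4
parent = {"t1": "A", "t2": "A", "t3": "B", "t4": "B", "A": "root", "B": "root"}
blen = {"t1": 1.0, "t2": 1.0, "t3": 1.0, "t4": 1.0, "A": 2.0, "B": 2.0}

total = faith_pd(parent, blen, ["t1", "t2", "t3", "t4"])
captured = faith_pd(parent, blen, ["t1", "t2"])
fraction = captured / total
```

Reporting this fraction for BaseData tips placed on a GTDB/OMG reference tree, plus the branch length contributed only by BaseData, would answer the proportion and novel-clade questions directly.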

    1. lesson planning can seem overwhelming and laborious

      This is about how I feel about lesson planning and I haven't even done a lot of it yet. I know it depends on the district whether it's something you have to put together and submit every week, but it just feels a bit outdated because of how fluid learning is. If the class is behind on something or not understanding, I can't just say "sorry" and move on because the lesson plan said so. I'll have to make adjustments, and I know there's nothing that says you can't do that; it just feels like, in my brain, once you write something down and "plan it out," that is what has to happen or else it's some horrible misdeed.

    1. Misbehaviors left alone can be contagious, a process educators sometimes call the ripple effect

      I like the term "the ripple effect" because it's very true. I actually witnessed it a bit today: in one of the bigger classes I was teaching, there's a group of boys who sit in the back of the classroom, and they started to get chatty amongst their own group, and it started to spread outwards until my CT put a stop to it. I'm still trying to get comfortable taking control myself, but it's hard to feel like I have the ability to take that authority from my CT. It's through no fault of his; it's just getting into that mindset myself in his classroom.

    2. Good & Brophy (2008) maintain that praise does not work as a positive reinforcement as well with adolescents as with primary-aged students.

      I'm not sure that I agree with that statement. Even as an adult, I enjoy hearing that my efforts are recognized and it does help push me to keep being successful. Granted, it may not work for everyone, but I have witnessed it working in several cases and to varying degrees (including teenagers and adults in the workforce). I can't tell you how many times I've heard the saying "It's nice to just feel appreciated".

    1. Now, the word has a kind of double meaning, even in its common-sense understanding. It does mean "to present," "to image," "to depict" – to offer a depiction of something else. And the word representation or re-presentation does sort of carry with it the notion that something was there already and, through the media, has been represented.

      The media will offer us one of two things: a picture or a deceptive lie. Sometimes it's a lie, and sometimes it's something that already exists in the real world, and the media just comes out and acts like they didn't know it was already there.

    1. Group G Ben Braniff, Kim Maynard, Nick Devic, Maria Echeverri Solis, Sam Yalda

      1. Design has a major impact on the world and society. Even the little things can add up to a lot. Sustainability is a revolutionary idea that should be at the core of every design now.

      2. Society is another bottom line meaning all design inherently affects humans and/or is designed for humans. It's important to design for the extremes and the edge cases like people with disabilities.

      3. Corporations output a lot of waste. When they make small changes to be more sustainable, it results in big changes and saving a lot of material. Small changes can include anything from using 2% less plastic per water bottle to using wood buttons instead of plastic ones.

      4. A lot of people don't consider themselves disabled, but it's very common at some point in people's lives to have a certain level of impairment. It's important to keep this in mind when designing as you're designing for the general population--not just a specific individual.

      5. Addressing issues like world hunger may require rethinking the way we design food production. For example, as they stated, choosing kangaroo meat over beef is a more environmentally sustainable option.

      6. Thoughtful design choices, such as the example in the video of adding white circles inside letters to reduce ink use, can improve efficiency and conserve resources.

      7. It is interesting how he opens his discussion to slowly introduce that design isn't just about doing it for marketing or 'profit', as he pointed out. Watching this helps a person realize that design is so much more powerful than that if you put it toward another cause. Design could end up being the solution to some of the biggest problems in society.

      8. A very important point he made was that improving accessibility is beneficial to many more people than just those who initially needed it, such as people with disabilities. From this, I think a good takeaway is that design should always be considerate of any disabilities or needs the audience might have, because sometimes that design is just better for everyone in general.

      9. My first take is that design should go beyond money and aesthetics. By thinking about sustainability and accessibility, designers can create solutions that are socially responsible and environmentally friendly.

      10. My second take is that when you design with people with disabilities, you end up with solutions that are more usable and inclusive.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Xu et al. investigated split gene drive systems by targeting multiple female essential genes involved in fertility and viability in Drosophila. The authors evaluate the suppression efficiency through individual crosses and cage trials. Resistance allele formation and fitness costs are explored by examining the sterility and fertility of each line. Overall, the experimental design is sound and methods are feasible. The work is comprehensive, and conclusions are well supported by the data. This work offers informative insights that could guide the design of suppression gene drive systems in other invasive disease vectors or agricultural pests.

      However, several points require clarification or improvement:

      1. Methodological clarity: Some experimental details are insufficiently described, for example, regarding the setup of genetic crosses involving different Cas9 derivatives. In lines 197-198, "the mated females, together with females that were mated with Cas9 only males", it is unclear whether the latter group refers to gRNA-females.

      -We thank the reviewer for pointing out this ambiguity. The latter group refers to Cas9 females crossed to Cas9 males. We have clarified this both in the methods (line 207) and results (line 505-509).

      2. Regarding the inheritance rates, you included the reverse orientation of CG4415-Cas9; as I understand it, this means the component is in reverse orientation relative to the fluorescent marker. Since it is standard to design adjacent components in opposite directions to avoid transcriptional interference, the rationale for including this comparison should be better justified.

      -In our construct, 'CG4415 (reverse orientation)' indicates that Cas9 was oriented in the same direction as the fluorescent marker, while the other Cas9 constructs (nanos-Cas9 and CG4415-Cas9) place them in opposite directions. "Reverse" simply indicates a departure from the "standard" used in another study. Our previous publication showed that Cas9 orientation relative to the marker had little apparent effect on drive performance at the yellow-G locus. In this study, we compared both orientations at a fertility gene and again observed similar results, suggesting that orientation relative to the marker does not substantially affect drive efficiency in our system. We have clarified this in the figure legend text.

      3. Embryo resistance is inferred from the percentage of sterile drive females derived from drive mothers. How many female individuals were analysed per line, and why was deep sequencing not employed to directly detect resistance alleles?

      -Embryo resistance can mean slightly different things for different applications. The most important is probably the fraction of females that have little to no fertility due to embryo resistance. Some of these may not have complete embryo resistance alleles but instead have mosaicism, with a level of resistance sufficient to still cause sterility. It is unclear exactly what proportion of resistance to wild-type may cause this, and thus, proportions from pooled sequencing, which could include both complete resistance and all levels of mosaicism, may not be sufficient to measure this parameter. Another relevant parameter that we did not measure is the fraction of males rendered unable to perform drive conversion (this value should be closer to the complete resistance rate, but probably still lower because of the multiple gRNAs). Even in this case, deep sequencing would not allow us to determine exactly what is happening in males, making individual sequencing the preferred approach. Deep sequencing is very useful, of course, for characterizing which resistance alleles are present overall, but in this study, we wanted to put more emphasis on the effect of resistance rather than on characterizing its sequence.

      We analyzed 30 females per line for lines targeting nox, oct, dec and stl, 9 females for ndl and 276 individuals for line tra-v2 (Data Set S4). We believe such individual analyses sufficiently detected embryo resistance causing sterility within reasonable error. Note that we did also randomly genotype several sterile females and found mutations at target sites that disrupted gene functions.

      In response to this comment, we have added some text to justify our measurement of resistance alleles and include some of this discussion:

      โ€œNote also that this defines embryo resistance as sufficient to induce sterility, but these may be mosaic rather than complete resistance. Further, note that the multiplex gRNA design in males may allow for continued drive conversion with a complete (as opposed to mosaic) embryo resistance allele, if some sites remain wild-type.โ€

      4. Masculinisation phenotypes were observed upon disruption of the tra gene. How were strong intersexes distinguished from males? What molecular markers were used to determine genetic sex? This information should be clearly provided.

      -We observed two types of strong masculinisation phenotypes (Figure S2), one with bigger body size than wildtype males, and the other was identical to wildtype males. The homozygosity of the drive allele could be assessed by the brightness of red fluorescence in the eyes. However, we also randomly genotyped these masculinized females (as part of a batch that included males) to confirm their sex using primers for the Y-linked gene PP1Y2. A specific band was detected in wild-type males but not in masculinized females, confirming their genetic sex. This information has been added to the manuscript (lines 477-480).

      5. It would be more appropriate to use "hatchability" rather than "fertility" when referring to egg-to-larva viability.

      -Thank you for the suggestion. We used egg-to-adult survival rates as a proxy for the fertility of their parents because they usually laid similar numbers of eggs. However, it is still technically incorrect language. We have fixed this in line 582 and elsewhere in the section.

      6. In cage trials, a complete gene drive is mimicked by introducing Cas9 to the background population, but this differs from an actual complete gene drive, due to potential effects from separate insertion sites (different chromosome or loci). These differences could impact the system's performance and should be discussed.

      -We appreciate this point and have added discussion on the limitations of mimicking a complete gene drive using split components (line 766-779).

      7. Given the large amount of data presented, it would improve readability and interpretation if each results section concluded with a concise summary highlighting the key findings and implications.

      -Thank you for the suggestion. We have added brief summaries at the end of each results section to highlight the key findings and their significance.

      Reviewer #1 (Significance (Required)):

      The authors evaluate suppression efficiency through individual crosses and cage trials. Resistance allele formation and fitness costs are explored by examining the sterility and fertility of each line. Overall, the experimental design is sound and methods are feasible. The work is comprehensive, and conclusions are well supported by the data. This work offers informative insights that could guide the design of suppression gene drive systems in other invasive disease vectors or agricultural pests.

      -We appreciate the reviewerโ€™s positive assessment of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Paper summary

      The manuscript by Xu et al. presents an insightful and valuable contribution to the field of gene drive research. The strategy of targeting and disrupting female fertility genes using selfish homing genetic elements was first proposed by Burt in 2003. However, for this approach to be effective, the phenotypic constraints associated with gene disruption have meant that the pool of suitable target genes remains relatively small - notwithstanding the significant expansion in accessible targets enabled by CRISPR-based genome editing nucleases. Population suppression gene drives are well developed as proof-of-principle systems, with some now in the late stages of development as genetic control strains. However, advancing the pipeline will require a broader set of validated target genes - both to ensure effectiveness across diverse species and to build redundancy into control strategies, reducing reliance on any single genetic target.

      In their paper, the authors conduct a systematic review of nine female fertility genes in Drosophila melanogaster to assess their potential as targets for homing-based suppression gene drives. The authors first conduct a thorough bioinformatic review to select candidate target genes before empirically testing candidates through microinjection and subsequent in vivo analyses of drive efficiency, population dynamics, and fitness costs relating to fecundity and fertility. After finalising their results, the authors identify two promising candidate target genes - oct and stl - which both demonstrate high gene conversion rates and, regarding the latter, can successfully suppress a cage population at a high release frequency. However, the manuscript suffers from a lack of in-depth discussion of a key limitation in its experimental design - namely, that the authors utilise a split-drive design to assess population dynamics and fitness effects when such a drive will not reflect release scenarios in the field. The review below highlights some major strengths and weaknesses of the paper, with suggestions for improvement.

      Key strengths

      The study's most significant strength is in its systematic selection and empirical testing of nine distinct genes as targets for homing-based gene drive, hence providing a valuable resource that substantially expands the pool of potential targets beyond the more commonly studied target genes (e.g. nudel, doublesex, among others). The identification of suitable target genes presents a significant bottleneck in the development of gene drives and the work presented here provides a foundational dataset for future research. The authors bolster the utility of their results by assessing the conservation of candidate genes across a range of pest species, suggesting the potential for broader application.

      A key finding in the paper is the successful suppression of a cage population using a stl-targeting gene drive (albeit at a high release frequency). This provides a critical proof-of-principle result demonstrating that stl is a viable target for a suppression drive. While in the paper suppression was not possible at lower release frequencies, together, the results provide evidence for complex population dynamics and threshold effects that may govern the success or failure of a gene drive release strategy - hence moving the conversation from a technical perspective ("can it work") to how a gene drive may be implemented. Moreover, the authors also employ a multiplexed gRNA strategy for all their gene drive designs and in particular their population suppressive gene drive targeting stl. This provides further proof-of-principle evidence for multiplexed gRNAs in order to combat the evolution of functional resistance following gene drive deployment.

      Finally, a further strength of this paper is in the clever dissection of fitness effects resulting from maternal Cas9 deposition. The authors design and perform a robust set of crosses to elucidate the parental source of fitness effects (i.e. maternally, paternally, or biparentally derived Cas9), finding (as they and others have before) that embryonic fitness was significantly reduced when Cas9 was inherited from a maternal source. As discussed, the authors conclude that maternal deposition is particularly pronounced in the context of split drives as opposed to complete drives, with the implication being that a complete drive might succeed where a split-drive has failed; thus providing a key directive for future study.

      Concerns

      The manuscript's central weakness lies in its interpretation of the results from the cage experiments - namely that a split-drive system was used to "mimic the release of a complete drive". In the study, mosquitoes carrying the drive element (i.e. the gRNA) were introduced into a population homozygous for the Cas9 element over several generations. This design is likely not representative of a real-world scenario and, as the authors state, likely exaggerates fitness costs. This is because females carrying Cas9 will maternally deposit Cas9 protein into their eggs, with activity spanning several generations. When mated with a drive-carrying male, the gRNA will immediately co-exist with maternally deposited Cas9, leading to early somatic cleavage and significant fitness costs (reflected in the authors' own fertility crosses). This is fundamentally different to how a complete drive would function in a real-world release, where complete-drive males would mate with wild-type females not carrying Cas9. Their offspring would carry the drive element but would not be exposed to maternally deposited Cas9; thus deleterious maternal effects would only begin to appear in the subsequent generation from females carrying the drive. Fitness costs measured from split-drive designs are therefore likely substantially overestimated compared to what would occur during the initial but critical release phase of a complete drive. This flaw weakens the paper's ability to predict the failure or success of the screened targets in a complete drive design, thus weakening the interpretation of the results from the cage trials. As a suggestion for improvement, the authors should explicitly and more prominently discuss the limitations of their split-drive model compared to complete drive models, both in the Results and Discussion. It is also recommended to include a schematic for both strategies that contrasts the experimental setup design (i.e. release of the drive into a Cas9 homozygous background) with a complete-drive release, clearly illustrating differences in maternal deposition pathways. This will not only contextualise the results and support the authors' conclusion that observed fitness costs are likely an overestimate but will further strengthen the arguments that the candidate target genes found in this study may still be viable in a complete-drive system.

      -We sincerely appreciate the thoughtful review and the valuable comments and suggestions provided, which have helped improve both the clarity and readability of this study. We have revised several parts in the discussion of the manuscript and hope that these changes adequately address the concerns raised. We have also made Figure S5 to illustrate the differences between two release strategies (biparental-Cas9 split drive in our study and complete drive in real release).

      Please note that this type of fitness cost may have partially undermined our cage study (the fitness effect is notable, but still small compared to total fitness costs), but ours is also among the first studies to propose and investigate this phenomenon (it is also noted in another preprint from our lab but, to our knowledge, not proposed elsewhere). Thus, part of the impact of our manuscript is showing that this effect is important, which may inform future cage studies in our lab and elsewhere.

      A second weakness in the manuscript relates to its limited explanation and discussion of key concepts. For example, the manuscript reports a stark difference in outcome of the two stl-targeting drives, where a high initial release in cage 1 led to population elimination versus a failure of the drive to spread in cage 2. The authors attribute this to vague "allele effects" and stochastic factors such as larval competition; however, the results appear reminiscent of the Allee effect, a well-characterised phenomenon describing the correlation between population size (or density) and individual fitness (or per capita population growth rate). Using their results as an example, it is plausible that the high-frequency initial release in cage 1 imposed enough genetic load to quickly drive the population density below the Allee threshold, thus quickly leading to population eradication. In cage 2, the low frequency at initial release was insufficient to cross the Allee threshold. Omitting mention of this ecological principle greatly weakens the Discussion, and further presents a missed opportunity to discuss one of the more crucial strengths of the paper - that is, providing a deeper insight into the practical requirements for successful field implementation.

      -While we do indeed mention this Allee effect (the "allele effect" noted above is a misspelling that we have corrected), we were hesitant to give it much discussion, considering that the specific Allee effect in our cages is likely of a very different nature than one would find in nature (we explain that it is likely due to bacterial growth that occurs when fewer larvae are present). However, it is perhaps still a good opportunity to cover it in the discussion, while still noting that the specific Allee effect in our cage may not be representative. We have added the following text: "Nonetheless, the successful result in the high-release cage study may point to a potential field strategy for a drive that is less efficient (perhaps even one found to be less efficient in initial field tests compared to laboratory tests). If the initial release frequency of the drive is sufficiently high and widespread, then short-term high genetic load may substantially reduce the population, perhaps enough for Allee effects to become important. At this point, even if average genetic load is slowly declining without additional drive releases, persistent moderate genetic load coupled with the Allee effect may be sufficient to ensure population elimination."
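      As a purely illustrative aside (not part of the manuscript, and with entirely hypothetical parameter values), the threshold dynamics discussed in this exchange can be sketched with a toy deterministic model that combines a constant genetic load with a strong Allee effect:

      ```python
      # Toy model (illustration only): discrete-generation population with a
      # strong Allee effect (threshold A) and a genetic load imposed by a
      # suppression drive. All parameter values are hypothetical.

      def simulate(load, n0=1000.0, r=0.5, A=150.0, K=1000.0, generations=60):
          """Iterate N <- (N + r*N*(N/A - 1)*(1 - N/K)) * (1 - load).
          Per-capita growth is negative below the Allee threshold A;
          `load` removes a fixed fraction of offspring each generation."""
          n = n0
          for _ in range(generations):
              n = max(0.0, (n + r * n * (n / A - 1.0) * (1.0 - n / K)) * (1.0 - load))
              if n < 1.0:  # fewer than one individual: population eliminated
                  return 0.0
          return n

      # A high genetic load (mimicking a high-frequency release) pushes the
      # population below the Allee threshold and it collapses; a low load
      # only depresses the equilibrium, and the population persists.
      high_release = simulate(load=0.6)  # -> 0.0 (elimination)
      low_release = simulate(load=0.1)   # persists well above the threshold
      ```

      The qualitative point matches the cage results: the same drive can either eliminate or fail to eliminate a population depending on whether the induced load carries the population below the Allee threshold.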

      In a similar vein, the authors provide only a superficial mechanistic discussion of the fitness costs associated with drives targeting key candidate genes. The paper would benefit from a deeper discussion regarding the specific molecular functions of the top-performing genes (stl, oct, nox) and how unintended Cas9 activity could disrupt their activity, integrating known molecular functions with observed fitness costs. For instance, oct encodes a G-protein coupled receptor essential for ovulation and oviduct muscle relaxation; thus disruption of the oct gene would directly impair egg-laying, which would account for the observed phenotypic effects. A deeper discussion linking unintended Cas9 activity to the specific, sensitive functions of target genes would elevate the paper from a descriptive screen to a more insightful mechanistic study.

      -We appreciate the reviewerโ€™s comment. We have added a discussion to further explain fitness cost caused by unintended Cas9 activity disrupting target gene functions. However, keep in mind that the exact timing of Cas9 cleavage and the exact timing of these geneโ€™s essential functions is still somewhat uncertain, which may limit insights from this line of analysis compared to a situation where ideal, high quality data is available for both of these. Here is the new material in the discussion:

      โ€œThe functions of the top-performing genes suggests a mechanistic basis for the observed fitness costs. Aside from germline cells, nanos has expression in other ovary cells as well. CG4415 lacks this expression, but our Cas9 construct with this promoter may have a different expression pattern that the native gene, as evidenced by its support for good drive conversion in females. stl is essential for ovarian follicle development, and its disruption likely in non-germline ovary cells could compromise egg chamber development and fertility. oct encodes the octopamine ฮฒ2 receptor, a G-protein coupled receptor critical for ovulation and fertilization, so if it were similarly lost, egg-laying would be directly impaired. nox, which encodes NADPH oxidase, contributes to calcium flux and smooth muscle contraction during ovulation, so its disruption may prevent egg laying. tra is needed in the whole body for sexual development, but may also play an important role in ovary function. Thus, unintended Cas9 activity at these non-germline ovary cells can directly interfere with sensitive reproductive functions, potentially explaining the fertility costs observed in drive carriers. This issue could potentially be overcome if promoters were available that were truly restricted to germline cells rather than other reproductive cells, though it remains unclear if such promoters both exist and would retain their expression pattern at a non-native locus.โ€

      It is curious that the authors chose two genes on the X chromosome as targets. In insects (such as Drosophila here) that have heterogametic sex chromosomes, homing is not possible in the heterogametic sex as there is no chromosome to home to - so there will be no homing in males. On top of that, there is usually some fitness effect in carrier (heterozygous) females, so in a population these are nearly always bad targets for drives - unless there is some other compelling reason to choose that target?

      -Our rationale for testing X-linked targets is twofold. First, these genes are likely to play important roles in sex-specific functions and may have a different expression pattern (which is specifically why dec was included), potentially reducing fitness costs. Although homing cannot occur in males, if drive conversion at these sites in females is very high and fitness costs are minimal, the resulting genetic load could still be sufficient to suppress populations (thus, such candidates could be superior even in diploids if they happen to have lower fitness costs). Second, X-linked targets may have broader relevance for suppression drives in haplodiploid pests (e.g., fire ants), which have the same population dynamics as an X-linked target in a diploid population. Our results therefore could have provided useful insights for such scenarios (such as for fire ants: Liu et al., bioRxiv 2025) if drive performance had been sufficient for follow-up testing.

      Minor comments

      • Enhanced clarity in the Figures and data presentation would greatly improve readability. For example, Figure 5 is critical yet difficult to interpret; consider changing x-axis labels from icons to explicit text (e.g. "biparental Cas9", "maternal cas9", "paternal Cas9"). Similarly, Figure 4 is difficult to read and the y-axis label "population size" is ambiguous; consider adding shapes or dashes (rather than relying solely on colour) and clarifying the y-axis (e.g. no. adults collected) in the legend.

      -We appreciate the reviewerโ€™s comment and have revised Figure 4 as suggested. Regarding Figure 5, we attempted to replace the icons with text labels; however, this was not possible because there is very little horizontal space and two generations to specify. Instead, we have revised the figure legend to provide a clearer explanation, which can hopefully improve clarity..

      • Expand on or include a schematic to show the differences in construction between the tra-v1 and tra-v2 constructs to better contextualise the discrepancies in results (e.g. inheritance rates of 61%-66% for tra-v1 versus 81%-83% for tra-v2).

      -We have expanded Figure 2 to compare the constructs of tra-v1 and tra-v2. A further explanation of these two constructs was added to the results section: ‘When targeting tra, we originally tested the 4-gRNA construct tra-v1. However, the drive inheritance rate was relatively low (61%-66%), and sequencing revealed that only the middle two gRNAs were active (Table S3). Lack of cleavage at the outermost sites is particularly detrimental to achieving high drive conversion. Therefore, a second construct, tra-v2, was tested that retained the two active gRNAs and included two new gRNAs. It showed substantially improved drive inheritance (81%-83%).’

      • Minor typos e.g.:

      o Line 87: "form" to "from"

      o Line 484: "expended" to "expanded"

      o Line 560: "foor" to "for"

      o Line 732: "conversed" to "conserved"

      -We have revised these typos.

      • Clarify the split drive system: the authors introduce split drive for the first time in Line 118. They should at least give a clear definition and explanation of split drive and complete drive in the introduction.

      -We have included definitions of split drives and complete drives in the introduction (lines 47-53).

      • Lines 237-238: The fitness evaluation lacks a clear description of controls. How were non-drive flies generated and validated as controls?

      -Drive heterozygotes were crossed with Cas9 homozygotes to generate the flies used for fitness evaluation. From the same cross, non-drive progeny were obtained and used as controls, ensuring they shared a comparable genetic background and rearing conditions with the drive-carrying individuals. We have now clarified in the manuscript results that โ€œthese served as the controls because they had the same environment and parents as the drive fliesโ€.

      • Lines 409-412, 423: The high inheritance rates of the stl and oct drives are impressive; however, variation in results across Cas9 promoters should be explained further in the discussion.

      -In the discussion section (lines 751-765), we included a dedicated paragraph addressing the variation observed between the nanos and CG4415 promoters. We have now expanded it to briefly note some differences:

      โ€œOur previous works showed that both nanos and CG4415 have high drive conversion rates8, but nanos failed to suppress target populations in a homing drive targeting the female fertility gene yellow-G due to its fitness cost in drive females27. CG4415 had much lower maternal deposition, which allowed the elimination of cage populations by targeting yellow-G8. Here, we tested both promoters with drives targeting oct and stl, with both showing slightly higher drive efficiency than the drive targeting yellow-G in small-scale crosses. CG4415 has slightly worse though still good performance in females, likely due to male-biased expression compared to nanos.โ€

      • Line 414: The CG4415 promoter yielded reduced drive conversion rates in females, yet is still referred to as a promising promoter. This conclusion seems optimistic and should be clarified/more justified.

      -Based on our previous study cited in this context, CG4415 shows relatively lower germline conversion rates compared to nanos, although they still remain at a high level. Importantly, CG4415 also exhibits reduced maternal deposition relative to nanos, which could help mitigate fitness costs associated with maternal deposition, an important consideration for suppression systems. Taken together, while its conversion efficiency is slightly lower, the potential benefits of reduced maternal deposition and perhaps even reduced fitness costs provide a rationale for regarding CG4415 as a promising promoter. We state this when first introducing the promoter in the “Drive efficiency assessment” results subsection.

      • Specify the number of flies released, sex ratio, and cage size per generation (Line 466). This is essential for reproducibility.

      -We appreciate the reviewer's comment and have revised the text to clarify our release approach, which differed from that used in other studies (which tend to have substantial fitness differences between lines in the first generation that can complicate analysis and change results). Rather than directly releasing drive males or females into cages, we first crossed drive males with non-drive females and then mixed them with non-drive females mated to non-drive males. The offspring (including males and females) from these crosses were recorded as the G0 generation, and their ratio was recorded as the release frequency. We have specified the release ratios and adult numbers in the following paragraph and supplementary file.

      Reviewer #2 (Significance (Required)):

      Overall the manuscript presents a valuable and timely resource for gene drive research, in particular for its systematic appraisal of potential target genes for population suppression drives and its rigorous assessment of the impact of maternal Cas9 deposition. The value in the generation and empirical testing of a novel multiplexed stl-targeting gene drive that led to population eradication in a cage trial should not be understated. While several key aspects of the discussion of the manuscript should be strengthened, the study presents a meaningful contribution to the field, extending previous work and outlining important considerations for the design and implementation of effective gene drive systems.

      -We thank the reviewer for their encouraging and constructive comments. We are pleased that the systematic evaluation of target genes, the analysis of maternal Cas9 deposition, and the multiplexed stl-targeting drive were recognized as valuable contributions. We have strengthened the discussion as suggested, and we believe these revisions further enhance the manuscript as an aid for the design and implementation of future gene drive systems.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this study, Xu and colleagues explored how CRISPR-based homing gene drives could be used to suppress insect populations by targeting female fertility genes in Drosophila melanogaster. They engineered split gene drives with multiplexed guide RNAs to target nine candidate genes, seeking to prevent functional resistance and achieve high drive conversion with minimal fitness costs.

      Here my comments about this work:

      Abstract: While the stated aim of the study on line 16 is to "maintain high drive conversion efficiency with low fitness costs in female drive carriers," the conclusion in lines 29-31 shifts focus toward the broader challenges and future optimization of gene drive systems. This conclusion does not clearly highlight the specific results of the study or how they relate directly to the original objective. It would be more effective to emphasize the actual findings, such as which target genes performed best and under what conditions, and how these findings support or contradict the stated goals. The study primarily aimed to assess the efficiency of specific female fertility genes and to evaluate strategies for minimizing the formation of functional resistance alleles, rather than proposing a protocol for optimization. Therefore, better alignment is needed between the study's aim, experimental design, and concluding statements. Clarifying this alignment would also help refine the paper's focus and more accurately communicate its contribution, including whether it is exploratory, comparative, or methodologically driven.

      -We have revised the abstract to clarify the alignment as suggested by the reviewer. We note that this discrepancy arose because the initial aim of our study differed from some of the important lessons learned along the way regarding fitness effects from Cas9 deposition in split drives. Still, we agree that it would be better to be more consistent in our wording and conclusions.

      Introduction: One of the key design elements in this study is the use of multiplexed gRNAs. It is reasonable to assume that this strategy may influence fitness costs, potentially in more than one way. Given that assessing fitness cost is a major focus of the study, it would be helpful to include a brief discussion of previous research examining how multiplexed gRNAs may impact fitness in gene drive systems. A short review of relevant studies, if available, would provide important context for interpreting the results and could help clarify whether any observed fitness costs might be attributed, at least in part, to the multiplexing strategy itself. This addition could be appropriately placed around line 102, where gRNA design is discussed.

      -We have added an explanation in the Discussion to mention this. However, it has not been conclusively shown that multiplexed gRNAs have any effect on fitness. Indeed, there have been some multiplexed constructs that seem to have no fitness effect, and some that have high fitness costs. This doesn't rule out the potential for multiplexed gRNAs to influence fitness itself, but it means that the mechanism may be complex. The new text reads:

      "Another potential though unconfirmed source of fitness cost arises from increased cleavage events associated with multiplexed gRNAs, where the greater number of gRNAs can enhance the overall cut rate compared to single-gRNA designs."

      Line 42: Cas12a has also shown efficacy in gene drives in yeast and Drosophila.

      -We now mention Cas12a at the beginning of the introduction.

      Line 133: The paragraph begins by stating that homologs of the target genes were identified and aligned. To improve clarity, especially for readers who are new to gene drive research, it would be helpful to begin the paragraph with a brief introductory sentence explaining the purpose of this step. For example, you could state the importance of identifying and aligning homologs to assess the conservation of target sites across species, which is critical for evaluating the broader applicability of gene drive strategies. This context would guide the reader and clarify the relevance of the analysis.

      -We have added the explanation as suggested.

      Lines 144-145: You mention that "the exception was tra, for which two constructs containing different gRNA sets were generated." For clarity, it would be helpful to provide a brief explanation of why two different gRNA sets were used for tra, and whether this differs from the approach taken with the other target genes. It's currently unclear whether all other genes were targeted using a single, standardized set of gRNAs, and this should be explicitly stated here for consistency, even though it is mentioned later in the plasmid construction section. Additionally, I suggest combining the sections on gRNA target design and plasmid construction. Since these components are closely related and sequential in the experimental workflow, presenting them together would improve the logical flow and help readers follow the methodology more smoothly.

      -We have combined both the gRNA target design and plasmid construction sections. We also discuss the two tra constructs early in the results section (see response to reviewer 2).

      Line 210: The analysis of the cage experiments was based on models from previous studies that used a simplified assumption of a single gRNA at the target site. While I understand this approach has precedent, it raises important questions about potential limitations. Specifically, could simplifying the analysis to one gRNA affect the conclusions of this study, given that the experimental design involves multiplexed gRNAs with four distinct target sites? The implications of using this simplified model should be clearly addressed, as the dynamics of drive efficiency, resistance formation, and fitness effects may differ when multiple gRNAs are employed. Additionally, while I am not a statistician, it is worth asking whether more sophisticated modeling approaches could be applied to account for all four gRNAs, rather than reducing the system to a single-gRNA framework. A discussion of the modeling choices and their potential consequences would strengthen the interpretation of the results.

      -We have clarified this. While we have modeled multiple gRNAs with high fidelity in SLiM, the maximum likelihood method is not very amenable to such treatment. It may cause our fitness estimate to be a small overestimate, but given the low inferred fitness values, this would certainly not have a large enough effect to fundamentally change any conclusion (and should be of a consistent level across all cages). We now discuss this in the methods section.

      Lines 297-300: Your results show that the expression of all target genes was higher in females, except for oct, which had higher expression in males. Additionally, oct expression decreased in adults. Given that oct is functionally important for ovulation and fertilization, processes that are primarily required in adult females, this pattern is somewhat unexpected. Could there be a possible explanation for the lower expression of oct, particularly in females and especially in adults, where its function would presumably be most critical? A brief discussion or hypothesis addressing this discrepancy would help clarify the biological relevance and interpretation of the expression data.

      -Based on transcriptome data from FlyBase, derived from Graveley et al. (2011), Oct is indeed expressed slightly higher in adult males than in adult females. This difference may be attributed to the fact that the female flies used in the study were virgins; Oct expression could be upregulated post-mating to mediate ovulation. Additionally, Oct is expressed not only in reproductive tissues but also in other organs such as the nervous system, where sex-specific differences in cell type composition or neural activity may contribute to the observed expression bias. However, high expression does not necessarily correlate with essential expression. Though Oct could have multiple functions, it's still possible that the only apparent phenotype upon knockout is female sterility. We have added the following text: "This male-biased expression may result from the use of virgin females in the dataset, as oct is likely upregulated after mating. Moreover, oct is also expressed in non-reproductive tissues such as the nervous system, which may contribute to sex-specific differences in expression38. While oct may have multiple functions, it is possible that it is only essential for female fertility."

      Lines 346-347: What is the distance between the gRNA target sites within each gene? Are all of the gRNAs confirmed to be active? It would be valuable to include a table summarizing the distance between target sites for each gene, the activity levels of the individual gRNAs, and the corresponding homing rates. This would help determine whether there is a correlation between gRNA spacing and drive efficiency. For example, Lopez del Amo et al. (Nature Communications, 2020) demonstrated that even a 20-nucleotide mismatch at each homology arm can significantly reduce drive conversion. Including such a comparative analysis in your study could provide important insights into how gRNA arrangement influences overall drive performance and would be incredibly helpful for future multiplexing designs.

      -We have shown previously that close spacing of gRNAs helps maintain high drive conversion efficiency, and this is alluded to indirectly in the introduction (we now mention it more directly). In our study, gRNAs were positioned in close proximity without overlap within each gene. We have added a summary table (Table S3) presenting the sequencing results, which also show gRNA activity levels. Notably, most but not all gRNAs were active, at least for embryo resistance (low to moderate activity may still be present in the germline). Coupled with varying activity levels for those that were active, this likely contributed to reduced drive conversion due to mismatches at the homology arms. This observation supports the notion that drive performance could be optimized by selecting and arranging more active gRNAs. Consistent with this, our second construct targeting tra (tra-v2) exhibited a higher inheritance rate than the original construct, suggesting that gRNA arrangement and activity critically influence drive efficiency. Testing the activity of every single gRNA would require the construction of multiple gRNA lines, since in vitro or ex vivo tests are not as accurate as in vivo transformation tests. However, in our study, as long as drive conversion rates were reasonably high, further optimization was not needed. Therefore, the multiplexed gRNA design can not only maximize drive conversion but also reduce the labor of filtering a larger number of lower-performing single-gRNA designs.

      Line 434: I was not able to find any sequencing data. This is important to evaluate gRNA activities and establish correlations with drive efficiency.

      -We have added a summary of the sequencing results in Table S3, though these are for embryo resistance alleles. Note that while high gRNA activity is correlated with high drive inheritance, these are not directly related. For suppression drives, germline resistance rates are usually of low importance compared to drive inheritance, so we did not assess these in detail (and pessimistically assumed complete germline resistance in our cage models).

      Line 482: Did the authors test Cas9-only individuals (without the drive) against a wild-type population? This would help determine whether Cas9 alone has any unintended fitness effects. Additionally, is Cas9 expression stable over time and across generations? It would be helpful to include any observations or thoughts on the long-term stability and potential fitness impact of Cas9 in the absence of the drive element.

      -We did not perform a direct comparison of Cas9-only individuals and wild-type flies in this study. However, previous studies (Champer et al., Nature Communications, 2020; Langmuller et al., eLife, 2022), which we now cite in the discussion, found no significant fitness difference between very similar Cas9-expressing lines and wild type in the absence of a drive element, indicating no significant fitness impact from Cas9 alone (though we cannot exclude a small effect, it certainly could not come close to explaining our results). In our experiments, Cas9 expression was generally stable across generations, as indicated by consistent drive inheritance and fertility test results obtained from independent batches. Separate from this study, we did observe rare instability in one nanos-Cas9 line, which had remained stable for over five years but recently became inactive (low population maintenance size may have caused stochastic removal of the functional allele). It is something to watch out for, but probably not on the timescale of a single study.

      Discussion: I would appreciate a more direct and clearly stated conclusion that summarizes the key findings of the study. While the discussion addresses the main outcomes in depth, presenting a concise concluding paragraph, either at the end of the discussion or as a standalone conclusion section, would provide a stronger and more definitive closing statement. This would help reinforce what the study ultimately achieved and ensure the main takeaways are clearly communicated to the reader.

      -We have revised and expanded the last paragraph of the discussion section to make our findings more direct and clear.

      Overall, I believe this is an important study that offers valuable insights for advancing the design of CRISPR-based gene drives. The findings contribute to the development of more efficient and practical gene drive prototypes, bringing the field closer to real-world applications.

      Reviewer #3 (Significance (Required)):

      In this study, Xu and colleagues explored how CRISPR-based homing gene drives could be used to suppress insect populations by targeting female fertility genes in Drosophila melanogaster. They engineered split gene drives with multiplexed guide RNAs to target nine candidate genes, seeking to prevent functional resistance and achieve high drive conversion with minimal fitness costs. Among the targets, the stall (stl) and octopamine β2 receptor (oct) genes performed best, showing the highest inheritance rates in lab crosses. When tested in population cages, the stl drive was able to completely eliminate a fly population, but only when released at a high enough frequency, while other cages failed. These failures were traced to fitness costs in drive-carrying females, caused largely by maternally deposited Cas9, which led to embryo resistance and reduced fertility. Through additional fertility assays and modeling, the team confirmed that the origin and timing of Cas9 expression, particularly from mothers, significantly impacted drive success. Surprisingly, even when Cas9 was driven by promoters with supposedly low somatic activity, such as nanos, fitness costs still persisted. The study revealed that while gene drives can be powerful, their effectiveness relies on finely balanced factors like promoter choice, drive architecture, and gene function. Overall, the research offers valuable lessons for designing robust, next-generation gene drives aimed at ecological pest control.

      -We sincerely appreciate the reviewer's positive and thoughtful comments. We agree that the points raised highlight the importance of our findings and hope that our revisions have further improved both the clarity and overall content of the manuscript.

    1. This manuscript examines preprint review services and their role in the scholarly communications ecosystem. It seems quite thorough to me. In Table 1 they list many peer-review services that I was unaware of, e.g. SciRate and Sinai Immunology Review Project.

      To help elicit critical & confirmatory responses for this peer review report I am trialling Elsevier's suggested "structured peer review" core questions, and treating this manuscript as a research article.

      Introduction

      1. Is the background and literature section up to date and appropriate for the topic?

        Yes.

      2. Are the primary (and secondary) objectives clearly stated at the end of the introduction?

        No. Instead the authors have chosen to put the two research questions on page 6 in the methods section. I wonder if they ought to be moved into the introduction - the research questions are not methods in themselves. Might it be better to state the research questions first and then detail the methods one uses to address those questions afterwards? [as Elsevier's structured template seems implicitly to prefer.]

      Methods

      1. Are the study methods (including theory/applicability/modelling) reported in sufficient detail to allow for their replicability or reproducibility?

        I note with approval that the version number of the software they used (ATLAS.ti) was given.

        I note with approval that the underlying data is publicly archived under CC BY at figshare.

        The Atlas.ti report data spreadsheet could do with some small improvement - the column headers are a little cryptic, e.g. "Nº ST" and "ST", which I eventually deduced were Number of Schools of Thought and Schools of Thought (?)

        Is there a rawer form of the data that could be deposited with which to evidence the work done? The Atlas.ti report spreadsheet seemed like it was downstream output data from Atlas.ti. What was the rawer input data entered into Atlas.ti? Can this be archived somewhere in case researchers want to reanalyse it using other tools and methods?

        I note with disapproval that Atlas.ti is proprietary software which may hinder the reproducibility of this work. Nonetheless I acknowledge that Atlas.ti usage is somewhat 'accepted' in social sciences despite this issue.

        I think the qualitative text analysis is a little vague and/or under-described: "Using ATLAS.ti Windows (version 23.0.8.0), we carried out a qualitative analysis of text from the relevant sites, assigning codes covering what they do and why they have chosen to do it that way." That's not enough detail. Perhaps an example or two could be given? Was inter-rater reliability performed when 'assigning codes'? How do we know the 'codes' were assigned accurately?

      2. Are statistical analyses, controls, sampling mechanism, and statistical reporting (e.g.,โ€ฏP-values, CIs, effect sizes) appropriate and well described?

        This is a descriptive study (and that's fine) so there aren't really any statistics on show here other than simple 'counts' (of Schools of Thought) in this manuscript. There are probably some statistical processes going on within the proprietary qualitative analysis of text done in ATLAS.ti but it is under-described and so hard for me to evaluate.

      Results

      1. Is the results presentation, including the number of tables and figures, appropriate to best present the study findings?

        Yes. However, I think a canonical URL to each service should be given. A URL is very useful for disambiguation, to confirm e.g. that the authors mean this Hypothesis (www.hypothes.is) and NOT this Hypothesis (www.hyp.io). I know exactly which Hypothesis is the one the authors are referring to but we cannot assume all readers are experts 😊

        Optional suggestion: I wonder if the authors couldn't present the table data in a slightly more visual and/or compact way? It's not very visually appealing in its current state. Purely as an optional suggestion, to make the table more compact one could recode the answers given in one or more of the columns 2, 3 and 4 in the table e.g. "all disciplines = ⬤, biomedical and life sciences = ▲, social sciences = ‡, engineering and technology = †". I note this would give more space in the table to print the URLs for each service that both reviewers have requested.

        | Service name | Developed by | Scientific disciplines | Types of outputs |
        | --- | --- | --- | --- |
        | Episciences | Other | ⬤ | blah blah blah. |
        | Faculty Opinions | Individual researcher | ▲ | blah blah blah. |
        | Red Team Market | Individual researcher | ‡ | blah blah blah. |

        The "Types of outputs" column might even lend itself to mini-colour-pictograms (?) which could be more concise and more visually appealing? A table just of text might be scientifically 'correct' but it is incredibly dull for readers, in my opinion.

      2. Are additional sub-analyses or statistical measures needed (e.g., reporting of CIs, effect sizes, sensitivity analyses)?

        No / Not applicable.

      Discussion

      1. Is the interpretation of results and study conclusions supported by the data and the study design?

        Yes.

      2. Have the authors clearly emphasized the limitations of their study/theory/methods/argument?

        No. Perhaps a discussion of the linguistic/comprehension bias of the authors might be appropriate for this manuscript. What if there are 'local' or regional Chinese, Japanese, Indonesian or Arabic language preprint review services out there? Would this authorship team really be able to find them?

      Additional points:

      • Perhaps the points made in this manuscript about financial sustainability (p24) are a little too pessimistic. I get it, there is merit to this argument, but there is also some significant investment going on there if you know where to look. Perhaps it might be worth citing some recent investments e.g. Gates -> PREreview (2024) https://content.prereview.org/prereview-welcomes-funding/ and Arcadia's $4 million USD to COAR for the Notify Project which supports a range of preprint review communities including Peer Community In, Episciences, PREreview and Harvard Library. (source: https://coar-repositories.org/news-updates/coar-welcomes-significant-funding-for-the-notify-project/)

      • Although I note they are mentioned, I think more needs to be written about the similarity and overlap between 'overlay journals' and preprint review services. Are these arguably not just two different terms for kinda the same thing? If you have Peer Community In, which has its overlay component in the form of the Peer Community Journal, why not mention other overlay journals like Discrete Analysis and The Open Journal of Astrophysics. I think Peer Community In (& its PCJ) is the go-to example of the thinness of the line that separates (or doesn't!) overlay journals and preprint review services. Some more exposition on this would be useful.

    1. Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths:

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings, and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R2 of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R2) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
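      The arithmetic behind these figures can be made explicit: the variance one measurement explains in another is the square of their correlation coefficient. A minimal sketch, using the r values quoted above purely as worked examples:

      ```python
      # Shared variance (coefficient of determination) is r squared;
      # the remainder is variance not predicted by the other context.
      def shared_variance(r):
          return r ** 2

      # r = 0.263 for % time walked at 23C vs 32C (Figure 4C)
      print(round(shared_variance(0.263), 3))        # -> 0.069, i.e. ~7% explained

      # r = -0.197 for vector strength between assays
      print(round(1 - shared_variance(-0.197), 2))   # -> 0.96, i.e. 96% unexplained
      ```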

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
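      The adjustment suggested here amounts to a Spearman-type correlation: convert each fly's value to its within-context rank, then correlate the ranks, which removes mean and scale differences between contexts. A minimal sketch with hypothetical values (simple ranking, no tie handling):

      ```python
      # Rank each fly within its context, then correlate ranks (Pearson on
      # ranks equals Spearman correlation in the absence of ties).
      def ranks(xs):
          order = sorted(range(len(xs)), key=lambda i: xs[i])
          r = [0] * len(xs)
          for rank, i in enumerate(order):
              r[i] = rank + 1
          return r

      def pearson(x, y):
          n = len(x)
          mx, my = sum(x) / n, sum(y) / n
          cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
          vx = sum((a - mx) ** 2 for a in x) ** 0.5
          vy = sum((b - my) ** 2 for b in y) ** 0.5
          return cov / (vx * vy)

      # Same fly ordering in both contexts, but shifted mean and scale:
      ctx23 = [1.0, 2.0, 3.0, 4.0]
      ctx32 = [10.5, 11.0, 13.0, 20.0]
      print(pearson(ranks(ctx23), ranks(ctx32)))  # ~1.0: rank order fully preserved
      ```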

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The Github link to the software was not available. To verify successful transfer of open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.

      Comments on revisions:

      While the incorporation of a hierarchical mixed model (HMM) appears to represent an improvement over their prior single-parameter correlation approach, it's not clear to me that this is a multivariate analysis. They write that "For each trait, we fitted a hierarchical linear mixed-effects model in Matlab (using the fitlme function) with environmental context as a fixed effect and fly identity (ID) as a random intercept... We computed the intraclass correlation coefficient (ICC) from each model as the between-fly variance divided by total variance. ICC, therefore, quantified repeatability across environmental contexts."
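      For readers unfamiliar with the quantity, the ICC described in the quoted passage is the fraction of total variance attributable to stable between-fly differences. A simplified stand-in for the fitted mixed model, using a one-way random-effects variance decomposition on invented data (not the authors' pipeline, which uses Matlab's fitlme):

```python
import numpy as np

rng = np.random.default_rng(1)
n_flies, n_contexts = 40, 4
fly_offset = rng.normal(0.0, 2.0, n_flies)           # stable individual differences (sd = 2)
noise = rng.normal(0.0, 1.0, (n_flies, n_contexts))  # within-fly scatter across contexts (sd = 1)
y = fly_offset[:, None] + noise                      # rows = flies, columns = contexts

# ANOVA-style variance components for a balanced design
grand = y.mean()
ms_between = n_contexts * np.sum((y.mean(axis=1) - grand) ** 2) / (n_flies - 1)
ms_within = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2) / (n_flies * (n_contexts - 1))
var_between = (ms_between - ms_within) / n_contexts

# ICC = between-fly variance / total variance (true value here: 4 / (4 + 1) = 0.8)
icc = var_between / (var_between + ms_within)
```

An ICC near 1 means flies keep their relative positions across contexts; an ICC near 0 means within-fly scatter swamps the individual differences.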

      Does this indicate that HMM was used in a univariate approach? Can an analysis of only five metrics of several dozen total metrics be characterized as 'holistic'?

      Within Figure 10a, some of the metrics show high ICC scores, but others do not. This suggests that the authors are overstating the overall persistence and/or consistency of behavioral individuality. It is clear from Figure S8 that a large number of metrics were calculated for each fly, but it remains unclear, at least to me, why the five metrics in Figure 10a are justified for selection. One is left wondering how rare or common is the 0.6 repeatability of % time walked among all the other behavioral metrics. It appears that a holistic analysis of this large data set remains impossible.

      The authors write: "...fly individuality persists across different contexts, and individual differences shape behavior across variable environments, thereby making the underlying developmental and functional mechanisms amenable to genetic dissection." However, presumably the various behavioral features (and their variability) are governed by different brain regions, so some metrics (high ICC) would be amenable to the genetic dissection of individuality/variability, while others (low ICC) would not. It would be useful to know which are which, to define which behavioral domains express individuality, and could be targets for genetic analysis, and which do not. At the very least, the Abstract might like to acknowledge that inter-context consistency is not a major property of all or most behavioral metrics.

      I hold that inter-trial repeatability should rightly be called "stability" while inter-context repeatability should be called "consistency". In the current manuscript, "consistency" is used throughout the manuscript, except for the new edits, which use "stability". If the authors are going to use both terms, it would be preferable if they could explain precisely how they define and use these terms.

    2. Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations:

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anti-conservative and additionally, the authors have thus had to perform many many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc) and instead just report a single repeatability for e.g. the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!
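      On the multiple-testing point, a standard step-down correction such as Holm-Bonferroni is straightforward to apply to a family of correlation p-values. A self-contained sketch (the p-values are invented):

```python
import numpy as np

def holm_bonferroni(pvals, alpha=0.05):
    """Return a boolean array marking which hypotheses survive Holm's step-down correction."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - k):   # compare the k-th smallest p-value to alpha / (m - k)
            reject[idx] = True
        else:
            break                        # step-down: once one fails, all larger p-values fail too
    return reject

# e.g. p-values from several pairwise test/retest correlations
pvals = [0.001, 0.2, 0.011, 0.04, 0.03]
surviving = holm_bonferroni(pvals)
```

Holm's procedure controls the family-wise error rate at alpha while being uniformly more powerful than a plain Bonferroni cutoff.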

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns, as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation.

    3. Author response:

      The following is the authors' response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths:

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
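      The arithmetic behind these percentages is simply the coefficient of determination, R² = r² (values taken from the review's own examples):

```python
r1 = 0.263                  # % time walked, 23 C vs 32 C
shared = r1 ** 2            # ~0.069: about 7% of variance is shared

r2 = -0.197                 # vector strength, Y-maze vs Buridan
unexplained = 1 - r2 ** 2   # ~0.961: about 96% of variance is unexplained
```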

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify successful transfer of open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.

      Comments on revisions:

      I want to express my appreciation for the authors' responsiveness to the reviewer feedback. They appear to have addressed my previous concerns through various modifications, including the GLM analysis; however, some areas still require clarification for the benefit of an audience that includes geneticists.

      (1) GLM Analysis Explanation (Figure 9)

      While the authors state that their new GLM results support their original conclusions, the explanation of these results in the text is insufficient. Specifically:

      The interpretation of coefficients and their statistical significance needs more detailed explanation. The audience includes geneticists and other non-statistical people, so the GLM should be explained in terms of the criteria or quantities used to assess how well the results conform with the hypothesis, and to what extent they diverge.

      The criteria used to judge how well the GLM results support their hypothesis are not clearly stated.

      The relationship between the GLM findings and their original correlation-based conclusions needs better integration and connection, leading the reader through your reasoning.

      We thank the reviewer for highlighting this important point. We have revised the Results section of the manuscript to include a more detailed explanation of the GLM analysis. Specifically, we now clarify the interpretation of the model coefficients, including the direction and statistical significance, in relation to the hypothesized effects. We also outline the criteria we used to assess how well the GLM supports our original correlation-based conclusions, namely, whether the sign and significance of the coefficients align with the expected relationships derived from our prior analysis. Finally, we explicitly describe how the GLM results confirm or extend the patterns observed in the correlation-based analysis, to guide readers through our reasoning and the integration of both approaches.

      (2) Documentation of Changes

      One struggle with the revised manuscript is that no "tracked changes" version was included, so it is hard to know exactly what was done. Without access to the previous version of the manuscript, it is difficult to fully assess the extent of revisions made. The authors should provide a more comprehensive summary of the specific changes implemented, particularly regarding:

      We thank the reviewer for bringing this to our attention. We were equally confused to learn that the tracked-changes version was not visible, despite having submitted one to eLife as part of our revision.

      Upon contacting the editorial office, they confirmed that we did submit a tracked-changes version, but clarified that it did not contain embedded figures (as they were added manually to the clean version). The editorial response said in detail: "Regarding the tracked-changes file: it appears the version with markup lacked figures, while the figure-complete PDF had markup removed, which likely caused the confusion mentioned by the reviewers." We hope this answer from eLife clarifies the reviewers' concern.

      (3) Statistical Method Selection

      The authors mention using "ridge regression to mitigate collinearity among predictors" but do not adequately justify this choice over other approaches. They should explain:

      Why ridge regression was selected as the optimal method

      How the regularization parameter (λ) was determined

      How this choice affects the interpretation of environmental parameters' influence on individuality

      We appreciate the reviewer's thoughtful question regarding our choice of statistical method. In response, we have expanded the Methods section in the revised manuscript to provide a more detailed justification for the use of a GLM, including ridge regression. Specifically, we explain that ridge regression was selected to address collinearity and to control for overfitting.

      We now also describe how the regularization parameter (λ) was selected: we used 5-fold cross-validation over a log-spaced grid (10⁻⁶ to 10⁶) to identify the optimal value that minimized the mean squared error (MSE).

      Finally, we clarify in both the Methods and Results sections how this modeling choice affects the interpretation of our findings.
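      The selection procedure described above can be sketched end to end in a few lines. The data below are synthetic stand-ins, but the mechanics (closed-form ridge fit, 5-fold cross-validation over a log-spaced λ grid, choosing the λ with minimum mean squared error) follow the description:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: w = (X'X + lam * I)^(-1) X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(X, y, lam, k=5):
    # Mean squared prediction error of ridge regression under k-fold cross-validation
    idx = np.arange(len(y))
    errs = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(np.mean((X[test_idx] @ w - y[test_idx]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
X[:, 5] = X[:, 4] + 0.01 * rng.normal(size=100)          # deliberately collinear predictors
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 1.0, 1.0]) + rng.normal(size=100)

grid = np.logspace(-6, 6, 25)                             # log-spaced grid, 10^-6 to 10^6
best_lam = min(grid, key=lambda lam: cv_mse(X, y, lam))   # lambda minimizing cross-validated MSE
```

The collinear pair of columns is exactly the situation where the ridge penalty stabilizes the coefficient estimates.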

      Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great, and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations:

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anti-conservative and additionally, the authors have thus had to perform many many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc) and instead just report a single repeatability for e.g. the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns, as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation.

      Reviewer #3 (Public review):

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (readouts of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days.

      (2) That some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested fail to remain stable over a spatially varying environment (arena shape).

      (4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat, with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, different temperatures, among others.

      Comments on revisions:

      The authors have addressed my previous concerns.

      We thank the reviewer for the positive feedback and are glad our revisions have satisfactorily addressed the previous concerns. We appreciate the thoughtful input that helped us improve the clarity and rigor of the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Comment on Revised Manuscript

      Recommendations for Improvement

      (1) Expand the Results section for Figure 9 with a more detailed interpretation of the GLM coefficients and their biological significance

      (2) Provide explicit criteria (or at least explain in detail) for how the GLM results confirm or undermine their original hypothesis about environmental context hierarchy

      The claims are interesting, and the additional statistical analysis appears promising. However, a clearer explanation of these new results would strengthen the paper and ensure that readers from diverse backgrounds can fully understand how the evidence supports the authors' conclusions about individuality across environmental contexts.

      We thank the reviewer for these constructive suggestions. In response to these suggestions, we have expanded both the Methods and Results sections to provide a more detailed explanation of the GLM coefficients, including their interpretation and how they relate to our original correlation-based findings.

      We now clarify how the direction, magnitude, and statistical significance of specific coefficients reflect the influence of different environmental factors on the persistence of individual behavioral traits. To make this accessible to readers from diverse backgrounds, we explicitly outline the criteria we used to evaluate whether the GLM results support our hypothesis about the hierarchical influence of environmental context, namely, whether the structure and strength of effects align with the patterns predicted from our prior correlation analysis.

      These additions improve clarity and help readers understand how the new statistical results reinforce our conclusions about the context-dependence of behavioral individuality.

      Reviewer #2 (Recommendations for the authors): ย 

      Thanks for the revision of the paper! I updated my review to try and provide a little more guidance by what I mean about updating your analyses. I really think this is a super cool data set and I genuinely wish this were MY dataset so that way I could really dig into it to partition the variance. These variance partitioning methods are standard in my particular subfield (study of individual behavioral variation in ecology and evolution) and so I think employing them is 1) going to offer a MUCH more elegant and holistic view of the behavioral variation (e.g. you can report a single repeatability estimate for each behavior rather than 3 different correlations) and 2) improve the impact and readership for your paper as now you'll be using methods that a whole community of researchers are very familiar with. It's just a suggestion, but I hope you consider it!

      We sincerely thank the reviewer for the insightful and encouraging feedback and for introducing us to this modeling approach. In response to this suggestion, we have incorporated a hierarchical linear mixed-effects model into our analysis (now presented in Figure 10), accompanied by a new supplementary table (Table T3). We also updated the Methods, Results, and Discussion sections to describe the rationale, implementation, and implications of the mixed-model analysis.

      We agree with the reviewer that this approach provides a more elegant way to quantify behavioral variation and individual consistency across contexts. In particular, the ability to estimate repeatability directly aligns well with the core questions of our study. It facilitates improved communication of our findings to ecology, evolution, and behavior researchers. We greatly appreciate the suggestion; it has significantly strengthened both the analytical framework and the interpretability of the manuscript.

    1. Author response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public review):

      This is an interesting and timely computational study using molecular dynamics simulation as well as quantum mechanical calculation to address why tyrosine (Y), as part of an intrinsically disordered protein (IDP) sequence, has been observed experimentally to be stronger than phenylalanine (F) as a promoter for biomolecular phase separation. Notably, the authors identified the aqueous nature of the condensate environment and the corresponding dielectric and hydrogen bonding effects as a key to understanding the experimentally observed difference. This principle is illustrated by the difference in computed transfer free energy of Y- and F-containing pentapeptides into a solvent with various degrees of polarity. The elucidation offered by this work is important. The computation appears to be carefully executed, the results are valuable, and the discussion is generally insightful. However, there is room for improvement in some parts of the presentation in terms of accuracy and clarity, including, e.g., the logic of the narrative should be clarified with additional information (and possibly additional computation), and the current effort should be better placed in the context of prior relevant theoretical and experimental works on cation-π interactions in biomolecules and dielectric properties of biomolecular condensates. Accordingly, this manuscript should be revised to address the following, with added discussion as well as inclusion of references mentioned below.

      We are grateful for the refereeโ€™s assessment of our work and insightful suggestions, which we address point by point below.

      (1) Page 2, line 61: "Coarse-grained simulation models have failed to account for the greater propensity of arginine to promote phase separation in Ddx4 variants with Arg to Lys mutations (Das et al., 2020)". As it stands, this statement is not accurate, because the cited reference to Das et al. showed that although some coarse-grained models, namely the HPS model of Dignon et al., 2018 PLoS Comput Biol, did not capture the Arg to Lys trend, the KH model described in the same Dignon et al. paper was demonstrated by Das et al. (2020) to be capable of mimicking the greater propensity of Arg than Lys to promote phase separation. Accordingly, a possible minimal change that would correct the inaccuracy of this statement in the manuscript would be to add the word "Some" in front of "coarse-grained simulation models ...", i.e., it should read "Some coarse-grained simulation models have failed ...". In fact, a subsequent work [Wessén et al., J Phys Chem B 126:9222-9245 (2022)] that applied the Mpipi interaction parameters (Joseph et al., 2021, already cited in the manuscript) showed that Mpipi is capable of capturing the rank ordering of phase separation propensity of Ddx4 variants, including a charge scrambled variant as well as both the Arg to Lys and the Phe to Ala variants (see Figure 11a of the above-cited Wessén et al. 2022 reference). The authors may wish to qualify their statements in the introduction to take note of these prior results. For example, they may consider adding a note immediately after the next sentence in the manuscript "However, by replacing the hydrophobicity scales ... (Das et al., 2020)" to refer to these subsequent findings in 2021-2022.

      We agree with the referee that the wording used in the original version was inaccurate. We did not want to expand too much on the previous results on Lys/Arg, to avoid overwhelming our readers with background information that was not directly relevant to the aromatic residues Phe and Tyr. We have now introduced some of the missing details in the hope that this will provide a more accurate account of what has been achieved with different versions of coarse-grained models. In the revised version, we say the following:

      Das and co-workers attempted to explain arginine's greater propensity to phase separate in Ddx4 variants using coarse-grained simulations with two different energy functions (Das et al., 2020). The model was first parametrized using a hydrophobicity scale intended to capture the "stickiness" of different amino acids (Dignon et al., 2018), but this did not recapitulate the correct rank order in the stability of the simulated condensates (Das et al., 2020). By replacing the hydrophobicity scale with interaction energies from amino acid contact matrices, derived from a statistical analysis of the PDB (Dignon et al., 2018; Miyazawa and Jernigan, 1996; Kim and Hummer, 2008), they recovered the correct trends (Das et al., 2020). A key to the greater propensity for LLPS in the case of Arg may derive from the pseudo-aromaticity of this residue, which results in a greater stabilization relative to the more purely cationic character of Lys (Gobbi and Frenking, 1993; Wang et al., 2018; Hong et al., 2022).

      (2) Page 8, lines 285-290 (as well as the preceding discussion under the same subheading & Figure 4): "These findings suggest that ... is not primarily driven by differences in protein-protein interaction patterns ..." The authors' logic in terms of physical explanation is somewhat problematic here. In this regard, "Protein-protein interaction patterns" appear to be a straw man, so to speak. Indeed, who (reference?) has argued that the difference in the capability of Y and F in promoting phase separation should be reflected in the pairwise amino acid interaction pattern in a condensate that contains either only Y (and G, S) and only F (and G, S) but not both Y and F? Also, this paragraph in the manuscript seems to suggest that the authors' observation of similar contact patterns in the GSY and GSF condensates is "counterintuitive" given the difference in Y-Y and F-F potentials of mean force (Joseph et al., 2021); but there is nothing particularly counterintuitive about that. The two sets of observations are not mutually exclusive. For instance, consider two different homopolymers, one with a significantly stronger monomer-monomer attraction than the other. The condensates for the two different homopolymers will have essentially the same contact pattern but very different stabilities (different critical temperatures), and there is nothing surprising about it. In other words, phase separation propensity is not "driven" by contact pattern in general, it's driven by interaction (free) energy. The relevant issue here is total interaction energy or the critical point of the phase separation. If it is computationally feasible, the authors should attempt to determine the critical temperatures for the GSY condensate versus the GSF condensate to verify that the GSY condensate has a higher critical temperature than the GSF condensate. That would be the most relevant piece of information for the question at hand.

      We are grateful for this very insightful comment by the referee. We have followed this suggestion to address whether, despite similar interaction patterns in GSY and GSF condensates, their stabilities are different. As in our previous work (De Sancho, 2022), we have run replica exchange MD simulations for both condensates and derived their phase diagrams. Our results, shown in the new Figure 5 and supplementary Figs. S6-S7, clearly indicate that the GSY condensate has a lower saturation density than the GSF condensate. This result is consistent with the trends observed in experiments on mutants of the low-complexity domain of hnRNPA1, where the relative amounts of F and Y determine the saturation concentration (Bremer et al., 2022).

      (3) Page 9, lines 315-316: "...Our ε [relative permittivity] values ... are surprisingly close to that derived from experiment on Ddx4 condensates (45±13) (Nott et al., 2015)". For accuracy, it should be noted here that the relative permittivity provided in the supplementary information of Nott et al. was not a direct experimental measurement but was based on a fit using Flory-Huggins (FH) theory; however, FH is not the most appropriate theory for a polymer with long-spatial-range Coulomb interactions. To this reviewer's knowledge, no direct measurement of relative permittivity in biomolecular condensates has been made to date. Explicit-water simulation suggests that a Ddx4 condensate with protein volume fraction ≈ 0.4 can have a relative permittivity of ≈ 35-50 (Das et al., PNAS 2020, Fig. 7A), which happens to agree with the ε = 45±13 estimate. This information should be useful to include in the authors' manuscript.

      We thank the referee for this useful comment. We are aware that the estimate we mentioned is not direct. We have now clarified this point and added the additional estimate from Das et al. In the new version of the manuscript, we say:

      Our ε values for the condensates (39 ± 5 for GSY and 47 ± 3 for GSF) are surprisingly close to those derived from experiments on Ddx4 condensates using Flory-Huggins theory (45 ± 13) (Nott et al., 2015) and from atomistic simulations of Ddx4 (~35-50 at a volume fraction of φ = 0.4) (Das et al., 2020).

      (4) As for the dielectric environment within biomolecular condensates, coarse-grained simulation has suggested that whereas condensates formed by essentially electrically neutral polymers (as in the authors' model systems) have relative permittivities intermediate between that of bulk water and that of pure protein (ε = 2-4, or at most 15), condensates formed by highly charged polymers can have relative permittivity higher than that of bulk water [Wessén et al., J Phys Chem B 125:4337-4358 (2021), Fig. 14 of this reference]. In view of the role of aromatic residues (mainly Y and F) in the phase separation of IDPs such as A1-LCD and LAF-1 that contain positively and negatively charged residues (Martin et al., 2020; Schuster et al., 2020, already cited in the manuscript), it should be useful to address briefly how the relationship between the relative phase-separation promotion strength of Y vs F and the dielectric environment of the condensate may or may not change with higher relative permittivities.

      We thank the referee for their comment regarding highly charged polymers. However, we have chosen not to address these systems in our manuscript, as they are significantly different from the GSY/GSF peptide condensates under investigation. In polyelectrolyte systems, condensate formation is primarily driven by electrostatic interactions and counterion release, while we highlight the role of transfer free energies. At high dielectric constants (and dielectrics even higher than that of water), the strength of electrostatic interactions will be greatly reduced. In our approach to estimating differences between Y and F, the transfer free energy should plateau at a value of ΔΔG = 0 in water. At greater values of ε > 80, it becomes difficult to predict whether additional effects might become relevant. As this lies beyond the scope of our current study, we prefer not to speculate further.

      (5) The authors applied the dipole moment fluctuation formula (Eq. 2 in the manuscript) to calculate relative permittivity in their model condensates. Does this formula apply only to an isotropic environment? The authors' model condensates were obtained from a "slab" approach (page 4), and thus the simulation box has a rectangular geometry. Did the authors apply Equation 2 to the entire simulation box or only to the central part of the box with the condensate (see, e.g., Figure 3C in the manuscript)? If the latter is the case, is it necessary to use a different dipole moment formula that distinguishes between the "parallel" and "perpendicular" components of the dipole moment (see, e.g., Equation 16 in the above-cited Wessén et al. 2021 paper)? A brief added comment will be useful.

      We have calculated the relative permittivity from dense phases only. These dense phases were sliced from the slab geometry and then re-equilibrated. Long simulations were then run to converge the calculation of the dielectric constant. We have clarified this in the Methods section of the paper. We say:

      For the calculation of the dielectric constant of condensates, we used the simulations of isolated dense phases mentioned above.
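For reference, the dipole-fluctuation route to the static relative permittivity used here is commonly written as ε = 1 + (⟨M²⟩ − ⟨M⟩²) / (3 ε₀ V kB T). The sketch below is an illustrative script, not the authors' actual analysis code; the function name, the assumption of a pre-extracted time series of box dipole moments in e·nm, and the conducting ("tin-foil") boundary conditions are ours:

```python
import numpy as np

def relative_permittivity(M, volume_nm3, T=300.0):
    """Estimate the static relative permittivity from fluctuations of
    the total dipole moment of the simulation box (Neumann fluctuation
    formula, conducting boundary conditions).

    M          : (n_frames, 3) array of box dipole moments in e*nm
    volume_nm3 : box volume in nm^3
    T          : temperature in K
    """
    kB = 1.380649e-23        # Boltzmann constant, J/K
    e = 1.602176634e-19      # elementary charge, C
    eps0 = 8.8541878128e-12  # vacuum permittivity, F/m
    M_SI = np.asarray(M, dtype=float) * e * 1e-9  # e*nm -> C*m
    V = volume_nm3 * 1e-27                        # nm^3 -> m^3
    mean_M = M_SI.mean(axis=0)
    # <M^2> - <M>.<M>, averaged over frames
    fluct = (M_SI**2).sum(axis=1).mean() - np.dot(mean_M, mean_M)
    return 1.0 + fluct / (3.0 * eps0 * V * kB * T)
```

A non-fluctuating dipole gives ε = 1, and larger dipole fluctuations (relative to box volume and temperature) give a higher permittivity, which is why long, well-converged trajectories of the isolated dense phase are needed.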

      (6) Concerning the general role of Y and F in the phase separation of biomolecules containing positively charged Arg and Lys residues, the relative strength of cation-π interactions (cation-Y vs cation-F) should be addressed (in view of the generality implied by the title of the manuscript), or at least discussed briefly in the authors' manuscript if a detailed study is beyond the scope of their current effort. It has long been known that in the biomolecular context, cation-Y is slightly stronger than cation-F, whereas cation-tryptophan (W) is significantly stronger than either cation-Y or cation-F [Wu & McMahon, JACS 130:12554-12555 (2008)]. Experimental data from a study of EWS (Ewing sarcoma) transactivation domains indicated that Y is a slightly stronger promoter than F for transcription, whereas W is significantly stronger than either Y or F [Song et al., PLoS Comput Biol 9:e1003239 (2013)]. In view of the subsequent general recognition that "transcription factors activate genes through the phase-separation capacity of their activation domain" [Boija et al., Cell 175:1842-1855.e16 (2018)], which is applicable to EWS in particular [Johnson et al., JACS 146:8071-8085 (2024)], the experimental data in Song et al. 2013 (see Figure 3A of this reference) suggest that cation-Y interactions are stronger than cation-F interactions in promoting phase separation, thus generalizing the authors' observations (which focus primarily on Y-Y, Y-F and F-F interactions) to most situations in which cation-Y and cation-F interactions are relevant to biomolecular condensation.

      We thank our referee for this insightful comment. While we restrict our analysis to aromatic pairs in this work, the observed crossover will certainly affect other pairs involving tyrosine or phenylalanine. We now comment on this point in the discussion section of the revised manuscript. This topic will be explored in detail in a follow-up manuscript we are currently completing. We say:

      We note that, although we have not included in our analysis positively charged residues that form cation-π interactions with aromatics, the observed crossover will also be relevant to Arg/Lys contacts with Phe and Tyr. Following the rationale of our findings, within condensates, cation-Tyr interactions are expected to promote phase separation more strongly than cation-Phe pairs.

      (7) Page 9: The observation of weaker effective F-F (and a few other nonpolar-nonpolar) interactions in a largely aqueous environment (as in an IDP condensate) than in a nonpolar environment (as in the core of a folded protein) is intimately related to (and expected from) the long-recognized distinction between "bulk" and "pair" as well as size dependence of hydrophobic effects that have been addressed in the context of protein folding [Wood & Thompson, PNAS 87:8921-8927 (1990); Shimizu & Chan, JACS 123:2083-2084 (2001); Proteins 49:560-566 (2002)]. It will be useful to add a brief pointer in the current manuscript to this body of relevant resources in protein science.

      We thank the referee for bringing this body of work to our attention. In the revised version of our work, we briefly mention how it relates to our results. We also note that the suggested references have pointed to another of the limitations of our study, that of chain connectivity, addressed in the work by Shimizu and Chan. While we were well aware of these limitations, we had not mentioned them in our manuscript. Concerning the distinction between pair and bulk hydrophobicities, we include the following in the concluding lines of our work:

      The observed context dependence has deep roots in the concepts of "pair" and "bulk" hydrophobicity (Wood and Thompson, 1990; Shimizu and Chan, 2002). While pair hydrophobicity is connected to dimerisation equilibria (i.e. the second step in Figure 2B), bulk hydrophobicity is related to transfer processes (the first step). Our work stresses the importance of considering both the pair contribution that dominates at high solvation, and the transfer free energy contribution, which overwhelms the interaction strength at low dielectrics.

      Reviewer #2 (Public review):

      Summary:

      In this preprint, De Sancho and López use alchemical molecular dynamics simulations and quantum mechanical calculations to elucidate the origin of the observed preference of Tyr over Phe in phase separation. The paper is well written, and the simulations conducted are rigorous and provide good insight into the origin of the differences between the two aromatic amino acids considered.

      We thank the referee for their positive assessment of our work. Below, we address all the questions raised one by one.

      Strengths:

      The study addresses a fundamental discrepancy in the field of phase separation, where the ranking of aromatic amino acids observed experimentally differs from the ranking anticipated from contact statistics of folded proteins. While it has been hypothesized that the difference between the microenvironment of the condensed phase and the hydrophobic core of folded proteins underlies the different observations, this study provides a quantification of this effect. Further, the demonstration of the crossover between Phe and Tyr as a function of the dielectric is interesting and provides further support for the hypothesis that the differing microenvironments within the condensed phase and the core of folded proteins are the origin of the difference between contact statistics and experimental observations in the phase separation literature. The simulations performed in this work systematically investigate several possible explanations and therefore provide depth to the paper.

      Weaknesses:

      While the study is quite comprehensive and the paper well written, there are a few instances that would benefit from additional details. In the methods section, it is unclear whether the GGXGG peptides upon which the alchemical transforms are conducted are position-restrained within the condensed/dilute phase or not. If they are not, how would the position of the peptides within the condensate alter the calculated free energies reported?

      The peptides are not restrained in our simulations and can therefore, given sufficient time, diffuse out of the condensate. However, we did not observe any escape event in the trajectories we used to generate starting points for switching. Hence, the peptide environment captured in our calculations reflects, on average, the protein-protein and protein-solvent interactions inside the model condensate. We believe this is the right way of performing the calculation of transfer free energy differences into the condensate. We have clarified this point when we describe the equilibrium simulation results in the revised manuscript. We say:

      Also, the peptide that experiences the transformation, which is not restrained, must remain buried within the condensate for all the snapshots that we use as initial frames, to avoid averaging the work in the dilute and dense phases.

      On the referee's second point of whether there would be differences if the peptide visited the dilute phase, the answer is that, indeed, there would be. We expect that the behaviour of the peptide would approach ΔΔG = 0, considering the low protein concentration in the dilute phase. For mixed trajectories with sampling in both dilute and dense phases, our expectation would be a bimodal distribution in the free energy estimates from switching (see e.g. Fig. 8 in DOI:10.1021/acs.jpcb.0c10263). Because we are exclusively interested in the transfer free energies into the condensate, we do not pursue such calculations in this work.
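For context, free energies from non-equilibrium switching are often obtained with the one-sided exponential (Jarzynski) average over work values, which makes clear why mixing dense- and dilute-phase starting frames would be problematic: the exponential average is dominated by the low-work mode of a bimodal distribution. The sketch below is a generic illustration (our own, not the authors' pipeline); the function name and the kJ/mol units are assumptions:

```python
import numpy as np

def jarzynski_dG(work, kT=2.494):
    """Free-energy difference from non-equilibrium work values via the
    one-sided exponential average, dG = -kT * ln<exp(-W/kT)>.
    `work` and `kT` share units (here kJ/mol, kT at ~300 K).
    A log-sum-exp reduction is used for numerical stability."""
    w = np.asarray(work, dtype=float) / kT
    return -kT * (np.logaddexp.reduce(-w) - np.log(w.size))
```

With all work values drawn from a single (dense-phase) mode the estimator converges to the corresponding ΔG; a second mode at different work values would pull the estimate toward the lower-work population rather than averaging the two phases meaningfully.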

      It would also be interesting to see what the variation in the transfer free energy is across multiple independent replicates of the transform to assess the convergence of the simulations.

      Upon submission of our manuscript, we were confident that the results we had obtained would pass the test of statistical significance. We had, after all, run many more simulations than those reported, and the comparable values of ΔΔG<sub>Transfer</sub> for both GSY and GSF pointed in the right direction. However, we acknowledge that the more thorough test of running replicates recommended by the referee is important, considering the slow diffusion within the Tyr peptide condensates due to their stickiness. Also, the non-equilibrium switching method had not been tested before for dense phases like the ones considered here.

      We have hence followed our referee's suggestion and run three independent replicates, 1 μs each, of the equilibrium runs starting from independent slab configurations, for both the GSY and GSF condensates (see the new supporting figures Fig. S1, S2 and S5). We now report the errors from the three replicates as the standard error of the mean (bootstrapping errors remain for the rest of the solvents). Our results are entirely consistent with the values reported originally, confirming the validity of our estimates.
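The replicate-based error reported above is the standard error of the mean over independent runs. A minimal sketch (illustrative only; the replicate values in the usage comment are hypothetical, not the paper's numbers):

```python
import numpy as np

def mean_and_sem(replicates):
    """Mean and standard error of the mean over independent replicates,
    using the sample standard deviation (ddof=1)."""
    v = np.asarray(replicates, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)

# Usage with hypothetical ddG_transfer estimates (kJ/mol) from 3 replicates:
# mean, sem = mean_and_sem([-3.1, -2.8, -3.3])
```

With only three replicates the SEM is a rough estimate, which is why consistency with the originally reported (bootstrapped) values is the more meaningful check.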

      Additionally, since the authors use a slab for the calculation of these free energies, are the transfer free energies from the dilute phase to the interface significantly different from those calculated from the dilute phase to the interior of the condensate?

      We thank the referee for this valuable comment, as it has pointed us in the direction of a rapidly increasing body of work on condensate interfaces, for example, as mediators of aggregation, that we may consider for future study with the same methodology. However, as discussed above, we have not considered this possibility in our work, as we decided to focus on the condensate environment, rather than its interface.

      The authors mention that the contact statistics of Phe and Tyr do not show significant differences and thereby conclude that the more favorable transfer of Tyr primarily originates from the dielectric of the condensate. However, the calculation of contacts neglects the differences in the strength of interactions involving Phe vs. Tyr. Though the authors consider the calculation of contact formation energies later in the manuscript, the scope of these interactions (Phe-Phe, Tyr-Tyr, Tyr-Amide, Phe-Amide) is quite limited, which is not sufficient to make a universal conclusion regarding the underlying driving forces. A more appropriate statement would be that, in the context of the minimal peptide investigated, the driving force seems to be the difference in dielectric. However, it is worth mentioning that the authors do a good job of mentioning some of these caveats in the discussion section.

      We thank the referee for this important comment. Indeed, the similar contact statistics and interaction patterns that we reported originally do not necessarily imply identical interaction energies. In other words, similar statistics and patterns can still result in different stabilities for the Phe and Tyr condensates if the energetics are different. Hence, we cannot conclude that the GSF and GSY condensate environments are equivalent.

      To address this point, we have run new simulations for the revised version of our paper, using the temperature-replica exchange method, as before. From the new datasets, we derive the phase diagrams for both the GSF and GSY condensates (see the new Fig. 5). We find that the tyrosine-containing condensate is more stable than that of phenylalanine, as can be inferred from the lower saturation density in the low-density branch of the phase diagram. In consequence, despite the similar contact statistics, the energetics differ, making the saturation density of GSY slightly lower than that of GSF. This result is consistent with experimental data by Bremer et al. (Nat. Chem. 2022).

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors address the paradox of how tyrosine can act as a stronger sticker for phase separation than phenylalanine, despite phenylalanine being higher on the hydrophobicity scale and exhibiting more prominent pairwise contact statistics in folded protein structures compared to tyrosine.

      We are grateful for the referee's favourable opinion of the paper. Below, we address all of the issues raised.

      Strengths:

      This is a fascinating problem for the protein science community with special relevance for the biophysical condensate community. Using atomistic simulations of simple model peptides and condensates as well as quantum calculations, the authors provide an explanation that relies on the dielectric constant of the medium and the hydration level that either tyrosine or phenylalanine can achieve in highly hydrophobic vs. hydrophilic media. The authors find that as the dielectric constant decreases, phenylalanine becomes a stronger sticker than tyrosine. The conclusions of the paper seem to be solid; it is well written and also recognises the limitations of the study. Overall, the paper represents an important contribution to the field.

      Weaknesses:

      How can the authors ensure that a condensate of GSY or GSF peptides is a representative environment of a protein condensate? First, the composition in terms of amino acids is highly limited; second, the effect of peptide/protein length compared to real protein sequences is also an issue; and third, the water concentration within these condensates is really low compared to real experimental condensates. Hence, how can we rely on the extracted conclusions from these condensates to be representative of real protein sequences with a much more complex composition and structural behaviour?

      We agree with the main weakness identified by the referee. In fact, all these limitations had already been stated in our original submission. Our ternary peptide condensates are just a minimal model system that bears reasonable analogies with condensates, but is definitely not identical to true LCR condensates. The analogies between peptide and protein condensates are, however, worth restating:

      (1) The limited composition of the peptide condensates is inspired by LCR sequences (see Fig. 4 in Martin & Mittag, 2018).

      (2) The equilibrium phase diagram, showing a UCST, is consistent with that of LCRs from Ddx4 or hnRNPA1.

      (3) The dynamical behaviour is intermediate between liquid and solid (De Sancho, 2022).

      (4) The contact patterns are comparable to those observed for FUS and LAF1 (Zheng et al, 2020).

      The third issue pointed out by the referee requires particular attention. Indeed, the water content in the model condensates is low (~200 mg/mL for GSY) relative to real protein condensates (e.g. ~600 mg/mL for FUS and LAF-1, estimated from simulations). Considering that both interaction patterns and solvation contribute to the favorability of Tyr relative to Phe, we speculate that a greater degree of solvation in the true protein condensates will further reinforce the trends we observe.

      In any case, in the revised version of the manuscript, we have made an effort to insist on the limitations of our results, some of which we plan to address in future work.

      Reviewer #3 (Recommendations for the authors):

      (1) The fact that protein density is so high within GSY or GSF peptide condensates may significantly alter the conclusions of the paper. Can the authors show that for condensates in which the protein density is ~0.2-0.3 g/cm3, the same conclusions hold? Could the authors use a different peptide sequence that establishes a more realistic protein concentration/density inside the condensate?

      Unfortunately, recent work with a variety of peptide sequences suggests that finding peptides in the density range proposed by the referee may be very challenging. For example, Pettitt and his co-workers have extensively studied the behaviour of GGXGG peptides. In a recent work, using the CHARMM36m force field and TIP3P water, they report densities of ~1.2-1.3 g/mL for capped pentapeptide condensates (Workman et al., Biophys. J. 2024; DOI: 10.1016/j.bpj.2024.05.009). Brown and Potoyan have recently run simulations of zwitterionic GXG tripeptides with the Amber99sb-ILDNQ force field and TIP3P water, starting from a homogeneous distribution in cubic simulation boxes (Biophys. J. 2024, DOI: 10.1016/j.bpj.2023.12.027). In a box with an initial concentration of 0.25 g/mL, upon phase separation, the peptide ends up occupying what would seem to be ~1/3 of the box, although we could not find exact numbers. This would imply densities of ~0.75 g/mL in the dense phase, with the additional problem of many charges. Finally, Joseph and her co-workers have recently simulated a set of hexapeptide condensates with varied compositions using a combination of atomistic and coarse-grained simulations. For the atomistic simulations, the Amber03ws force field and TIP4P water were used (see BioRxiv reference 10.1101/2025.03.04.641530). They have found values of the protein density in the dense phase ranging between 0.8 and 1.2 g/mL. The consistency in the range of densities reported in these studies suggests that short peptides, at least up to 7 residues long, tend to form quite dense condensates, akin to those investigated in our work. While the examples mentioned do not comprehensively span the full range of peptide lengths, sequences, and force fields, they nonetheless support the general behaviour we observe. A systematic exploration of all these variables would require an extensive search in parameter space, which we believe falls outside the scope of the present study.

      (2) Do the conclusions hold for phase-separating systems that mostly rely on electrostatic interactions to undergo LLPS, like protein-RNA complex coacervates? In other words, could the authors try the same calculations for a binary mixture composed of polyR-polyE, or polyK-polyE?

      This is an excellent idea that we may attempt in future work, but the remit of the current work is aromatic amino acids Phe and Tyr only. Hence, we do not include calculations or discussion on polyR-polyE systems in our revised manuscript.

      (3) One of the major approximations made by the authors is the length of the peptides within the condensates, which is not realistic, as well as their density. Specifically, could they double or triple the length of these peptides while maintaining their composition, so that the impact of sequence length on the transfer free energies can be quantified?

      We thank the referee for this comment and agree with the main point, which was stated as a limitation in our original submission. The suggested calculations anticipate research that we are planning but will not include in the current work. One of the advantages of our model systems is that the small size of the peptides allows for small simulation boxes and relatively rapid sampling. Longer peptide sequences would require conformational sampling beyond our current capabilities, if done systematically. An example of these limitations is the amount of data that we had to discard from the new simulations we report, which amounts to up to 200 ns of our replica exchange runs in smaller simulation boxes (i.e. >19 μs in total for the 48 replicas of the two condensates!). As stated in the answer to point 1, we have found in the literature work on peptides in the range of 1-7 residues with consistent densities. Additionally, a recent report using equilibrium alchemical transformations on tetrapeptide condensates, which points to the role of the transfer free energy as a driving force for condensate formation, further supports the observations from our work.

      Minor issues:

      (1) The caption of Figure 3B is not clear. It can only be understood what is depicted there once you read the main text a couple of times. I encourage the authors to clarify the caption.

      We have rewritten the caption for greater clarity. Now it reads as follows:

      Time evolution of the density profiles calculated across the longest dimension of the simulation box (L) in the coexistence simulations. In blue we show the density of all the peptides, and in dark red that of the F/Y residue in the GGXGG peptide.

      (2) Why was the RDF from Figure 5A cut at such a short distance? Can the authors expand the figure to clearly show that it has converged?

      In the updated Figure 5 (now Fig. 6), we have extended the g(r) up to r=1.75 nm so that it clearly plateaus at a value of 1.

    1. If you've ever annotated a paperback book, you've probably found yourself short on space to write notes

      I completely agree with this statement. Sometimes, when annotating a paperback, space feels so limited, and I find myself writing smaller and smaller notes just to fit everything in. It's frustrating because sometimes I have so many thoughts on a passage that a few words aren't enough to express them. But other times, a single word or idea is enough to capture the essence of what I'm thinking. It really depends on the depth of the text. Online annotations, however, offer much more freedom. Without the constraints of space, I can fully explore my thoughts and ideas without worrying about running out of room.

    1. can I say something or should I just go
       monstrous
       marvellous
       KREON: you're treading a very thin line
       dreadful
       GUARD: would you say it's your ears or your mind that finds me annoying

      Q: I find this conversation confusing, yet I feel like it's meant to be comical.

    1. When the episode aired, TikTok, X and Reddit were ablaze with arguments over the use of "I'm Kissing You" instead of a more modern option. "People were like, 'What is this?' and, I think, flummoxed by the choice — to me there's no other choice," Han said. "It may be one of those times where the audience was like, 'Oh, we wanted something new,' but this is what I love."

      I think this says a lot about our generation, and the same situation is true for so many other people and platforms. Whether the publicity someone gets is for a positive or negative reason, it's still publicity, and it still racks up views and money. I really enjoyed this quote because Han was receiving hate for choices she made, yet she decided to ignore it and not change anything, which ended up working in her favor. I think it's important to remember not to just follow what everyone else says or does.

    1. Basic Orienting Facts – Lets the reader know who, when, where, and what is happening.

       Organization – The reason you order your content the way you do.

       Structure – The order in which you choose to present your events to your reader.

       Scene – Vivid descriptions of the setting and what was said, so that the reader feels immersed in the story. Scene is the opposite of summary. Use scene sparingly, when you want to slow down and focus on an important part of the story.

       Summary – A way to manage time. When you tell the reader what used to happen in your family, for example, you could explain, "My mother used to cook Sunday dinner for the family. She often made a roast." You are summarizing what used to happen in the past. If you were to write about a specific Sunday, and you fleshed out what happened in scene with dialogue, included details about the sound of vegetables being chopped, described the smells in the kitchen, told the reader what your mother was wearing, and reflected on the conversation you had, that would be a scene. Summary condenses information in both academic and creative writing, but in creative writing, summary is linked to time management.

       Persona – The character of you that you construct. It's not literally you, because you are not words on the page, right? You are flesh and bone and you have a rich inner life. Use that rich inner life to develop your persona. Persona comes from the Latin word for mask. It's the version of you that you would like to illustrate for the reader in your memoir. This is a complicated concept. One way to think of your persona is you in relationship to the situation or people in the story. The persona can also be shaped by time: who and what you were like when you were twelve, for example. It can be shaped by relationship to your topic: who and what you are like in relationship to your mother or third grade teacher or your sergeant in boot camp.

       Readers' Trust in You – Readers won't automatically question your credibility as a narrator on the page, but if you seem infallible or somehow superhuman while everyone else in the story is tragically flawed, then the reader will wonder about the truthfulness of your own self-depiction. You are accountable for telling the story to your reader as truthfully as you can, while using craft elements to engage the reader. It's a daunting task. Also, readers like protagonists who are flawed, so be truthful about your mistakes.

       Setting – Where and when the story takes place.

       Mood – The emotional weight or atmosphere of a story, created through details, description, and other craft features; for example, setting can sometimes help create a mood.

       Imagery – An image in a story, or in a poem, is a description that appeals to one of the five senses. An image should also convey additional meaning, either emotional and/or intellectual. It's not an image to say green gelatin. Green gelatin is meaningless until the reader injects the gelatin with meaning. You can, however, create an image if you were to write, "The Frog Eye Salad recipe that my beloved grandmother used to make for Sunday picnics." The latter description is specific and contains emotional content.

       Reflection – The sense and interpretation that you make of the events that transpired in your memoir and how you feel and/or think about them. You can also reflect on the story and relate the events to the universal meaning or theme you would like to include in the story.

      By understanding and using craft features like scene, summary, persona, setting, mood, imagery, and reflection, you can write stories that are clear, engaging, and meaningful, not just to yourself but to the reader as well. These tools help you organize your story, bring it to life, and connect with your readers on a deeper level.

    1. It's not always necessary that the data be made absolutely unavailable; sometimes data can just be decontextualized enough to become less valuable. Facebook provides a fine example. If a great deal of personal creativity and life experience has been added to the site, it's hard to give all that up. Even if you capture every little thing you had uploaded, you can't save it in the context of interactions with other people. You have to lose a part of yourself to leave Facebook once you become an avid user. If you leave, it will become difficult for some people to contact you at all.


    1. Testimonials

       I went from a terrified, overweight, 40-year-old amputee to a much smaller, powerlifting person full of confidence. If you ask my trainer he simply says "she did the work, I just pointed at heavy things for her to pick up", but the reality is, his guidance and unshakeable faith in me has changed my life. (Jeannie S., Calgary, Alberta)

       One day I saw some photos of myself, and I thought "this is not who I am", but didn't know where or how to start. I got a personal trainer and have been able to lose 86 lbs. I feel younger and more energetic, I no longer snore, my knees and back do not ache at night, and I can keep up physically when playing sports with my kids. (Robert S., Burlington, Ontario)

       I finally feel what it's like to be empowered by your body and be passionate about fitness. I've lost over 60 lbs, altered my body fat percentage from 43% to 22%, and have become decorated as an Eastern Canadian Champion, National silver medalist, and Commonwealth Champion in the sport of powerlifting. (Krystyna U., Halifax, Nova Scotia)

       I knew I had to change to be the kind of father I wanted to be. I can't overstate how certain I felt that this journey was impossible, but now I'm around 205 lb and 26% body fat. I wake up in the morning feeling healthy and energetic. Achieving what seemed impossible makes all of life's challenges feel conquerable. (Stephen H., Toronto, Ontario)

       In 2019 I hit my breaking point: I weighed 295 lbs and was plagued with health issues. My father had heart surgery to replace a valve, and I was told that I would need the same. Since then, my trainer has become one of my best friends and together we've achieved over a 100+ pound weight loss. He's given me back my life, and that's something that means the world to me. (Troy Z., Brampton, Ontario)

      In terms of accessibility, the website could include arrows for viewing the additional testimonials, in addition to the small clickable circles.

    1. (observation; serving as a one-on-one assistant; discussion and coaching; reflection; and planning) are crucial for successful practicum experiences in which preservice and in-service music teachers gain as much as possible through observation and participation.

      Important Idea: This section is saying that observing, assisting, discussing, reflecting, and planning are key steps for educators learning how to teach students with disabilities, because it's not just about watching but about actively participating and thinking critically about what we see.

    1. Before attempting to speak this language, a learner must acknowledge these spirits with gifts of tobacco and food. Anyone who attempts Ojibwemowin is engaged in something more than learning tongue twisters. However awkward my nouns, unstable my verbs, however stumbling my delivery, to engage in the language is to engage the spirit. Perhaps that is what my teachers know, and what my English will forgive.

      Learning Ojibwe is about honoring spirits and traditions, not just words. This matters because it carries cultural and spiritual responsibility. I think it's powerful that speaking it is an act of respect, not just memorizing vocabulary.

    2. Before attempting to speak this language, a learner must acknowledge these spirits with gifts of tobacco and food. Anyone who attempts Ojibwemowin is engaged in something more than learning tongue twisters. However awkward my nouns, unstable my verbs, however stumbling my delivery, to engage in the language is to engage the spirit. Perhaps that is what my teachers know, and what my English will forgive.

      This connects back to the beginning of the text: "my English will forgive" (paragraph 22) echoes "My English is jealous" (paragraph 1). I think it's a fantastic way to end the text, and it wraps up its entire idea. It also shows that Ojibwemowin is more than just a language; it's a culture. Just as most speakers now mix English and Ojibwemowin when speaking, the language has evolved over time, the same as the people and the traditions with it.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the study's main conclusion regarding experience-driven changes in functional connectivity profiles between visual and frontal regions.

      In general, the findings in sighted adult and congenitally blind groups replicate previous studies and enhance the confidence in the reliability and robustness of the current results.

      Split-half analysis provides a good measure of robustness in the infant data.

      Weaknesses:

      There is some ambiguity in determining which aspects of these networks are shaped by experience.

      This uncertainty is compounded by notable differences in data acquisition and preprocessing methods, which could result in varying signal quality across groups. Variations in signal quality may, in turn, have an impact on the observed correlation patterns.

      The study's findings could benefit from being situated within a broader debate surrounding the instructive versus permissive roles of experience in the development of visual circuits.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. In this paper, Tian et al. ask: how does this organization arise over development? Is the "starting state" more like the blind pattern, or more like the sighted adult pattern? Their analyses reveal that the answer depends on the particular networks investigated; some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The question raised in this paper is extremely important: what is the starting state in development for visual cortical regions, and how is this organization shaped by experience? This paper is among the first to examine this question, particularly by comparing infants not only with sighted adults but also blind adults, which sheds new light on the role of visual (and cross-modal) experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data.

      Weaknesses:

      A central claim is that "infant secondary visual cortices functionally resemble those of blind more than sighted adults" (abstract, last paragraph of intro). I see two potential issues with this claim. First, a minor change: given the approaches used here, no claims should be made about the "function" of these regions, but rather their "functional correlations". Second (and more importantly), the claim that the secondary visual cortex in general resembles blind more than sighted adults is still not fully supported by the data. In fact, this claim is only true for one aspect of secondary visual area functional correlations (i.e., their connectivity to A1/M1/S1 vs. PFC). In other analyses, the infant secondary visual cortex looks more like sighted adults than blind adults (i.e., in within vs. across hemisphere correlations), or shows a different pattern from both sighted and blind adults (i.e., in occipito-frontal subregion functional connectivity). It is not clear from the manuscript why the comparison to PFC vs. non-visual sensory cortex is more theoretically important than hemispheric changes or within-PFC correlations (in fact, if anything, the within-PFC correlations strike me as the most important for understanding the development and reorganization of these secondary visual regions). It seems then that a more accurate conclusion is that the secondary visual cortex shows a mix of instructive effects of vision and reorganizing effects of blindness, albeit to a different extent than the primary visual cortex.

      Relatedly, group differences in overall secondary visual cortex connectivity are particularly striking as visualized in the connectivity matrices shown in Figure S1. In the results (lines 105-112), it is noted that while the infant FC matrix is strongly correlated with both adult groups, the infant group is nonetheless more strongly correlated with the blind than sighted adults. I am concerned that these results might be at least partially explained by distance (i.e., local spread of the BOLD signal), since a huge portion of the variance in these FC matrices is driven by stronger correlations between regions within the same system (e.g., secondary-secondary visual cortex, frontal-frontal cortex), which are inherently closer together, relative to those between different systems (e.g., visual to frontal cortex). How do results change if only comparisons between secondary visual regions and non-visual regions are included (i.e., just the pairs of regions within the bold black rectangle on the figure), which limits the analysis to long-range connections only? Indeed, looking at the off-diagonal comparisons, it seems that in fact there are three altogether different patterns here in the three groups. Even if the correlation between the infant pattern and blind adult pattern survives, it might be more accurate to claim that infants are different from both adult groups, suggesting both instructive effects of vision and reorganizing effects of blindness. It might help to show the correlation between each group and itself (across independent sets of subjects) to better contextualize the relative strength of correlations between the groups.
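      The masking analysis suggested here can be sketched in a few lines: restrict each group-mean FC matrix to its between-system cells before correlating groups. The sketch below uses random matrices as stand-ins for real group data; the region indices (`vis_idx`, `frontal_idx`) and the helper `masked_similarity` are hypothetical names for illustration, not from the paper:

```python
import numpy as np

def masked_similarity(fc_a, fc_b, rows, cols):
    """Correlate two FC matrices using only between-system cells (rows x cols),
    excluding the within-system blocks that sit near the diagonal."""
    a = fc_a[np.ix_(rows, cols)].ravel()
    b = fc_b[np.ix_(rows, cols)].ravel()
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
n_regions = 20
vis_idx = list(range(0, 8))        # hypothetical secondary visual regions
frontal_idx = list(range(12, 20))  # hypothetical prefrontal regions

# Random stand-ins for the group-mean FC matrices
fc_infant = rng.normal(size=(n_regions, n_regions))
fc_blind = fc_infant + rng.normal(scale=0.5, size=(n_regions, n_regions))

r = masked_similarity(fc_infant, fc_blind, vis_idx, frontal_idx)
```

      Because only long-range (between-system) cells enter the correlation, any similarity driven by the strong within-system blocks along the diagonal is excluded by construction.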

      It is not clear that differences between groups should be attributed to visual experience only. For example, despite the title of the paper, the authors note elsewhere that cross-modal experience might also drive changes between groups. Another factor, which I do not see discussed, is possible ongoing experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. Although no effects of age are detected, it is possible that the cortex is still undergoing experience-independent maturation at this very early stage of development. For example, consider Figure 2; perhaps V1 connectivity is not established at 2 weeks, but eventually achieves the adult pattern later in infancy or childhood. Further, consider the possibility that this same developmental progression would be found in infants and children born blind. In that case, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). To deal with these issues, the authors should add a discussion of the role of maturation vs. experience and temper claims about the role of visual experience specifically (particularly in the title).

      The authors measure functional correlations in three very different groups of participants and find three different patterns of functional correlations. Although these three groups differ in critical, theoretically interesting ways (i.e., in age and visual/cross-modal experience), they also differ in many uninteresting ways, including at least the following: sampling rate (TR), scan duration, multi-band acceleration, denoising procedures (CompCor vs. ICA), head motion, ROI registration accuracy, and wakefulness (I assume the infants are asleep).

      Addressing all of these issues is beyond the scope of this paper, but I do feel the authors should acknowledge these confounds and discuss the extent to which they are likely (or not) to explain their results. The authors would strengthen their conclusions with analyses directly comparing data quality between groups (e.g., measures of head motion and split-half reliability would be particularly effective).

      Response #1: We appreciate the reviewerโ€™s comments. In response, we have revised the paper to provide a more balanced summary of the data and clarified in the introduction which signatures the paper focuses on and why. Additionally, we have included several control analyses to account for other plausible explanations for the observed group differences. Specifically, we randomly split the infant dataset into two halves and performed split-half cross-validation. Across all comparisons, the results from the two halves were highly similar, suggesting that the effects are robust (see Supplementary Figures S3 and S4).
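      As a rough illustration of the split-half procedure described above (synthetic data throughout; the function name and all parameters are hypothetical, not the authors' pipeline):

```python
import numpy as np

def split_half_reliability(subject_fcs, rng):
    """Randomly split subjects into two halves, average the FC matrix within
    each half, and correlate the upper triangles of the two mean matrices."""
    n = len(subject_fcs)
    order = rng.permutation(n)
    half_a = subject_fcs[order[: n // 2]].mean(axis=0)
    half_b = subject_fcs[order[n // 2:]].mean(axis=0)
    iu = np.triu_indices_from(half_a, k=1)   # off-diagonal cells only
    return np.corrcoef(half_a[iu], half_b[iu])[0, 1]

rng = np.random.default_rng(42)
n_sub, n_reg = 40, 15
pattern = rng.normal(size=(n_reg, n_reg))
pattern = (pattern + pattern.T) / 2          # shared, symmetric group pattern
subject_fcs = pattern + rng.normal(scale=1.0, size=(n_sub, n_reg, n_reg))

r = split_half_reliability(subject_fcs, rng)
```

      When a stable group-level pattern underlies the per-subject noise, the two half-means correlate strongly, which is the logic behind using split-half agreement as a robustness check.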

      Furthermore, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults) and found no significant differences between them (details in response #6). Finally, we repeated our analysis after excluding infants with a radiology score of 4 or 5, and the results remained consistent, indicating that our findings are not confounded by potential brain anomalies (details in response #2).

      We hope these control analyses help strengthen our conclusions.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in sighted infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of sighted infants lies between that of sighted adults (stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of sighted infants resembled those of sighted adults more than those of blind adults, but sighted infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths:

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      Overall, the analyses considered are solid and well-detailed. The results are quite convincing, even if the interpretation might need to be revised downwards, as factors other than visual experience may play a role in the development of functional connections with the visual system.

      Weaknesses:

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development or when the specialization of functional connections is completed. This would require investigating when experience-dependent mechanisms are important for the establishment of multiple functional connections within the visual system. This could be achieved by analyzing different developmental periods in the same way, using open databases such as the Baby Connectome Project. Given the early, "condensed" maturation of the visual system after birth, we might expect sighted infants to show connectivity patterns similar to those of adults a few months after birth.

      The rationale for mixing full-term neonates and preterm infants (scanned at term-equivalent age) from the dHCP 3rd release is unclear, since preterms might have a very different development related to prematurity and to post-natal (including visual) experience. Although the authors show that the difference between the connectivity of visual and other sensory regions and that of visual and PFC regions does not depend on age at birth, they do not show that each connectivity pattern is unaffected by prematurity. Simply not considering the preterm infants would have made the analysis much more robust, and the full-term group in itself is already quite large compared with the two adult groups. The current study setting and the analyses performed do not seem to be an adequate and sufficient model to ascertain that "a few weeks of vision after birth is ... insufficient to influence connectivity".

      In a similar way, excluding the few infants with detected brain anomalies (radiological scores of 4 or higher) would strengthen the group homogeneity by focusing on infants presumed to have a rather typical neurodevelopment. The authors label all infants as "sighted", but this is not guaranteed, as no follow-up is provided.

      Response #2: We appreciate the reviewer's suggestion. We re-analyzed the infant cohort after excluding all cases with radiological scores ≥4 (n = 39 infants excluded). The revised analysis confirmed that the connectivity patterns reported in the main text remain statistically unchanged (see Supplementary Fig. S11). This demonstrates the robustness of our findings to confounding effects from potential brain anomalies. We have explicitly clarified this in the revised Methods section (page 14, line 391 in the manuscript).

      In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      The post-menstrual age (PMA) at scan of the infants is also not described. The methods indicate that all were scanned at "term-equivalent age" but does this mean that there is some PMA variability between 37 and 41 weeks? Connectivity measures might be influenced by such inter-individual variability in PMA, and this could be evaluated.

      The rationale for presenting results on the connectivity of the secondary visual cortices before that of the primary cortex (V1) was not clear. Also, it might be relevant to better justify why only the connectivity of visual regions to non-visual sensory regions (S1-M1, A1) and prefrontal cortex (PFC) was considered in the analyses, and not that to other brain regions.

      In relation to the question explored, it might be informative to reposition the study in relation to what others have shown about the developmental chronology of structural and functional long-distance and short-distance connections during pregnancy and the first postnatal months.

      The authors acknowledge the methodological difficulties in defining regions of interest (ROIs) in infants in the same way as in adults. The reliability and comparability of ROI positioning in infants is definitely an issue. Given that brain development is not homogeneous and synchronous across brain regions (in particular, the frontal and parietal lobes show delayed growth), the newborn brain is not homothetic to the adult brain, which poses major problems for registration. The functional specialization of cortical regions is incomplete at birth. This raises the question of whether the findings of this study would be stable/robust if slightly larger or displaced regions had been considered, to cover with greater certainty the same areas as those considered in adults. Have other cortical parcellation approaches been considered to assess the robustness of the ROIs (e.g. MCRIB-S for full-terms)?

      Recommendations for the Authors:

      Reviewer #1(Recommendations for the authors):

      Further consideration should be given to the underlying changes in network architecture that may account for differences in functional correlations across groups. An increase (or decrease) in correlation between two regions could signify an increase (decrease) in connection or communication between those regions. Alternatively, it might reflect an increase in communication or connection with a third region, while the physical connections/interactions between the two original regions remain unchanged. These possibilities lead to distinct mechanistic interpretations. For example, there are substantial changes in connectivity during early visual (e.g. Burkhalter A. 1993, Cerebral Cortex) and visuo-motor development (e.g., Csibra et al. 2000 Neuroreport). It's not clear whether increases in communication within the visual network and improvements in visuo-motor behavior (e.g., Yizhar et al. 2023 Frontiers in Neuroscience) wouldn't produce a qualitatively similar pattern of results.

      Relatedly, the within-network correlation patterns between visual ROIs and frontal ROIs appear markedly different between sighted adults and infants (Supplementary Figure S1). To what extent do the differences in long-range correlations between visual and frontal regions reflect these within-network differences in functional organization?

      Response #3: The reviewer raises some interesting questions about possible mechanisms and network changes. Resting-state studies are indeed always subject to the possibility that some effects are mediated by a third, unobserved region. Prior whole-cortex connectivity analyses have observed primarily changes in occipito-frontal connectivity in blindness, so there is no clear cortical 'third region' candidate (Deen et al., 2015). However, some thalamic effects have also been observed and could contribute to the phenomenon (Bedny et al., 2011). Resting-state changes in correlation between two areas do not imply changes in the strength of long-range anatomical connectivity. Indeed, in the current case they may well reflect differential functional coupling rather than strengthening or weakening of anatomical connections. We now discuss this in the Discussion section on page 12, line 301, as follows:

      "Despite these insights, many questions remain regarding the neurobiological mechanisms underlying experience-based functional connectivity changes and their relationship to anatomical development. Long-range anatomical connections between brain regions are already present in infants—even prenatally—though they remain immature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017). Functional connectivity changes may stem from local synaptic modifications within these stable structural pathways, consistent with findings that functional connectivity can vary independently of structural connection strength (Fotiadis et al., 2024). Moreover, functional connectivity has been shown to outperform structural connectivity in predicting individual behavioral differences, suggesting that experience-based functional changes may reflect finer-scale synaptic or network-level modulations not captured by macrostructural measures (Ooi et al., 2022). Prior studies also suggest that, even in adults, coordinated sensory-motor experience can lead to enhancement of functional connectivity across sensory-motor systems, indicating that large-scale changes in functional connectivity do not necessarily require corresponding changes in anatomical connectivity (Guerra-Carrillo et al., 2014; Li et al., 2018)."

      It is also not clear how changes in correlation patterns among visual areas alone would produce the visual-to-prefrontal connectivity reported in the current study. Activity in visual areas drives correlations both among visual areas and between visual and prefrontal areas, and the same is true of prefrontal cortices.

      The findings from this study should be more closely linked to the extensive literature surrounding the debate on whether experience plays an instructive or permissive role in visual development (e.g., Crair 1999 Current Opin Neurobiol; Sur et al. 1999 J Neurobiol; Kiorpes 2016 J Neurosci; Stellwagen & Shatz 2002 Neuron; Roy et al. 2020 Nature Communications).

      Response #4: The instructive role suggests that specific experiences or patterns of neural activity directly shape and organize neural circuitry, while the permissive role indicates that such experiences or activity merely enable other factors, such as molecular signals, to influence neural circuit formation (Crair, 1999; Sur et al., 1999). To distinguish whether experience plays an instructive or permissive role, it is essential to manipulate the pattern or information content of neural activity while maintaining a constant overall activity level (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002). However, both the sighted and blind adult groups have had extensive experience and neural activity in the visual cortices. For the sighted group, activity in the visual cortex is partly driven by bottom-up input from the external environment, through the retina, LGN, and ultimately to the cortex. In contrast, the blind group's visual cortex activity is partially driven by top-down input from non-visual networks. The precise role of this activity in shaping the observed connectivity patterns remains unclear. Although our study cannot speak to this issue directly, we now link to the relevant literature on page 12, line 320 of the manuscript in the Discussion section as follows:

"The current findings reveal both effects of vision and effects of blindness on the functional connectivity patterns of the visual cortex. A further open question is whether visual experience plays an instructive or permissive role in shaping neural connectivity patterns. An instructive role suggests that specific sensory experiences or patterns of neural activity directly shape and organize neural circuitry. In contrast, a permissive role implies that sensory experience or neural activity merely facilitates the influence of other factors, such as molecular signals, on the formation and organization of neural circuits (Crair, 1999; Sur et al., 1999). Studies with animals that manipulate the pattern or informational content of neural activity while keeping overall activity levels constant could distinguish between these hypotheses (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002)."

      The assertion that a few weeks of vision after birth is insufficient to influence connectivity is provocative. Though supported by the study's results, it would benefit from integration with research in animal models showing considerable malleability of networks from early experience (e.g., Akerman et al. 2002 Neuron; Li et al. 2006 Nature Neuroscience; Stacy et al. 2023 J Neuroscience).

Response #5: We thank the reviewer for their suggestion. The present study found that several weeks of postnatal visual experience is insufficient to significantly alter the long-term connectivity patterns of the visual cortices. While animal studies have shown that acute visual experience, or even exposure to visual stimuli through unopened eyelids, can robustly influence visual system development (Akerman et al., 2002; Li et al., 2008; Van Hooser et al., 2012), we think this discrepancy may be attributed to the substantial differences in developmental timelines between species. The human lifespan is much longer, and so is the human critical period, making it unclear how to map duration from one species to another. We briefly touch upon the time course issue on page 11, line 289 in the Discussion section as follows:

"The present results reveal the effects of experience on the development of functional connectivity between infancy and adulthood, but do not speak to the precise time course of these effects. Infants in the current sample had between 0 and 20 weeks of visual experience. Comparisons across these infants suggest that several weeks of postnatal visual experience is insufficient to produce a sighted-adult connectivity profile. The time course of development could be anywhere between a few months and years and could be tested by examining data from children of different ages."

      Substantial differences between the groups are evident in several key aspects of the study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To clarify how these differences might have impacted correlation differences between groups, it would be essential to include information on the noise ceilings for each correlation analysis within each group.

Response #6: We thank the reviewer for their suggestion. We now report the split-half noise ceiling for the adult and infant groups. For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (one-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, we believe that overall signal quality is unlikely to impact our results. We also add the relevant context to the Methods section on page 16, line 447 as follows:

"Substantial differences between the groups exist in this study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To address this concern, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults). For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (one-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, overall signal quality is unlikely to impact our results."
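For concreteness, the split-half procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's actual pipeline; the Spearman-Brown correction is one common way to extrapolate split-half reliability to the full time series, in the spirit of Lage-Castellanos et al. (2019).

```python
import numpy as np

def split_half_noise_ceiling(ts):
    """Split-half noise ceiling for one participant.

    ts : (n_timepoints, n_rois) ROI-averaged rs-fMRI time series.
    Computes the ROI-wise FC pattern in each half of the run, correlates
    the two patterns, and applies the Spearman-Brown correction.
    """
    half = ts.shape[0] // 2
    fc1 = np.corrcoef(ts[:half].T)        # rsFC pattern, first half
    fc2 = np.corrcoef(ts[half:].T)        # rsFC pattern, second half
    iu = np.triu_indices_from(fc1, k=1)   # unique ROI pairs
    r = np.corrcoef(fc1[iu], fc2[iu])[0, 1]
    return 2 * r / (1 + r)                # Spearman-Brown correction

# Synthetic participant: two clusters of 5 ROIs, each driven by its own
# shared signal, so the true FC pattern has reliable structure.
rng = np.random.default_rng(0)
a = rng.standard_normal((400, 1))
b = rng.standard_normal((400, 1))
ts = np.hstack([a + 0.5 * rng.standard_normal((400, 5)),
                b + 0.5 * rng.standard_normal((400, 5))])
nc = split_half_noise_ceiling(ts)
```

Because the two halves estimate the same underlying connectivity structure, the ceiling comes out high; a participant with no reliable FC structure would produce a value near zero.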

      In general, it appears that the infant correlations are stronger compared to the other groups. While this could reflect increased coherence or lack of differentiation, it is also possible that it is simply due to the presence of a non-neuronal global signal. Such a signal has the potential to substantially limit the effective range of functional correlations and comparisons with adults. To address this, it is advisable to conduct control analyses aimed at assessing and potentially removing global signals.

Response #7: We agree with the reviewer that global signal regression (GSR) may help reduce non-neuronal artifacts, such as motion, cardiac, and respiratory signals, which are known to correlate with the global signal. However, the global signal also contains neural signals from gray matter, and removing it can introduce unwanted artifacts, especially for the current study. First, GSR can reduce the physiological accuracy of functional connectivity (FC); second, GSR may have differential effects across groups, potentially introducing additional artifacts in between-group comparisons (Murphy & Fox, 2017). The CompCor method (Behzadi et al., 2007; Whitfield-Gabrieli & Nieto-Castanon, 2012) is capable of estimating global non-neuronal artifacts, as GSR does. However, because it estimates these artifacts from signals within the white matter (WM) and cerebrospinal fluid (CSF) masks, but not the gray matter (GM), CompCor introduces minimal unwanted bias into the GM signal.
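To illustrate the logic of an aCompCor-style cleanup, the toy example below injects a global artifact into both the WM/CSF signals and the gray-matter signals, extracts principal components from the noise compartment only, and regresses them out of the gray matter. The variable names, component count, and data shapes are illustrative assumptions, not the study's actual settings.

```python
import numpy as np

def acompcor_regressors(noise_ts, n_components=5):
    """Top principal components of WM/CSF voxel time series (aCompCor-style).

    noise_ts : (n_timepoints, n_voxels) signals from the WM/CSF masks.
    Returns the component time courses to use as confound regressors.
    """
    x = noise_ts - noise_ts.mean(axis=0)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components]

def regress_out(gm_ts, confounds):
    """Residualize gray-matter time series against confound regressors."""
    design = np.column_stack([np.ones(len(confounds)), confounds])
    beta, *_ = np.linalg.lstsq(design, gm_ts, rcond=None)
    return gm_ts - design @ beta

# Toy data: one global artifact drives both compartments.
rng = np.random.default_rng(1)
artifact = rng.standard_normal((300, 1))
noise_ts = artifact + 0.3 * rng.standard_normal((300, 50))  # "WM/CSF" voxels
gm_ts = artifact + rng.standard_normal((300, 20))           # "gray matter" ROIs

clean = regress_out(gm_ts, acompcor_regressors(noise_ts))
r_before = np.corrcoef(gm_ts[:, 0], artifact[:, 0])[0, 1]
r_after = np.corrcoef(clean[:, 0], artifact[:, 0])[0, 1]
```

The key design point matches the response above: the confounds are estimated entirely outside the gray matter, so true gray-matter neural signal is never part of the nuisance model.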

Was there a difference in correlations for preterm vs term neonates? Recent research has suggested that preterm births can have an impact on functional networks, particularly in frontal cortices. e.g., Tokariev et al. 2019; Li et al. 2021 eLife; Zhang et al. 2022 Frontiers in Neuroscience.

Response #8: We have compared preterm and term neonates for all the main results, including the connectivity from the secondary visual cortex/V1 to non-visual sensory cortices versus prefrontal cortices, the laterality of occipito-frontal connectivity, and the specialization across different fronto-occipital networks. This information is reported on page 6, line 169 and in Supplementary Figure S7. The connectivity values of full-term infants are generally higher than those of preterm infants; however, the connectivity patterns of term and preterm infants are very similar.

      The consistency between the current results and prior work (e.g., Burton et al. 2014) is notable, particularly in the observed greater correlations in prefrontal regions and weaker correlations in somato-motor regions for early blind individuals compared to sighted. However, almost all visual-frontal correlations in both groups were negative in that prior study. Some discussion on why positive correlations were found in the current study could help to clarify.

Response #9: Many other papers have reported positive correlations similar to those found in our study (e.g., Deen et al., 2015; Kanjlia et al., 2021). In contrast, Burton's study identified predominantly negative visual-frontal correlations; we think this is likely because the global signal was regressed out during preprocessing. This methodological choice can increase the number of negative connections (Murphy & Fox, 2017).

      The term "secondary visual areas" used throughout the paper lacks specificity, and its usage in terms of underlying anatomical and functional areas has been inconsistent in the literature. It would be advisable to adopt a more precise characterization based on functional and/or anatomical criteria.

Response #10: We now specify in the article that the occipital ROIs used in the current study are functional areas identified in prior studies of people born blind: regions that respond to non-visual tasks and show functional connectivity changes in blind adults (Kanjlia et al., 2016, 2021; Lane et al., 2015). These regions respond to language, math, and executive function in the congenitally blind population (see Figure 1). We refer to them collectively as "secondary visual areas" to distinguish them from V1. Anatomically, these three regions cover the majority of the lateral occipital cortex and part of the ventral occipital cortex, providing a good sample of the connectivity profile of higher-order visual areas. In blind individuals, although these regions respond to non-visual tasks, their exact functions are unknown.

      The inclusion of the ventral temporal cortex in the visual ROIs is currently only depicted in Supplementary Figure S7. To enhance the clarity of the areas of interest analyzed, it would be advisable to illustrate the ventral temporal areas in the main text. Were there notable differences in the frontal correlations between the lateral occipital visual areas and ventral temporal areas?

Response #11: We thank the reviewer for pointing out this issue. We added a statement about the ventral visual cortex when describing the location of the ROIs and added a ventral view of the ROIs to Figure 1. The language-responsive and math-responsive ROIs cover both the lateral and ventral visual cortex, whereas the executive-function (response-conflict) region covers only the lateral visual cortex. We compared the connectivity patterns of these three regions and found no differences (see Supplementary Fig. S2).

      The blind group results are characterized as reflecting a reorganization in comparison to sighted adults while the results for sighted adults compared to infants are discussed more as a maturation ("adult pattern isn't default but requires experience to establish"). Both the sighted and blind adult groups showed differences from the infant group, and these differences are attributed to the role of experience. Why use "reorganization" for one result and maturation for another?

Response #12: We agree with the reviewer that both of the adult groups should be thought of as equal in relation to the infants. In other words, the brain develops under one set of experiential conditions or another. We do not think that the adult sighted pattern reflects maturation. Rather, the sighted adult pattern reflects the combined influence of maturation and visual experience. The adult blind pattern reflects the combined influence of maturation and blindness. We use the term "reorganization" to label differences in the blind adults relative to sighted infants. We do so for the purpose of clarity and to remain consistent with terminology in the prior literature. However, we agree with the reviewer that the blind group does not reflect "reorganization" intrinsically any more than the sighted adult group.

      The statement that "visual experience is required to set up long-range functional connectivity" is unclear, especially since the infant and blind groups showed stronger long-range functional correlations with PFC.

Response #13: We revised this sentence to read "visual experience establishes elements of the sighted-adult long-range connectivity" (Abstract, line 17).

The statement that the visual ROIs roughly correspond to "the anatomical location of areas such as V5/MT+, LO, V3a, and V4v" appears imprecise. From Supplementary Figure S7, these areas cover anterior portions of ventral temporal cortex (do these span the anatomical location of putative category-selective areas?) and extend into the intraparietal sulcus.

Response #14: We thank the reviewer for this clarification. The ventral ROIs cover the middle and part of the anterior portion of the ventral temporal lobe, including the putative category-selective areas. Additionally, the dorsal ROIs extend beyond the occipital lobe to the intraparietal sulcus and superior parietal lobule. We have added a more detailed description of the anatomical location of the ROIs to the Methods section, page 17, line 489, as follows:

"Each functional ROI spans multiple anatomical regions, and together the secondary visual ROIs tile large portions of lateral occipital, occipito-temporal, dorsal occipital, and occipito-parietal cortices. In sighted people, the secondary visual occipital ROIs include the anatomical locations of functional regions such as motion area V5/MT+, the lateral occipital complex (LO), category-specific ventral occipitotemporal cortices, and, dorsally, V3a and V4v. The occipital ROI also covers the middle of the ventral temporal lobe. Dorsally, it extends to the intraparietal sulcus and superior parietal lobule."

      The motivation for assessing correlations with motor and frontal regions was briefly discussed in the introduction. It would be helpful to reiterate this motivation when first introducing the analyses in the results.

Response #15: Thank you for the thoughtful suggestion. Upon reflection, we chose to substantially revise the Introduction to more clearly and comprehensively explain the rationale for examining the couplings with motor and frontal regions, rather than reiterating it in the Results section. We believe this revised framing provides a stronger foundation for the analyses that follow, while avoiding redundancy across sections. We hope this addresses the reviewer's concern.

      Reviewer #2 (Recommendations for the authors):

      Congratulations on a well-written paper and an interesting set of results.

      Reviewer #3 (Recommendations for the authors):

      Abstract:

      Mentioning "sighted infants" does not seem adequate.

      Response #16: In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      In sentences after "Specifically...", it was not clear whether the authors referred to V1 connectivity.

      Response #17: We thank the reviewer for this comment. In the revised abstract, we have removed the original "Specifically..." phrasing and clarified the results.

      Introduction

      Talking about the "instructive effects" of vision might be confusing or misleading. Visual experiences like exposure to oral language are part of the normal/spontaneous environment that allows the infant behavioral acquisitions (contrarily with learnings that occur later during development with instruction like for reading).

Response #18: We appreciate the reviewer's concern and would like to clarify that the term "instructive effect" is used here as derived from neurodevelopmental studies (Crair, 1999; Sur et al., 1999). In this context, "instructive" refers to activity-dependent mechanisms in which patterns of neural activity actively guide the organization of synaptic connectivity, emphasizing that spontaneous or sensory-driven activity (e.g., retinal waves, visual experience) can directly shape circuit refinement, as seen in ocular dominance column formation. In the context of our study, we emphasize that vision plays an instructive role in setting up the balance of connectivity between occipital cortex and non-visual networks.

      For references on the development of connectivity, I would advise citing MRI studies but also studies based on histological approaches (see for example the detailed review by Kostovic et al, NeuroImage 2019).

      Response #19: We thank the reviewer for this suggestion. We have incorporated a discussion on the long-range anatomical connections that emerge as early as infancy, referencing studies that employed diffusion MR imaging and histological methods, as detailed below.

"Many long-range anatomical connections between brain regions are already established in infants, even before birth, although they are not yet mature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017)." (Page 12, line 303 in the manuscript)

      Results

      P7 l170: It might be helpful to be precise that this is "compared with inter-hemispheric connectivity".

      Response #20: We thank the reviewer for this suggestion. To align with our established terminology, we have revised the statement to explicitly contrast within-hemisphere connectivity with between-hemisphere connectivity. The modified text now reads (page 7, line 183 in the manuscript):

"Compared to sighted adults, blind adults exhibited a stronger dominance of within-hemisphere connectivity over between-hemisphere connectivity. That is, in people born blind, left visual networks are more strongly connected to left PFC, whereas right visual networks are more strongly connected to right PFC."

      L176-181: It was not clear to me what was the difference between "across" and "between hemisphere connectivity". Would it be informative to test the difference between blind and sighted adults?

Response #21: We clarify that there is no distinction between the terms "across" and "between hemisphere connectivity"; they refer to the same concept. To ensure consistency, we have revised the text to exclusively use "between hemisphere connectivity" throughout the manuscript. Regarding the comparison between blind and sighted adults, we conducted statistical comparisons between these groups in our analysis, and the results have been incorporated into the revised version (page 7, line 187 in the manuscript).
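To make the within- versus between-hemisphere contrast concrete, the dominance measure can be sketched as a simple difference score. The connectivity values below are hypothetical illustrations of the reported pattern, not the study's estimates.

```python
def hemispheric_dominance(fc):
    """Within-hemisphere minus between-hemisphere occipito-frontal
    connectivity, averaged over hemispheres.

    fc : dict of Fisher-z connectivity values keyed by
         visual-network/PFC hemisphere pairing (L = left, R = right).
    """
    within = (fc['LV-LPFC'] + fc['RV-RPFC']) / 2
    between = (fc['LV-RPFC'] + fc['RV-LPFC']) / 2
    return within - between

# Hypothetical group-average values: blind adults show stronger
# within-hemisphere dominance than sighted adults.
blind = {'LV-LPFC': 0.42, 'RV-RPFC': 0.40, 'LV-RPFC': 0.18, 'RV-LPFC': 0.20}
sighted = {'LV-LPFC': 0.30, 'RV-RPFC': 0.29, 'LV-RPFC': 0.24, 'RV-LPFC': 0.25}
```

Under these made-up numbers, both groups show positive dominance, but the blind group's is larger, which is the qualitative pattern the revised text describes.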

      Adding statistics on Figure 3, but also on Figures 1 and 2 might help the reading.

Response #22: We have added the statistics to Figures 1–4.

      Adding the third comparison in Figure 4 would be possible in my view.

Response #23: We explored integrating the response-conflict region into Figure 4, but this would require a 3x3 bar chart with pairwise statistical significance markers, which introduced excessive visual complexity that hindered readers' ability to grasp our intended message. To ensure clarity, we retained the original Figure 4 while providing the complete three-region analysis, including all statistical comparisons, in Supplementary Figure S8 for completeness.

      Methods

      The authors might have to specify ages at birth, and ages at scan (median + range?).

      Response #24: We have added that information in the Methods section as follows:

"The average age from birth at scan = 2.79 weeks (SD = 3.77, median = 1.57, range = 0–19.71); average gestational age at scan = 41.23 weeks (SD = 1.77, median = 41.29, range = 37–45.14); average gestational age at birth = 38.43 weeks (SD = 3.73, median = 39.71, range = 23–42.71)." (Page 14, line 379 in the manuscript)

      It might be relevant to comment on the range of available fMRI volumes, and the fact that connectivity measures might then be less robust in infants.

Response #25: We report the range of fMRI volumes in the Methods section (page 16, line 449). Adult participants (blind and sighted) underwent 1–4 scanning sessions, each containing 240 volumes (mean scan duration: 710.4 seconds per participant). For infants, all subjects had 2300 fMRI volumes, and we retained a subset of 1600 continuous volumes per subject with the minimum number of motion outliers. While infant connectivity measures may inherently exhibit lower robustness due to developmental and motion-related factors, our infant cohort's large sample size (n=475) and stringent motion censoring criteria enhance the reliability of group-level inferences. We have integrated this clarification into the Methods section (page 16, line 444) as follows:

"While infant connectivity estimates may be less robust at the individual level compared to adults due to shorter scan durations and higher motion, our cohort's large sample size (n=475) and rigorous motion censoring mitigate these limitations for group-level analyses."

      The mention of dHCP 2nd release should be removed from the paragraph on data availability.

      Response #26: We have removed it.

    1. UX Application: Confirmation Bias

An example of confirmation bias would be when a player finds a seed with a village and desert temple near spawn, believing it's a "lucky" seed for survival gameplay. They then focus only on seeds with similar features, discarding any without villages or temples, convinced those are automatically worse. This confirmation bias reinforces the idea that "lucky" seeds guarantee a great experience, even though Minecraft's procedural generation means other seeds can be just as fun, offering different challenges or opportunities. (minecraft)

    1. Author response:

The following is the authors' response to the previous reviews

      Reviewer #1 (Public review):

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study's conclusions.

In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources to success. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

We thank the Reviewer for this thoughtful suggestion. We acknowledge that more flexible function learning approaches could provide a stronger test in favor of a more general account. Our Bayesian model implemented a basis function approach in which the weights of three archetypal functions (flat, step, linear) are learned from experience. Testing models with more flexible basis functions would likely require a task with more than three levels of resource investment (1, 2, or 3 tickets). This would make an interesting direction for future work expanding on our current findings. We now incorporate this suggestion in more detail in our updated manuscript (lines 335-341):

"Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants' choices well (see Methods), other modeling assumptions drawn from human function learning [30] or experimental designs with continuous action spaces may offer a better test of this idea."
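The basis-function comparison described above, learning weights over flat, step, and linear candidate functions, can be sketched with a simple Bayesian update over three hypotheses. The candidate success probabilities are illustrative assumptions, not the fitted values from the paper.

```python
# Candidate functions mapping tickets purchased (1, 2, 3) to success
# probability. Shapes are archetypal and illustrative only.
FUNCS = {
    'flat':   {1: 0.6, 2: 0.6, 3: 0.6},   # inelastic control
    'step':   {1: 0.1, 2: 0.7, 3: 0.7},   # control requires >= 2 tickets
    'linear': {1: 0.2, 2: 0.5, 3: 0.8},   # control grows with investment
}

def update_weights(weights, tickets, success):
    """Bayesian update of the posterior over candidate functions
    after observing one boarding attempt."""
    post = {}
    for name, f in FUNCS.items():
        p = f[tickets]
        post[name] = weights[name] * (p if success else 1 - p)
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}

# Start from a uniform prior and observe outcomes consistent with a
# step function: failure with 1 ticket, success with 2.
w = {k: 1 / 3 for k in FUNCS}
for tickets, success in [(1, False), (2, True), (1, False), (2, True)]:
    w = update_weights(w, tickets, success)
```

After these four observations the step hypothesis dominates, illustrating how a learner restricted to a few canonical shapes infers whether control is elastic; the more flexible estimators the Reviewer proposes would replace this small hypothesis set with a richer function class.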

      Reviewer #2 (Public review):

This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability. The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences.

      We thank the Reviewer for their constructive feedback throughout the review process, which has substantially strengthened our manuscript and clarified our theoretical framework.

      One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

We note that the existence of bivariate relationships is not a prerequisite for the existence of multivariate relationships. Conditioning the latter on the former, therefore, would risk missing important relationships existing in the data. Ultimately, correlations between pairs of variables do not offer a sensitive test of the general hypothesis that there is a relationship between two sets of variables. As an illustration, consider that elasticity bias correlated in our data (r = .17, p<.001) with the difference between SOA (sense of agency) and SDS (self-rating depression). Notably, SOA and SDS were positively correlated (r = .47, p<.001), and neither of them was correlated with elasticity bias (SOA: r=.04, p=.43; SDS: r=-.06, p=.16). It was a dimension that ran between them that mapped onto elasticity bias. This specific finding is incidental and uncorrected for multiple comparisons, hence we do not report it in the manuscript, but it illustrates the kinds of relationships that cannot be accounted for by looking at bivariate relationships alone.
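The point that a difference dimension can relate to a third variable even when the bivariate correlations are weak can be demonstrated with a small simulation. The generative structure and coefficients below are illustrative assumptions, not a model of the actual questionnaire data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# A shared trait drives both questionnaire scores; a second, contrast
# dimension runs between them and relates to the third variable.
shared = rng.standard_normal(n)
contrast = rng.standard_normal(n)
soa = shared + 0.5 * contrast + rng.standard_normal(n)  # hypothetical scale 1
sds = shared - 0.5 * contrast + rng.standard_normal(n)  # hypothetical scale 2
bias = contrast + rng.standard_normal(n)                # third variable

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

r_pair = r(soa, sds)          # the two scales correlate with each other
r_diff = r(bias, soa - sds)   # their difference maps onto the third variable
r_soa, r_sds = r(bias, soa), r(bias, sds)   # weaker bivariate links
```

Taking the difference cancels the shared-trait variance that dilutes each bivariate correlation, so the difference score correlates with the third variable more strongly than either scale does on its own.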

      Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome.

      In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      We thank the Reviewer for their thoughtful engagement with our manuscript. We appreciate their recognition of elasticity as a key dimension of control that has the potential to advance our understanding of psychopathology and healthy decision-making.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific.

      Our formal definition of elasticity, detailed in Supplementary Note 1, naturally extends the reward-based and information-theoretic definitions of controllability by Huys & Dayan (2009) and Ligneul (2021). We now further clarify how the model implements this formalized definition (lines 156-159).

"Conversely, in the 'elastic controllability model', the beta distributions represent a belief about the maximum achievable level of control (a<sub>Control</sub>, b<sub>Control</sub>) coupled with two elasticity estimates that specify the degree to which successful boarding requires purchasing at least one (a<sub>elastic≥1</sub>, b<sub>elastic≥1</sub>) or specifically two (a<sub>elastic2</sub>, b<sub>elastic2</sub>) extra tickets. As such, these elasticity estimates quantify how resource investment affects control. The higher they are, the more precisely controllability can be estimated by knowing how many resources the agent is willing and able to invest (Supplementary Note 1)."
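A schematic reading of this parameterization can be sketched as follows. The way the belief means combine into a predicted success probability here is an illustrative assumption for exposition, not the paper's exact generative model or update equations.

```python
from dataclasses import dataclass

@dataclass
class ElasticBeliefs:
    """Beta-distributed beliefs over maximum control and its elasticity.

    a_control/b_control : belief about the maximum achievable control.
    a_e1/b_e1 : belief that control requires at least 1 extra ticket.
    a_e2/b_e2 : belief that control specifically requires 2 extra tickets.
    """
    a_control: float = 1.0
    b_control: float = 1.0
    a_e1: float = 1.0
    b_e1: float = 1.0
    a_e2: float = 1.0
    b_e2: float = 1.0

    def p_success(self, extra_tickets):
        """Predicted boarding probability given 0, 1, or 2 extra tickets."""
        c = self.a_control / (self.a_control + self.b_control)   # max control
        e1 = self.a_e1 / (self.a_e1 + self.b_e1)
        e2 = self.a_e2 / (self.a_e2 + self.b_e2)
        if extra_tickets == 0:
            reachable = 1 - e1                      # control needs no extra
        elif extra_tickets == 1:
            reachable = (1 - e1) + e1 * (1 - e2)    # one extra suffices
        else:
            reachable = 1.0                         # two extras always suffice
        return c * reachable

# Hypothetical beliefs: high maximum control that is probably elastic.
b = ElasticBeliefs(a_control=8, b_control=2,
                   a_e1=7, b_e1=3, a_e2=2, b_e2=8)
```

Under such beliefs, predicted success rises with each extra ticket and saturates at the believed maximum control, which is the sense in which the elasticity estimates quantify how resource investment affects control.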

      Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

We respectfully disagree that we present elasticity as outside of, or different in kind from, controllability. Throughout the manuscript, we explicitly describe elasticity as a dimension of controllability (e.g., lines 70-72, among many other examples). This is also expressed in our formal definition of elasticity (Supplementary Note 1).

      The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me; if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning, very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest: that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success; e.g., you're more likely to get a heads in two flips than one. I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      We appreciate the Reviewer's concerns but feel that some of the more subjective comments might not benefit from further discussion. We only note that controllability and its elasticity are features of environmental structure, so in principle any controllability-related inference is a form of model-based learning. The interesting question is whether people account in their world model for that particular feature of the environment.

      The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      In real-world contexts, it is often trivial that sometimes further investment enhances control and sometimes it does not. For example, students know that if they prepare more extensively for their exams they will likely be able to achieve better grades, but they also know that there is uncertainty in this regard – their grades could improve significantly, modestly, or in some cases, they might not improve at all, depending on the type of exams their study program administers and the knowledge or skills being tested. Our research question was whether in such contexts people learn from experience the degree to which controllability is elastic to invested resources and adapt their resource investment accordingly. Our findings show that they do.

      The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether, with the help of all the parameters and all the surveys, one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      We agree with the Reviewer on the relationship between elasticity and any particular dimension of psychopathology. The CCA asks a different question, namely, whether there is a relationship between psychopathology traits and task parameters, and whether elasticity bias specifically contributes to this relationship.

      I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity, eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one; the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical, which will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.

      The logic the Reviewer describes breaks down when one considers the dynamics of participants' resource investment choices. A low elasticity bias in a participant's prior belief would make them persist for longer in purchasing a single ticket despite failure, as compared to a person without such a bias. Indeed, the ability of the experimental design to demonstrate low elasticity biases is evidenced by the fact that the majority of participants were fitted with a low elasticity bias (μ = .16 ± .14, where .5 is unbiased).

      Originally, the Reviewer was concerned that elasticity bias was being confounded with a general deficit in learning. The weak inter-parameter correlations in the parameter recovery test resolved this concern, especially given that, as we now note, the simulated parameter space encompassed both low and high elasticity biases (range = [.02, .76]). Furthermore, regarding the Reviewer's concern about bias in the parameter recovery, we found no such significant bias with respect to the elasticity bias parameter (Δ(Simulated, Recovered) = -.03, p = .25), showing that our experiment could accurately identify low and high elasticity biases.
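      The two quantities at stake in this exchange can be illustrated with a generic parameter-recovery sketch using a toy Bernoulli model (our own construction, not the authors' elasticity model; all names here are hypothetical): simulate choices from known parameters spanning the full plausible range, refit, and check both the recovery correlation and the systematic recovery bias Δ(Simulated, Recovered).

```python
# Generic parameter-recovery sketch with a toy Bernoulli model (not the
# authors' model): simulate from known parameters covering the full range,
# refit by maximum likelihood, then assess correlation AND bias.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)

def simulate(p_true, n_trials=500):
    """Simulate Bernoulli choices from a known 'true' parameter."""
    return rng.random(n_trials) < p_true

def neg_log_lik(p, choices):
    """Negative log-likelihood of the choices under parameter p."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return -np.sum(np.where(choices, np.log(p), np.log(1 - p)))

# Full plausible range, including low values (not just fitted parameters)
true_ps = np.linspace(0.05, 0.95, 40)
recovered = np.array([
    optimize.minimize_scalar(neg_log_lik, bounds=(0, 1),
                             args=(simulate(p),), method="bounded").x
    for p in true_ps
])

r = np.corrcoef(true_ps, recovered)[0, 1]    # recovery correlation
bias = np.mean(recovered - true_ps)          # systematic bias, Delta(sim, rec)
t_stat, p_val = stats.ttest_1samp(recovered - true_ps, 0)  # is bias significant?
```

High correlation alone does not rule out a systematic bias, which is why both `r` and `bias` are reported separately here.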

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make one or two additional boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p – p<sup>2</sup> for two tickets; the p<sup>2</sup> term captures the fact that you only take the second attempt if you fail on the first. A consequence of this is that buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.

      We thank the Reviewer for this comment, and agree that participants may have employed the intuitive understanding the Reviewer describes. This is consistent with our model comparison results, which showed that participants did not assume that control increases linearly with resource investment (lines 677-692). Consequently, this is also not assumed by our model, except perhaps in how the prior is implemented (a property that was supported by model comparison). In the text, we acknowledge that this aspect of the model and participants' behavior deviates from the task's true structure, and it would be worthwhile to address this deviation in future studies.

      That said, there is no reason that this would make participants appear to be generally underestimating elasticity. Following exposure to outcomes for one and three tickets, any nonlinear understanding of probabilities would only affect the controllability estimate for two tickets. This would have contrasting effects on the elasticity attributed to the second and third tickets but, on average, it would not change the overall elasticity estimate. On the other hand, if such a participant were only exposed to outcomes for two and three tickets, they would come to overestimate the difference between the first and second tickets, thereby overestimating elasticity.
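      The probability logic at issue in this exchange can be made concrete with a small sketch (the function names are ours): under the intuitive causal model, two attempts succeed with probability 1 − (1 − p)² = 2p − p², whereas in the task the one-ticket success probability is doubled exactly.

```python
def p_success_intuitive(p, attempts):
    """Probability of at least one success across independent attempts
    (the reviewer's intuitive causal model)."""
    return 1 - (1 - p) ** attempts

def p_success_task(p, attempts):
    """The task's actual rule: success probability scales exactly with
    the number of tickets (capped at 1)."""
    return min(1.0, attempts * p)

p = 0.3
# One attempt: both models agree (p = 0.3). Two attempts: the intuitive
# model gives 2p - p^2 = 0.51, while the task gives 2p = 0.6, so the
# intuitive model implies diminishing returns on extra tickets.
two_intuitive = p_success_intuitive(p, 2)
two_task = p_success_task(p, 2)
```

The gap between `two_intuitive` and `two_task` is exactly the p² term the reviewer describes.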

      The model is heuristically defined and does not reflect Bayesian updating. For example, it overestimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

      Note that we have tested a fully Bayesian model (lines 676-691), but found that this model fitted participants' choices worse.

      You're right; saying these analyses provide "no information" was unfair. I agree that this is a useful way to link model parameters with behavior, and they should remain in the paper. However, my key objection still holds: these analyses do not tell us anything about how *people's* prior assumptions influence behavior. Instead, they tell us about how *fitted model parameters* depend on observed behavior. You can easily avoid this misreading by adding a small parenthetical, e.g.

      Thus, a prior assumption that control is likely available **(operationalized by \gamma_controllability)** was reflected in a futile investment of resources in uncontrollable environments.

      We thank the Reviewer for the suggestion and have added this parenthetical (lines 219, 225).

    1. Author response:

      The following is the authorsโ€™ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The manuscript is quite dense, with some concepts that may prove difficult for the non-specialist. I recommend spending a few more words (and maybe some pictures) describing the difference between task-relevant and task-irrelevant planes. Nice technique, but not instantly obvious. Then we are hit with "stimulus-related", which definitely needs some words (also because it is orthogonal to neither of the above).

      We agree that the original description of the planes was too terse and have expanded on this in the revised manuscript.

      Line 85 - To test the influence of attention, trials were sorted according to two spatial reference planes, based on the location of the stimulus: task-related and task-unrelated (Fig. 1b). The task-related plane corresponded to participants' binary judgement (Fig. 1b, light cyan vertical dashed line) and the task-unrelated plane was orthogonal to this (Fig. 1b, dark cyan horizontal dashed line). For example, if a participant was tasked with performing a left-or-right of fixation judgement, then their task-related plane was the vertical boundary between the left and right side of fixation, while their task-unrelated plane was the horizontal boundary. The former (left-right) axis is relevant to their task while the latter (top-bottom) axis is orthogonal and task irrelevant. This orthogonality can be leveraged to analyze the same data twice (once according to the task-related plane and again according to the task-unrelated plane) in order to compare performance when the relative location of an event is either task relevant or irrelevant.

      Line 183 - whereas task planes were constant, the stimulus-related plane was defined by the location of the stimulus on the previous trial, and thus varied from trial to trial. That is, on each trial, the target is considered a repeat if it changes location by <|90°| relative to its location on the previous trial, and an alternate if it moves by >|90°|.

      (2) While I understand that the authors want the three classical separations, I actually found it misleading. Firstly, for a perceptual scientist, calling intervals in the order of seconds (rather than milliseconds) "micro" is technically coming the raw prawn. Secondly, the divisions are not actually time, but events: micro means a one-back paradigm, one event previously, rather than being defined by duration. Thirdly, meso isn't really a category, just a few micros stacked up (and there's not much data on this). And macro is basically patterns, or statistical regularities, rather than being a fixed time. I think it would be better either to talk about short-term and long-term, which do not have the connotations I mentioned. Or simply talk about "serial dependence" and "statistical regularities". Or both.

      We agree that the temporal scales defined in the current study are not the only way one could categorize perceptual time. We also agree that by using events to define scales, we ignore the influence of duration. In terms of the categories, we selected these for two reasons: 1) they conveniently group previous phenomena, and 2) they loosely correspond to iconic-, short- and long-term memory. We agree that one could also potentially split it up into two categories (e.g., short- and long-term), but in general, we think any form of discretization will have limitations. For example, Reviewer 1 suggests that the meso category is simply a few micros stacked together. However, there is a rich literature on phenomena associated with sequences of an intermediate length that do not appear to be entirely explained by stacking micro effects (e.g., sequence learning and sequential dependency). We also find that when controlling for micro level effects, there are clear meso level effects. Also, by the logic that meso level effects are just stacked micro effects, one could also argue the same for macro effects. We don't think this argument is incorrect; rather, we think it exemplifies the challenge of discretising temporal scales. Ultimately, the current study aimed to test whether seemingly disparate phenomena identified in previous work could be captured by unifying principles. To this end, we found that these categories were the most useful. However, we have included a "Limitations and future directions" section in the Discussion of the revised manuscript that acknowledges both the alternative scheme proposed by Reviewer 1 and the value of extending this work to consider the influence of duration (as well as events).

      Line 488 - Limitations and future directions. One potential limitation of the current study is the categorization of temporal scales according to events, independent of the influence of event duration. While this simplification of time supports comparison between different phenomena associated with each scale (e.g., serial dependence, sequential dependencies, statistical learning), future work could investigate the role of duration to provide a more comprehensive understanding of the mechanisms identified in the current study.

      Related to this, while the temporal scales applied here conveniently categorized known sensory phenomena, and partially correspond to iconic-, short-, and long-term memory, they are but one of multiple ways to delineate time. For example, temporal scales could alternatively be defined simply as short- and long-term (e.g., by combining micro and meso scale phenomena). However, this could obscure meaningful differences between phenomena associated with sensory persistence and short-term memory, or qualitative differences in the way that short sequences of events are processed.

      (3) More serious is the issue of precision. Again, this is partially a language problem. When people use the engineering terms "precision" and "accuracy" together, they usually use the same units, such as degrees. Accuracy refers to the distance from the real position (so average accuracy gives bias), and precision is the clustering around the average bias, usually measured as standard deviation. Yet here accuracy is percent correct: also a convention in psychology, but not when contrasting accuracy with precision, in the engineering sense. I suggest you change "accuracy" to "percent correct". On the other hand, I have no idea how precision was defined. All I could find was: "mixture modelling was used to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively". I do not know what that means.

      In the case of a binary decision, it seems reasonable to use the term "accuracy" to refer to the correspondence between the target state and the response on a task. However, we agree that while our (main) task is binary, the target is not, nor is the secondary task. We thank the reviewer for bringing this to our attention, as we agree that it is a likely cause of confusion. To avoid this confusion, we have referred specifically to "task accuracy" throughout the revised manuscript.

      With regards to precision, our measure of precision is consistent with what Reviewer 1 describes as such, i.e., the clustering of responses. In particular, the von Mises distribution is essentially a Gaussian distribution in circular space, and the kappa parameter defines the width of the distribution, regardless of the mean, with larger values of kappa indicating narrower (more precise) distributions. We could have used standard deviation to assess precision; however, this would incorrectly combine responses on which participants failed to encode the target (e.g., because of a blink) and were simply guessing. To account for these trials, we applied mixture modelling of guess and genuine responses to isolate the precision of genuine responses, as is standard in the visual working memory literature. However, we agree that this was not sufficiently described in the original manuscript and have elaborated on this method in the revised version.

      Line 598 - From the reproduction task, we sought to estimate participants' recall precision. It is likely that on some trials participants failed to encode the target and were forced to guess. To isolate recall precision from guess responses, we used mixture modelling to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively (Bays et al., 2009). The k parameter of the von Mises distribution reflects its width, which indicates the clustering of responses around a common location.
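      As a rough illustration of why guesses must be modelled separately (a toy simulation of ours, not the authors' analysis code, which follows Bays et al., 2009), mixing uniform guesses into von Mises responses inflates the apparent circular spread well beyond that of the genuine responses alone:

```python
# Toy simulation: angular errors as a mixture of von Mises responses
# (genuine, concentration kappa) and uniform responses (guesses).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, guess_rate, kappa = 10_000, 0.1, 8.0

is_guess = rng.random(n) < guess_rate
errors = np.where(is_guess,
                  rng.uniform(-np.pi, np.pi, n),            # guesses
                  stats.vonmises.rvs(kappa, size=n,          # genuine responses
                                     random_state=rng))

def circular_sd(x):
    """Circular standard deviation from the mean resultant length R."""
    R = np.abs(np.mean(np.exp(1j * x)))
    return np.sqrt(-2 * np.log(R))

sd_all = circular_sd(errors)             # conflates guesses and genuine trials
sd_genuine = circular_sd(errors[~is_guess])
# sd_all > sd_genuine: raw spread overstates imprecision, which is what
# mixture modelling of the guess rate is designed to correct.
```

Larger kappa corresponds to a narrower von Mises distribution, i.e. higher precision, independent of any bias in the mean.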

      (4) Previous studies show serial dependence can increase bias but decrease scatter (inverse precision) around the biased estimate. The current study claims to be at odds with that. But are the two measures of precision relatable? Was the real (random) position of the target subtracted from each response, leaving residuals from which the inverse precision was calculated? (If so, the authors should say so.) But if serial dependence biases responses in essentially random directions (depending on the previous position), it will increase the average scatter, decreasing the apparent precision.

      Previous studies have shown that when serial dependence is attractive there is a corresponding increase in precision around small offsets from the previous item (citations). Indeed, attractive biases will lead to reduced scatter (increased precision) around a central attractor. Consistent with previous studies, and with this rationale, we also found an attractive bias coupled with increased precision. To clarify, for the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the offset between the current and previous target and then performing the same mixture modelling described above to estimate the mean (bias) and kappa (precision) parameters of the von Mises distribution fit to the angular errors. This was not explained in the original manuscript, so we thank Reviewer 1 for bringing this to our attention and have clarified the analysis in the revised version.

      Line 604 - For the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the angular offset between the current and previous target and then performing mixture modelling to estimate the mean (bias) and k (precision) parameters of the von Mises distribution.

      (5) I suspect they are not actually measuring precision, but location accuracy. So the authors could use "percent correct" and "localization accuracy". Or be very clear what they are actually doing.

      As explained in our response to Reviewer 1's previous comment, we are indeed measuring precision.

      Reviewer #2 (Public review):

      (1) The abstract should more explicitly mention that conclusions about feedforward mechanisms were derived from a reanalysis of an existing EEG dataset. As it is, it seems to present behavioral data only.

      It is not clear what relevance the fact that the data has been analyzed previously has to the results of the current study. However, we do think that it is important to be clear that the EEG recordings were collected separately from the behavioural and eyetracking data, so we have clarified this in the revised abstract.

      Line 7 - By integrating behavioural and pupillometry recordings with electroencephalographic recordings from a previous study, we identify two distinct mechanisms that operate across all scales.

      (2) The EEG task seems quite different from the others, with location and color changes, if I understand correctly, on streaks of consecutive stimuli shown every 100 ms, with the task involving counting the number of target events. There might be different mechanisms and functions involved, compared to the behavioral experiments reported.

      As stated above, we agree that it is important that readers are aware that the EEG recordings were collected separately from the behavioural and eyetracking data. We were forthright about this in the original manuscript and have now clarified this in the revised abstract. We agree that collecting both sets of data in the same experiment would be a useful validation of the current results and have acknowledged this in a new Limitations and future directions section of the Discussion of the revised manuscript.

      Line 501 - Another limitation of the current study is that the EEG recordings were collected in a separate experiment from the behavioural and pupillometry data. The stimuli and task were similar between experiments, but not identical. For example, the EEG experiment employed coloured arc stimuli presented at a constant rate of ~3.3 Hz and participants were tasked with counting the number of stimuli presented at a target location. By contrast, in the behavioural experiment, participants viewed white blobs presented at an average rate of ~2.8 Hz and performed a binary spatial task coupled with an infrequent reproduction task. An advantage of this was that the sensory responses to stimuli in the EEG recordings were not conflated with motor responses; however, future work combining these measures in the same experiment would serve as a validation for the current results.

      (3) How is the arbitrary choice of restricting EEG decoding to a small subset of parieto-occipital electrodes justified? Blinks and other artifacts could have been corrected with proper algorithms (e.g., ICA) (Zhang & Luck, 2025) or even left in, as decoders are not necessarily affected by noise. Moreover, trials with blinks occurring at the time of the stimulus would be better removed, and the arbitrary selection of a subset of electrodes, while reducing the information in input to the decoder, does not account for trials in which a stimulus was missed (e.g., due to blinks).

      Electrode selection was based on several factors: 1) reduction of eye movement/blink artifacts (as noted in the original manuscript), 2) consistency with the previous EEG study (Rideaux, 2024) and other similar decoding studies (Buhmann et al., 2024; Harrison et al., 2023; Rideaux et al., 2023), 3) improved signal-to-noise by including only sensors that carry the most position information (as shown in Supplementary Figure 1a and the previous EEG study). We agree that this was insufficiently explained in the original manuscript and have clarified our sensor selection in the revised version.

      Line 631 - We only included the parietal, parietal-occipital, and occipital sensors in the analyses to i) reduce the influence of signals produced by eye movements, blinks, and non-sensory cortices, ii) maintain consistency with similar previous decoding studies (Buhmann et al., 2024; Rideaux, 2024; Rideaux et al., 2025), and iii) improve decoding accuracy by restricting sensors to those that carried spatial position information (Supplementary Fig. 1a).

      (4) The artifact that appears in many of the decoding results is puzzling, and I'm not fully convinced by the speculative explanation involving slow fluctuations. I wonder if a different high-pass filter (e.g., 1 Hz) might have helped. In general, the nature of this artifact requires better clarification and disambiguation.

      We agree that the nature of this artifact requires more clarification and disambiguation. Due to relatively slow changes in the neural signal, which are not stimulus-related, there is a degree of temporal autocorrelation in the recordings. This can be filtered out, for example, by using a stricter high-pass filter; however, we tried a range of filters and found that a cut-off of at least 0.7 Hz is required to remove the artifact, and even a filter of 0.2 Hz introduces other (stimulus-related) artifacts, such as above-chance decoding prior to stimulus onset. These stimulus-related artifacts are due to the temporal smearing of data, introduced by the filtering, and have a more pronounced and complex influence on the results and are more difficult to remove through other means, such as the baseline correction applied in the original manuscript.

      The temporal autocorrelation is detected by the decoder during training and biases it to classify/decode targets that are presented nearby in time as similar. That is, it learns the neural pattern for a particular stimulus location based on the activity produced by the stimulus and the temporal autocorrelation (determined by slow, stimulus-unrelated fluctuations). The latter accounts for a relatively small proportion of the variance in the neural recordings under normal circumstances and would typically go undetected when simply plotting decoding accuracy as a function of position. However, it becomes weakly visible when decoding accuracy is plotted as a function of distance from the previous target, as now the bias (towards temporally adjacent targets) aligns with the abscissa. Further, it becomes highly visible when the stimulus labels are shuffled, as now the decoder can only learn from the variance associated with the temporal autocorrelation (and not from the activity produced by the stimulus).
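      The mechanism described here can be illustrated with a toy simulation (our own sketch, not the manuscript's analysis): when multichannel noise follows a slow AR(1) drift, temporally adjacent trials share far more pattern similarity than distant ones, which is exactly the structure a decoder can latch onto even when the labels carry no stimulus information.

```python
# Toy illustration: AR(1) drift makes temporally adjacent trial patterns
# similar, giving a label-independent signal a decoder can exploit.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_channels, phi = 2000, 32, 0.9   # phi: AR(1) autocorrelation

noise = np.empty((n_trials, n_channels))
noise[0] = rng.standard_normal(n_channels)
for t in range(1, n_trials):
    noise[t] = phi * noise[t - 1] + rng.standard_normal(n_channels)

def mean_similarity(lag):
    """Average cosine similarity between trial patterns `lag` trials apart."""
    a, b = noise[:-lag], noise[lag:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return np.mean(num / den)

near = mean_similarity(1)     # adjacent trials: highly similar patterns
far = mean_similarity(100)    # distant trials: essentially unrelated
# near >> far: a decoder trained on such data is biased to treat
# temporally proximal trials alike, visible even with shuffled labels.
```

Shuffling the labels removes the stimulus-driven variance but leaves this lag-dependent similarity intact, which is why the shuffled analysis isolates the autocorrelation contribution.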

      In the linear discriminant analysis, this led to temporally proximal items being more likely to be classified as on the same side. This is why there is above-chance performance for repeat trials (Supplementary Figure 2b), and below-chance performance for alternate trials, even when the labels are shuffled: the temporal autocorrelation produces a general bias towards classifying temporally proximate stimuli as on the same side, which selectively improves the classification accuracy of repeat trials. Fortunately, the bias is relatively constant as a function of time within the epoch and is straightforward to estimate by shuffling the labels, which means that it can be removed through a baseline correction. However, to further demonstrate that the autocorrelation confound cannot account for the differences observed between repeat and alternate trials in the micro classification analysis, we now additionally show the results from a more strictly filtered version of the data (0.7 Hz). These results show a similar pattern to the original, with the additional stimulus-related artifacts introduced by the strict filter, e.g., above-chance decoding prior to stimulus onset.

      In the inverted encoding analysis, the same temporal autocorrelation manifests as temporally proximal trials being decoded as more similar locations. This is why there is increased decoding accuracy for targets with small angular offsets from the previous target, even when the labels are shuffled (Supplementary Figure 3c), because it is on these trials that the bias happens to align with the correct position. This leads to an attractive bias towards the previous item, which is most prominent when the labels are shuffled.

      To demonstrate the phenomenon, we simulated neural recordings from a population of tuning curves and performed the inverted encoding analysis on a clean version of the data and a version in which we introduced temporal autocorrelation. We then repeated this after shuffling the labels. The simulation produced very similar results to those we observed in the empirical data, with a single exception: while precision in the simulated shuffled data was unaffected by autocorrelation, precision in the unshuffled data was clearly affected by this manipulation. This may explain why we did not find a correlation between the shuffled and unshuffled precision in the original manuscript.

      These results echo those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and delta location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180 to 180 degrees, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal and opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this removed the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis (Supplementary Figure 3f), but it also temporally smeared the stimulus-related signal, resulting in above-chance decoding accuracy prior to stimulus onset (Supplementary Figure 3d). However, given that we were primarily interested in the pattern of accuracy, precision, and bias as a function of delta location, and less concerned with the precise temporal dynamics of these changes (which appeared relatively stable in the filtered data), this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis, and the one that is presented in Figure 3.

      We have updated the revised manuscript in light of these changes, including a fuller description of the artifact and the results from the abovementioned control analyses.

      Figure 3 updated.

      Figure 3 caption - e) Decoding accuracy for stimulus location, from reanalysis of previously published EEG data (17). Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). f) Decoding accuracy for location, as a function of time and D location. Bright colours indicate higher decoding accuracy; absolute accuracy values can be inferred from (e). g-i) Average location decoding (g) accuracy, (h) precision, and (i) bias from 50–500 ms following stimulus onset. Horizontal bar in (e) indicates cluster corrected periods of significance; note, all time points were significantly above chance due to the temporal smear introduced by strict high-pass filtering (see Supplementary Figure 3 for full details). Note, the temporal abscissa is aligned across (e & f). Shaded regions indicate ±SEM.

      Line 218 - To further investigate the influence of serial dependence, we applied inverted encoding modelling to the EEG recordings to decode the angular location of stimuli. We found that decoding accuracy of stimulus location sharply increased from ~60 ms following stimulus onset (Fig. 3e). Note, to reduce the influence of general temporal dependencies, we applied a 0.7 Hz high-pass filter to the data, which temporally smeared the stimulus-related information, resulting in above-chance decoding accuracy prior to stimulus presentation (for full details, see Supplementary Figure 3). To understand how serial dependence influences the representation of these features, we inspected decoding accuracy for location as a function of both time and D location (Fig. 3f). We found that decoding accuracy varied not only as a function of time, but also as a function of D location. To characterise this relationship, we calculated the average decoding accuracy from 50 ms until the end of the epoch (500 ms), as a function of D location (Fig. 3g). This revealed higher accuracy for targets with larger D location. We found a similar pattern of results for decoding precision (Fig. 3h). These results are consistent with the micro temporal context (behavioural) results, showing that targets that alternated were recalled more precisely. Lastly, we calculated the decoding bias as a function of D location and found a clear repulsive bias away from the previous item (Fig. 3i). While this result is inconsistent with the attractive behavioural bias, it is consistent with recent studies of serial dependence suggesting an initial pattern of repulsion followed by an attractive bias during the response period (20–22).

      Line 726 - As shown in Supplementary Figure 3, we found the same general temporal dependencies in the decoding accuracy computed using inverted encoding that were found using linear discriminant classification. However, as a baseline correction would not have been appropriate or effective for the parameters decoded with this approach, we instead used a high-pass filter of 0.7 Hz to remove the confound, while being cautious about interpreting the timing of effects produced by this analysis due to the temporal smear introduced by the filter.
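      To make this filtering step concrete, the sketch below shows the kind of zero-phase high-pass filtering described above. The 0.7 Hz cutoff is from the manuscript; the sampling rate, filter order, and toy signal are our own illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 256.0                       # assumed sampling rate (Hz); not taken from the paper
t = np.arange(0, 20, 1 / fs)

# slow drift (0.05 Hz) plus a faster stimulus-locked component (8 Hz)
drift = np.sin(2 * np.pi * 0.05 * t)
signal = 0.2 * np.sin(2 * np.pi * 8 * t)
x = drift + signal

# zero-phase 0.7 Hz high-pass (4th-order Butterworth, applied forwards and backwards)
b, a = butter(4, 0.7, btype="highpass", fs=fs)
x_hp = filtfilt(b, a, x)

# the slow drift is strongly attenuated while the 8 Hz component survives
print(np.std(x), np.std(x_hp))
```

      Note that zero-phase (forward-backward) filtering is non-causal, which illustrates how stimulus-related information can be smeared backwards in time and produce above-chance decoding before stimulus onset.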

      Supplementary Figure 2 updated.

      Supplementary Figure 2 caption - Removal of general micro temporal dependencies in EEG responses. We found that there were differences in classification accuracy for repeat and alternate stimuli in the EEG data, even when stimulus labels were shuffled. This is likely caused by temporal autocorrelation within the EEG data arising from low-frequency signal changes that are unrelated to the decoded stimulus dimension. This signal trains the decoder to classify temporally proximal stimuli as the same class, leading to a bias towards repeat classification. For example, in general, the EEG signal during trial one is likely to be more similar to that during trial two than during trial ten, because of low frequency trends in the recordings. If the decoder has been trained to classify the signal associated with trial one as a leftward stimulus, then it will be more likely to classify trial two as a leftward stimulus too. These autocorrelations are unrelated to stimulus features; thus, to isolate the influence of stimulus-specific temporal context, we subtracted the classification accuracy produced by shuffling the stimulus labels from the unshuffled accuracy (as presented in Figure 2e, f). We confirmed that using a stricter high-pass filter (0.7 Hz) removes this artifact, as indicated by the equal decoding accuracy between the two shuffled conditions. However, the stricter high-pass filter temporally smears the stimulus-related signal, which introduces other (stimulus-related) artifacts, e.g., above-chance decoding accuracy prior to stimulus presentation, that are larger and more complex, i.e., changing over time. Thus, we opted to use the original high-pass filter (0.1 Hz) and apply a baseline correction. a) The uncorrected classification accuracy along task-related and -unrelated planes. Note that these results are the same as the corrected version shown in Figure 2e, because the confound is only apparent when accuracy is grouped according to temporal context.

      b) Same as (a), but split into repeat and alternate stimuli, along (left) task-related and (right) unrelated planes. Classification accuracy when labels are shuffled is also shown. Inset in (a) shows the EEG sensors included in the analysis (blue dots). (c, d) Same as (a, b), but on data filtered using a 0.7 Hz high-pass filter. Black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). Shaded regions indicate ±SEM.

      Supplementary Figure 3 updated.

      Supplementary Figure 3 caption - Removal of general temporal dependencies in EEG responses for inverted encoding analyses. As described in Methods - Neural Decoding, we used inverted encoding modelling of EEG recordings to estimate the decoding accuracy, precision, and bias of stimulus location. Just as in the linear discriminant classification analysis, we also found an influence of general temporal dependencies in the results produced by the inverted encoding analysis. In particular, there was increased decoding accuracy for targets with low D location. This was weakly evident in the period prior to stimulus presentation, but clearly visible when the labels were shuffled. These results mirror those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and D location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180° to 180°, which, when subtracted from the bias in the unshuffled condition, would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this significantly reduced the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis, but it also temporally smeared the stimulus-related signal, resulting in above-chance decoding accuracy prior to stimulus onset. However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of D location, and less concerned with the precise temporal dynamics of these changes.
Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3. (a) Decoding accuracy as a function of time for the EEG data filtered using a 0.1 Hz high-pass filter. Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). (b, c) The same as (a), but as a function of time and D location for (b) the original data and (c) data with shuffled labels. (d-f) Same as (a-c), but for data filtered using a 0.7 Hz high-pass filter. Shaded regions in (a, d) indicate ยฑSEM. Horizontal bars in (a, d) indicate cluster corrected periods of significance; note, all time points in (d) were significantly above chance. Note, the temporal abscissa is vertically aligned across plots (a-c & d-f).

      In the process of performing these additional analyses and simulations, we became aware that the sign of the decoding bias in the inverted encoding analyses had been interpreted in the wrong direction. That is, where we previously reported an initial attractive bias followed by a repulsive bias relative to the previous target, we have in fact found the opposite: an initial repulsive bias followed by an attractive bias relative to the previous target. Based on the new control analyses and simulations, we think that the latter attractive bias was due to general temporal dependencies; that is, in the filtered data, we only observe a repulsive bias. While the bias associated with serial dependence was not a primary feature of the study, this (somewhat embarrassing) discovery has led to a reinterpretation of some results relating to serial dependence. However, it is encouraging to see that our results now align with those of recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan et al. 2024).

      Line 385 - Our corresponding EEG analyses revealed better decoding accuracy and precision for stimuli preceded by those that were different, and a bias away from the previous stimulus. These results are consistent with the finding that alternating stimuli are recalled more precisely. Further, while the repulsive pattern of biases is inconsistent with the observed attractive behavioural biases, it is consistent with recent work on serial dependence indicating an initial period of repulsion, followed by an attractive bias during the response period (20–22). These findings indicate that serial dependence and first-order sequential dependencies can be explained by the same underlying principle.

      (5) Given the relatively early decoding results and surprisingly early differences in decoding peaks, it would be useful to visualize ERPs across conditions to better understand the latencies and ERP components involved in the task.

      A rapid presentation design was used in the EEG experiment, and while this is well suited to decoding analyses, unfortunately we cannot resolve ERPs because the univariate signal is dominated by an oscillation at the stimulus presentation frequency (~3 Hz). We agree that this could be useful to examine in future work.

      (6) It is unclear why the precision derived from IEM results is considered reliable while the accuracy is dismissed due to the artifact, given that both seem to be computed from the same set of decoding error angles (equations 8-9).

      This point has been addressed in our response to point (4).
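      For completeness, accuracy, precision, and bias can indeed all be derived from the same distribution of decoding-error angles. The following is a minimal circular-statistics sketch using our own conventions (it does not reproduce the manuscript's exact equations 8-9; the error sample is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# decoding errors (radians): decoded minus true location, wrapped to (-pi, pi]
errors = rng.vonmises(mu=0.3, kappa=4.0, size=500)     # hypothetical error sample

# all three read-outs come from the same error angles
mean_vector = np.mean(np.exp(1j * errors))             # complex mean resultant
R = np.abs(mean_vector)                                # mean resultant length
bias = np.angle(mean_vector)                           # circular mean error (radians)
precision = 1.0 / np.sqrt(-2.0 * np.log(R))            # inverse circular s.d. (one convention)
accuracy = np.mean(np.cos(errors))                     # cosine similarity to the true location

print(bias, precision, accuracy)
```

      The point is that precision (spread of the errors) and accuracy (their central tendency towards the true value) are distinct summaries of the same angles, so an artifact can in principle affect one while leaving the other intact.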

      (7) What is the rationale for selecting five past events as the meso-scale? Prior history effects have been shown to extend much further back in time (Fritsche et al., 2020).

      We used five previous items in the meso analyses to be consistent with previous research on sequential dependencies (Bertelson, 1961; Gao et al., 2009; Jentzsch & Sommer, 2002; Kirby, 1976; Remington, 1969). However, we agree that these effects likely extend further back and have acknowledged this in the revised version of the manuscript.

      Line 240 - Higher-order sequential dependencies are an example of how stimuli (at least) as far back as five events in the past can shape the speed and accuracy of responses to the current stimulus (9, 10); note, however, that these effects have been observed for more than five events (20).

      (8) The decoding bias results, particularly the sequence of attraction and repulsion, appear to run counter to the temporal dynamics reported in recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan & Serences, 2022).

      This point has been addressed in our response to point (4).

      (9) The repulsive component in the decoding results (e.g., Figure 3h) seems implausibly large, with orientation differences exceeding what is typically observed in behavior.

      As noted in our response to point (4), this bias was likely due to the general temporal dependency confound and has been removed in the revised version of the manuscript.

      (10) The pattern of accuracy, response times, and precision reported in Figure 3 (also line 188) resembles results reported in earlier work (Stewart, 2007) and in recent studies suggesting that integration may lead to interference at intermediate stimulus differences rather than improvement for similar stimuli (Ozkirli et al., 2025).

      Thank you for bringing this to our attention, we have acknowledged this in the revised manuscript.

      Line 197 - Consistent with our previous binary analysis, and with previous work (19), we also found that responses were faster and more accurate when D location was small (Fig. 3b, c).

      (11) Some figures show larger group-level variability in specific conditions but not others (e.g., Figures 2b-c and 5b-c). I suggest reporting effect sizes for all statistical tests to provide a clearer sense of the strength of the observed effects.

      Yes, as noted in the original manuscript, we find significant differences in variance between the task-related and -unrelated conditions. We think this is due to opposing forces in the task-related condition:

      "The increased variability of response time differences across the task-related plane likely reflects individual differences in attention and prioritization of responding either quickly or accurately. On each trial, the correct response (e.g., left or right) was equally probable. So, to perform the task accurately, participants were motivated to respond without bias, i.e., without being influenced by the previous stimulus. We would expect this to reduce the difference in response time for repeat and alternate stimuli across the task-related plane, but not the task-unrelated plane. However, attention may amplify the bias towards making faster responses for repeat stimuli, by increasing awareness of the identity of stimuli as either repeats or alternations (17). These two opposing forces vary with task engagement and strategy and thus would be expected to produce increased variability across the task-related plane." We agree that providing effect sizes may provide a clearer sense of the observed effects and have done so in the revised version of the manuscript.

      Line 739 - For Wilcoxon signed rank tests, the rank-biserial correlation (r) was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (54). For Friedman's ANOVA tests, Kendall's W was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (55).
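      For reference, both effect-size measures can be computed directly from the data. The sketch below uses hypothetical data, and the helper names rank_biserial and kendalls_w are our own (standard formulas, without a tie correction for Kendall's W):

```python
import numpy as np
from scipy.stats import rankdata

def rank_biserial(x, y):
    """Matched-pairs rank-biserial correlation for a Wilcoxon signed-rank test."""
    d = np.asarray(x) - np.asarray(y)
    d = d[d != 0]                          # drop zero differences, as Wilcoxon does
    ranks = rankdata(np.abs(d))            # rank the absolute differences
    favorable = ranks[d > 0].sum()
    return 2 * favorable / ranks.sum() - 1  # ranges from -1 to 1

def kendalls_w(data):
    """Kendall's W for a Friedman test; data is (n_subjects, k_conditions)."""
    n, k = data.shape
    ranks = np.apply_along_axis(rankdata, 1, data)   # rank within each subject
    col_sums = ranks.sum(axis=0)
    s = ((col_sums - col_sums.mean()) ** 2).sum()
    return 12 * s / (n ** 2 * (k ** 3 - k))          # ranges from 0 to 1

rng = np.random.default_rng(2)
a = rng.normal(0.6, 1, 30)                 # hypothetical paired samples
b = rng.normal(0.0, 1, 30)
print(rank_biserial(a, b))

data = rng.normal(0, 1, (20, 3)) + np.array([0.0, 0.5, 1.0])
print(kendalls_w(data))
```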

      (12) The statement that "serial dependence is associated with sensory stimuli being perceived as more similar" appears inconsistent with much of the literature suggesting that these effects occur at post-perceptual stages (Barbosa et al., 2020; Bliss et al., 2017; Ceylan et al., 2021; Fischer et al., 2024; Fritsche et al., 2017; Sheehan & Serences, 2022).

      In light of the revised analyses, this statement has been removed from the manuscript.

      (13) If I understand correctly, the reproduction bias (i.e., serial dependence) is estimated on a small subset of the data (10%). Were the data analyzed by pooling across subjects?

      The dual reproduction task only occurred on 10% of trials. There were approximately 2000 trials, so ~200 reproduction responses. For the micro and macro analyses, this was sufficient to estimate precision within each of the experimental conditions (repeat/alternate, expected/unexpected). However, it is likely that we were not able to reproduce the effect of precision at the meso level across both experiments because we lacked sufficient responses to reliably estimate precision when split across the eight sequence conditions. Despite this, the data was always analysed within subjects.

      (14) I'm also not convinced that biases observed in forced-choice and reproduction tasks should be interpreted as arising from the same process or mechanism. Some of the effects described here could instead be consistent with classic priming.

      We agree that the results associated with the forced-choice task (response time and task accuracy) were likely due to motor priming, but that a separate (predictive) mechanism may explain the (precision) results associated with the reproduction task. These are the two mechanisms we think are operating across the three temporal scales investigated in the current study.

      Reviewing Editor Comments:

      (1) Clarify task design and measurement: The dense presentation makes it difficult to understand key design elements and their implications. Please provide clearer descriptions of all task elements, and how they relate to each other (EEG vs. behaviour, stimulus plane vs. TR and TU plane, reproduction vs. discrimination and role of priming), and clearly explain how key measures were computed for each of these (e.g., precision, accuracy, reproduction bias).

      In the revised manuscript, we have expanded on descriptions of the source and nature of the data (behavioural and EEG), the different planes analyzed in the behavioural task, and how key metrics (e.g., precision) were computed.

      (2) Offer more insight into underlying data, including original ERP waveforms to aid interpretation of decoding results and the timing of effects. In particular, unpack the decoding temporal confound further.

      In the revised manuscript, we have offered considerably more insight into the decoding results, in particular the nature of the temporal confound. We were unable to assess ERPs due to the rapid presentation design employed in the EEG experiment.

      (3) Justify arbitrary choices such as electrode selection for EEG decoding (e.g., limiting to parieto-occipital sensors), number of trials in meso scale, and the time terminology itself.

      In the revised manuscript, we have clarified the reasons for electrode selection.

      (3) Discuss deviations from literature: Several findings appear to contradict or diverge from previous literature (e.g., effects of serial dependence). These discrepancies could be discussed in more depth.

      Upon re-analysis of the serial dependence bias and removal of the temporal confound, the results of the revised manuscript now align with those from previous literature, which has been acknowledged.

      Reviewer #1 (Recommendations for the authors):

      (1) I would like to use my reviewer's prerogative to mention a couple of relevant publications.

      Galluzzi et al. (Journal of Vision, 2022) "Visual priming and serial dependence are mediated by separate mechanisms" suggests exactly that, which is relevant to this study.

      Xie et al. (Communications Psychology, 2025) "Recent, but not long-term, priors induce behavioral oscillations in peri-saccadic vision" also seems relevant to the issue of different mechanisms.

      Thank you for bringing these studies to our attention. We agree that they are both relevant and have referenced them appropriately in the revised version of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) I find the discussion on attention and awareness (from line 127 onward) somewhat vague and requiring clarification.

      We agree that this statement was vague and referred to "awareness" without operationalization. We have revised this statement to improve clarity.

      Line 135 - However, task-relatedness may amplify the bias towards making faster responses for repeat stimuli, by increasing attention to the identity of stimuli as either repeats or alternations (17).

      (2) Line 140: It's hard to argue that there are expectations that the image of an object on the retina is likely to stay the same, since retinal input is always changing.ย 

      We agree that retinal input is often changing, e.g., due to saccades, self-motion, and world motion. However, for a prediction to be useful, e.g., to reduce metabolic expenditure or speed up responses, it must be somewhat precise, so a prediction that retinal input will change is not necessarily useful, unless it can specify what it will change to. Given retinal input of x at time t, the range of possible values of x at time t+1 (predicting change) is infinite. By contrast, if we predict that x=x at time t+1 (no change), then we can make a precise prediction. There is, of course, other information that could be used to reduce the parameter space of predicted change from x at time t, e.g., the value of x at time t-1, and we think this drives predictions too. However, across the infinite distribution of changes from x, zero change will occur more frequently than any other value, so we think it's reasonable to assert that the brain may be sensitive to this pattern.

      (3) Line 564: The gambler's fallacy usually involves sequences longer than just one event.

      Yes, we agree that this phenomenon is associated with longer sequences. This section of the manuscript referred to previous findings that were not directly relevant to the current study and has been removed in the revised version.

      (4) In the shared PDF, the light and dark cyan colors used do not appear clearly distinguishable.

      I expect this is due to poor document processing or low-quality image embeddings. I will check that they are distinguishable in the final version.

      References:

      Barbosa, J., Stein, H., Martinez, R. L., Galan-Gadea, A., Li, S., Dalmau, J., Adam, K. C. S., Valls-Solé, J., Constantinidis, C., & Compte, A. (2020). Interplay between persistent activity and activity-silent dynamics in the prefrontal cortex underlies serial biases in working memory. Nature Neuroscience, 23(8), Article 8. https://doi.org/10.1038/s41593-020-0644-4

      Bliss, D. P., Sun, J. J., & D'Esposito, M. (2017). Serial dependence is absent at the time of perception but increases in visual working memory. Scientific Reports, 7(1), 14739.

      Ceylan, G., Herzog, M. H., & Pascucci, D. (2021). Serial dependence does not originate from low-level visual processing. Cognition, 212, 104709. https://doi.org/10.1016/j.cognition.2021.104709

      Fischer, C., Kaiser, J., & Bledowski, C. (2024). A direct neural signature of serial dependence in working memory. eLife, 13. https://doi.org/10.7554/eLife.99478.1

      Fritsche, M., Mostert, P., & de Lange, F. P. (2017). Opposite effects of recent history on perception and decision. Current Biology, 27(4), 590-595.

      Fritsche, M., Spaak, E., & de Lange, F. P. (2020). A Bayesian and efficient observer model explains concurrent attractive and repulsive history biases in visual perception. eLife, 9, e55389. https://doi.org/10.7554/eLife.55389

      Gekas, N., McDermott, K. C., & Mamassian, P. (2019). Disambiguating serial effects of multiple timescales. Journal of Vision, 19(6), 24-24.

      Luo, M., Zhang, H., Fang, F., & Luo, H. (2025). Reactivation of previous decisions repulsively biases sensory encoding but attractively biases decision-making. PLOS Biology, 23(4), e3003150. https://doi.org/10.1371/journal.pbio.3003150

      Ozkirli, A., Pascucci, D., & Herzog, M. H. (2025). Failure to replicate a superiority effect in crowding. Nature Communications, 16(1), 1637. https://doi.org/10.1038/s41467-025-56762-5

      Sheehan, T. C., & Serences, J. T. (2022). Attractive serial dependence overcomes repulsive neuronal adaptation. PLOS Biology, 20(9), e3001711.

      Stewart, N. (2007). Absolute identification is relative: A reply to Brown, Marley, and Lacouture (2007). Psychological Review, 114, 533-538. https://doi.org/10.1037/0033-295X.114.2.533

      Treisman, M., & Williams, T. C. (1984). A theory of criterion setting with an application to sequential dependencies. Psychological Review, 91(1), 68.

      Zhang, G., & Luck, S. J. (2025). Assessing the impact of artifact correction and artifact rejection on the performance of SVM- and LDA-based decoding of EEG signals. NeuroImage, 316, 121304. https://doi.org/10.1016/j.neuroimage.2025.121304

      1. Summary Document: The Grip of Digital Technology and the Dangers of Social Media

      Introduction: A "David versus Goliath" Struggle

      This briefing lays out the alarming issue of social media's impact on the mental health and safety of children and adolescents.

      It highlights the harrowing testimonies of victims and their families, the legal actions, the lack of regulation, and the tactics of the tech giants.

      The struggle is presented as a "David versus Goliath" battle between bereaved families and multimillion-dollar companies.

      Main Themes and Key Facts:

      1. Addiction and the Impact on Adolescent Mental Health:

      Alexis Spence's testimony: After downloading Instagram at age 11, Alexis developed anorexia and depression and began cutting herself.

      The algorithm flooded her with content about thinness, then with photos of anorexic people and with sad, depressing content.

      She describes how she shut herself away in her suffering, becoming "a person we no longer recognized".

      Quote: "I was 11 when I first downloaded Instagram, and that's where it all began. [...]

      From watching fitness content, the app started showing me models. [...] The models got thinner and thinner, until they were no longer models but anorexic people."

      Quote: "My account became filled with this content. It was sad black-and-white photos with depressing captions."

      Quote: "I really think Instagram bears a large share of responsibility for the mental health problems I suffered, especially considering my age. I was only 13."

      Suicidal Thoughts and Self-Harm: Several parents' testimonies describe their children's self-cutting and suicide attempts, directly linked to the content pushed by the algorithms.

      Quote: "I posted a photo saying I intended to kill myself that night. [...] I got a call from the social worker: 'You need to come to the school immediately. Your daughter has attempted suicide.'"

      Quote: "We had set up a sort of controlled self-cutting. When he wasn't doing well, he would ask me for his blades. I would wait behind his bedroom door while he cut himself."

      Denial by the Platforms: Big Tech executives long denied any link between their platforms and mental health problems.

      Quote from a senator questioning Mark Zuckerberg: "Everyone knows that kids who spend too much time on your platforms are at risk, and it's not just the mental health issues. Let me ask you a question: is your platform safe for kids?" — "I believe it is." — "But there's a difference [...] if we don't start honest."

      2. Online Child Sexual Abuse and the Lack of Safety:

      Proliferation of Dangerous Content: The platforms are vectors for online child sexual abuse, with sexual predators exploiting algorithms and features to target children. Interpol Europe is "overwhelmed by online child sexual abuse".

      Quote: "We are at a fairly crucial moment: at Interpol Europe, we are overwhelmed by online child sexual abuse, and the platforms really are being used by sexual predators."

      Quote: "More than 80% of sextortion cases happen on Instagram and Snapchat. There is an urgent need for them to clean house."

      Complicit Algorithms: An experiment with "Lili", the avatar of a 13-year-old girl, shows that the algorithms very quickly push dark content: scenes of self-harm, vampirism, sexualized scenes, and even glorification of suicide, even without any prior search by the user.

      Quote: "On TikTok, the algorithm is even faster. In under 5 minutes, the platform surfaces videos glorifying suicide."

      Quote: "In a few clicks, little Lili finds herself witnessing several rapes of minors."

      Predators' Manipulation Techniques: Step-by-step guides for ensnaring children are available online. Predators use psychological tactics such as "love bombing" and the gradual sexualization of conversations, hijacking familiar codes (cartoon characters) to normalize abusive behaviour.

      Quote: "They really play on many different psychological levers with children."

      Quote: "Reusing the codes of, say, Frozen, of different characters like that: there are things that feel familiar and that don't hit you head-on the way outright pornography would."

      Insufficient Response from the Platforms: Despite reports, the platforms do not always remove illegal content and predators' accounts. Their safety efforts are judged insufficient.

      Quote from a senator: "Mr. Zuckerberg, what the hell were you thinking? [...] In what sane universe is there a link for 'see results anyway'?" (referring to a warning message that offered the option to "see results anyway" for problematic content).

      Quote from a representative of the office for combating online child sexual abuse: "We receive very, very few reports from WhatsApp, for example."

      3. The Role and Responsibility of the Technology Companies:

      Big Tech's Business Model: Meta's internal documents, revealed by the whistleblower Frances Haugen, show that the company was aware of children's vulnerabilities and of the negative impacts, but put profits first.

      Quote: "These documents show that for 20 years Meta has been investigating children's vulnerabilities."

      Quote: "Facebook repeatedly encountered conflicts between its own profits and our safety."

      Quote from a senator: "Children are not your priority. Children are your product. Children you see as a way to make money."

      Section 230 as a Shield: The companies hide behind Section 230 of US law, which grants them immunity as content hosts and shields them from lawsuits over content posted by their users.

      Quote: "These companies hide behind Section 230, which is truly archaic. They use this law as a shield to say: you cannot sue us."

      Quote from a senator: "It's an astonishing benefit that your industry has that no other industry has. They just don't have to worry about being held in court if they're negligent."

      Intense Lobbying: To counter bills aimed at lifting their immunity and holding them accountable, the Big Five have spent nearly 100 million dollars on lobbying, more than half of it coming from the Meta group.

      Quote: "They spent nearly 100 million dollars to get representatives and senators to back down; more than half of that sum comes from the Meta group alone."

      4. Collective Mobilization and Legal Action:

      Global Parents' Movement: Parents and families around the world are mobilizing to demand change and better protection for children.

      Quote from a father: "As fathers and mothers, as long as we do nothing, no one will do it for us. This is our fight."

      Quote from a mother: "We are thousands of fathers and mothers who believe that smartphones and social media are not good for our sons and daughters."

      Algos Victima Collective: Founded by the lawyer Maître Laure Bouttron Marmion, this collective brings together families of teenagers whose suicides are linked to social media, notably the case of Marie, a young girl who died in 2021.

      The collective aims to have the companies' responsibility recognized.

      Quote from Maître Bouttron Marmion: "We want regulation of this platform, which today is at degree zero of regulation."

      Quote from Maître Bouttron Marmion: "We cannot fail to consider that the social network bears its share of responsibility in Marie's suicide."

      Legal Action in the United States and Europe: More than 1,000 families and 44 of the 50 U.S. states are suing the tech giants. Lawyers are seeking solid legal grounds on which to challenge them.

      Quote from Alexis: "Since then, more than 1,000 families have joined us, and now 44 of the 50 U.S. states are suing the big technology companies to hold them accountable."

      Regulatory Initiatives: Bills such as the Kids Online Safety Act, the EARN IT Act, and the STOP CSAM Act aim to hold companies responsible for child exploitation and to remove their immunity.

      Quote from a senator: "We have bills that have passed through this incredibly diverse committee when it comes to our political views: the Kids Online Safety Act, the EARN IT Act, the STOP CSAM Act."

      5. Solutions and Hopes:

      Banning Smartphones Before a Certain Age: In Spain, a parents' movement succeeded in regulating mobile phone use in middle schools and is campaigning for a total ban before age 16.

      Quote from a mother: "We want smartphones not to be usable before age 16."

      Quote: "Now, in class and in the schoolyard, they can no longer use their mobile phones, unless the teacher asks for it at a specific moment."

      Disabling Algorithms for Minors: A key demand is that recommendation algorithms be disabled for minors, to protect them from inappropriate content.

      Quote: "We must ensure that the algorithm is disabled for minors."

      Hope in the Fight "from Below": Hope lies in the mobilization of families and citizens in the face of inaction by companies and legislators.

      Quote: "I have much more hope in families, in the fight that comes from below rather than from above."

      Zuckerberg's Apology: During a Senate hearing, Mark Zuckerberg was compelled to apologize to the victims, though his apology was perceived as insincere and unconnected to the nature of his product.

      Quote from Mark Zuckerberg: "I'm sorry for everything you have all been through. It's terrible. No one should have to go through the things that your families have suffered."

      Quote from Alexis: "His apology was not sincere. He apologized, but he did not apologize for his product, which he himself calls a product, and which does harm."

      Conclusion: A Post-Screen World for Children?

      The briefing emphasizes that a consensus is now established on the profound threat that social media poses to children's mental health and safety.

      The perseverance of victims and families is crucial to compelling companies and legislators to act, in the hope that one day "it will seem just as horrible to us that a child owns a mobile phone and is disconnected from life."

  6. mathieubcd.github.io
    1. Scenario analysis is a method in which multiple potential future states (or outcomes) are forecast. It is not constrained by events of the past, which may not capture the impact of changes in the environment; rather, it uses both trends (the known) and uncertainties (the unknown) to predict a range of possible future scenarios.

      Healthcare leaders often rely too heavily on past data, even though the future rarely unfolds the same way as the past. Scenario analysis encourages organizations to think in terms of possibilities, not certainties, which is especially relevant in healthcare, where conditions can change quickly. For example, we can plan for best-case, worst-case, and most likely outcomes during a pandemic. This improves resource planning and highlights the risks of making decisions based on outdated assumptions. It's a reminder that uncertainty should be treated as part of strategy, not just as an obstacle.
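      The pandemic example above can be sketched numerically. Below is a minimal scenario-analysis sketch; the scenario names, probabilities, and bed counts are hypothetical, chosen only to illustrate planning against a range of outcomes instead of a single forecast.

```python
# Hypothetical scenarios for pandemic resource planning (illustrative only).
scenarios = {
    "best_case":   {"probability": 0.25, "beds_needed": 120},
    "most_likely": {"probability": 0.50, "beds_needed": 200},
    "worst_case":  {"probability": 0.25, "beds_needed": 350},
}

# Probability-weighted expectation across the scenario set.
expected_beds = sum(s["probability"] * s["beds_needed"] for s in scenarios.values())

# Capacity planning often targets the worst case, not just the expectation.
peak_beds = max(s["beds_needed"] for s in scenarios.values())

print(expected_beds)  # 217.5
print(peak_beds)      # 350
```

      The gap between the expected and worst-case figures is exactly the uncertainty that purely backward-looking forecasts tend to hide.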

    1. Attention: this page previously displayed the note, "This is a Subscriber Exclusive story." Now it is free to anybody using Hypothesis.

      Hamady grocery store closing its doors on Flint's north end

      Published: Nov. 06, 2018, 11:01 p.m.

      FLINT, MI - Jim McColgan Jr. spent the last two and a half years of his life working to open up Hamady Complete Food Centers on Flint's north side.

      The store near the corner of Clio and Pierson roads in the Hallwood Plaza opened July 25, teeming with nostalgia, including the paper sacks many in the community came to know Hamady by, and the promise of 80-plus jobs.

      But the rebirth of the store was short lived.

      McColgan Jr., the store's owner, confirmed the location will close Tuesday, Nov. 6, less than four months after it launched anew.

      "It's just a sad day in my personal life and also in the life of Flint and the north end community," he said Tuesday evening. "I really wanted to build a beautiful store here. I just wanted it to go that way and everybody to just shop and enjoy themselves and it's just very sad."

      He thanked the city of Flint, Mayor Karen Weaver, and the local chamber of commerce for their "wonderful" support along the way, but McColgan Jr. added, "We just didn't have enough traffic, enough community support."

      The Hamady Bros. supermarket chain started in 1911 with a small store on East Dayton Street and Industrial Avenue in Flint.

      Michael Hamady and his cousin Kamol Hamady co-founded the chain, which grew to 37 stores in the Flint area, employed approximately 1,300 people, and at one point generated $100 million in annual revenue, according to Flint Journal records.

      Alex Dandy took over the business in 1974, and workers took part in a seven-week strike in 1987. Dandy served time in prison for tax evasion and fraud after taking millions from Hamady and another supermarket company.

      James M. McColgan Sr., under a reorganization plan, ran the Hamady chain in 1988. A year later, the court approved the sale of Hamady to McColgan. When the company's losses exceeded $2 million in 1991, McColgan Sr. decided to sell the chain, according to Journal records. He filed for bankruptcy in May 1991, and the last Hamady store closed two months later.

      McColgan Jr. took pride in bringing back the name, which he commented still holds some high esteem in the Flint community.

      "Hamady is a Flint icon. Everybody remembers Hamady. Even younger kids, even younger people," he previously said. "Hamady is still a drawing force in the city of Flint. I am very proud and honored to be a part of the Hamady family and the Hamady name."

      Some delays took place in the opening process, including a stop-work order issued in late March due to Hamady working without proper permits.

      In the face of the closure, McColgan Jr. tried to remain upbeat but added the discussion with employees was a difficult one.

      "It was very tearful. The employees were very emotional," he commented of the situation. "Everybody that worked here put their life into this store. We are all just going to be positive and move forward."

      When asked about potential additional stores as had been discussed for Clio, Durand, Holly, and South Saginaw Street in Flint, McColgan noted: "You never know what the future is going to hold, but I'm young and I'm not ready to retire."

    1. Similarly, the crime genre circulates shifting representations of race. Years ago, thegenre often was racist: black characters, for example, were colorful โ€˜extrasโ€™ or menacingfigures, but, in either case, were portrayed as โ€˜the otherโ€™ and juxtaposed against the usu-ally white protagonist.

      It's true that portraying black characters as criminals was dangerous and hateful, and reinforced harmful stereotypes that still affect real life. Thankfully, television has developed since then, and now shares stories of black people as complex, normal, and fully human, illustrating just how these stereotypes have impacted their lives.

    2. in only four episodes during CSIโ€™s first season do the investigatorsfail to apprehend the criminal, and in two of these (Episodes 103 and 109) they have anidea about the criminalsโ€™ identity but not enough information for an arrest.

      It is more realistic for them to fail to find the criminal than to solve every single case they encounter. Failing to solve a case isn't wrong; it just doesn't make for good television.

    1. think it's some special brand of American pathological optimism that so many of us believe the story of Melanie has to turn out to be happy. And that if it doesn't, something unusual has happened-- and not just this is what happens all the time, that the supermarket might be full of Melanies.

      Sadly, many stories end poorly, and people do not care enough to change that.

    1. I pointed out that every use of these systems builds the case for training the next one and building the next data center. And for this interlocutor, that did the trick!

      People are consciously aware that their actions have consequences. Even if it's just a brief moment of laziness, most people have an innate understanding that it will contribute to future issues, including environmental and other long-term effects.

    1. Author response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDPs, based on their recent work predicting the transverse relaxation rates (R2) of IDPs, trained on 45 IDP sequences and their corresponding R2 values. The discovery is that IDPs interact with drugs mostly through aromatic residues, which is easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDPs. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations or NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed we did not apply the helix boost portion of SeqDYN to DIRseq, and now state as such (p. 4, second last paragraph). We now also compare DIRseq with several alternative models, as summarized in new Table S2.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei et al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We now compare predictions of these various parameter sets, and report the results in Table S2. In short, among all the tested parameter sets, DIRseq has the best performance as measured by (1) strong correlations between prediction scores and CSPs and (2) high true positives and low false positives (p. 7-9).

      (2) Second, DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of whether the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We now add that these tweaks were motivated by observations from MD simulations of drug interactions with a-syn (ref 13). As already noted in the response to the preceding comment, we now also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the author's assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific length scale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we now add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available (p. 12-13). To illustrate this point, we use drug size as a simple example, which can be modeled by making the b parameter dependent on drug molecule size.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We now cite nine studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim?

      Here again we add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

      We add citations to both compound optimization and mechanism of action.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the sequences of the IDPs in the case studies with the 45 IDPs in training the SeqDYN model to make sure that they are not included in the training dataset or are highly homologous.

      Please note that the data used for training SeqDYN were R2 rates, which are independent of the property being studied here, i.e., drug interacting residues. Therefore whether the IDPs studied here were in the training set for SeqDYN is immaterial.

      (2) The authors manually tuned four parameters in SeqDYN to develop the model for predicting drug-interacting residues without giving strict testing or explanations. More explanations, testing of more values, and ablation testing should be given.

      As responded above, we now both expand the explanation and present more test results.

      (3) The authors changed the q values of L, I, and M to the value of V. What are the results if these values are not changed?

      These results are shown in Table S2 (entry named SeqDYN_orig).

      (4) Only one b value is chosen based on the assumption that a drug molecule interacts with 3-4 residues at a time. However, the number of interacting residues is related to the size of the drug molecule. Adjusting the b value with the size of the ligand may provide improvement. It is better to test the influence of adjusting b values. At least, this should be discussed.

      Good point! We now state that b potentially can be adjusted according to ligand size (p. 12-13). In addition, we also show the effect of varying b on the prediction results (Table S2; p. 8, last paragraph).

      (5) The authors add 12 Q to eliminate end effects. However, an explanation of why 12 Qs were chosen should be given. How about other numbers of Q, or using other residues (e.g., the residues commonly used in linkers, like GS/PS or A)?

      As we already explained, "Gln was selected because its q value is at the middle of the 20 q values." (p. 5, second paragraph). Also, 12 Qs are sufficient to remove any end effects; a higher number of Qs does not make any difference.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors make reference to the "C-terminal IDR" in cMyc, but the region they note is found in the bHLH DNA binding domain (which falls from residue ~370-420).

      We now clarify that this region is disordered on its own but forms a helix-loop-helix structure upon heterodimerization with Max (p. 11, last paragraph).

      (2) Given the fact that X-seq names are typically associated with sequencing-based methods, it's perhaps confusing to name this method DIRseq?

      We appreciate the reviewerโ€™s point, but by now the preprint posted in bioRxiv is in wide circulation, and the DIRseq web server has been up for several months, so changing its name would cause a great deal of confusion.

      (3) I'd encourage the authors just to spell out "drug interacting residues" and retain an IDR acronym for IDRs. Acronyms rarely make writing clearer, and asking folks to constantly flip between IDR and DIR is asking a lot of an audience (in this reviewer's opinion, anyway).

      The reviewer makes a good point; we now spell out โ€œdrug-interacting residuesโ€.

      (4) The assumption here is that CSPs result from direct drug:IDR interactions. However, CSPs result from a change in the residue chemical environment, which could in principle be an indirect effect (e.g., in the unbound state, residues A and B interact; in the bound state, residue A is now free, such that it experiences a CSP despite not engaging directly). While I recognize such assumptions are commonly made, it behoves the authors to explicitly make this point so the reader understands the relationship between CSPs and binding.

      We did add caveats of CSP in Introduction (p. 3, second paragraph).

      (5) On the figures, please label which protein is which figure, as well as provide a legend for the annotations on the figures (red line, blue bar, cyan region, etc.)

      We now label protein names in Fig. 1. Annotations of display items are given in the Fig. 2 and 3 captions; we now add them to the Fig. 4 caption.

      (6) abstract: "These successes augur well for deciphering the sequence code for IDP-drug binding." - This is not grammatically correct, even if augur were changed to agree. Suggest rewriting.

      "Augur well" means to be a good sign (for something). We use the phrase here in that sense.

      (6) page 5: "we raised the ๐‘ž value of Asp to be the same as that of Glu" โ†’ suggested "increased" instead of raised.

      We have made the suggested change.

      (7) The authors should consider releasing the source code (it is available via the .js implementation on the server, but this is not very transferable/shareable, so I'd encourage the authors to provide a stand-alone implementation that's explicitly shareable).

      We have now added a link for the user to download the source code.

    1. Author response:

      The following is the authors' response to the current reviews.

      eLife Assessment

      The authors examine the effect of cell-free chromatin particles (cfChPs) derived from human serum or from dying human cells on mouse cells in culture and propose that these cfChPs can serve as vehicles for cell-to-cell active transfer of foreign genetic elements. The work presented in this paper is intriguing and potentially important, but it is incomplete. At this stage, the claim that horizontal gene transfer can occur via cfChPs is not well supported because it is only based on evidence from one type of methodological approach (immunofluorescence and fluorescent in situ hybridization (FISH)) and is not validated by whole genome sequencing.

      We disagree with the eLife assessment that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous "satellite genomes" that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate technology. Rather, eLife should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells.

      The reviewer is mistaken. We do not claim that the internalized cfChPs are incorporated into the nucleus. We show throughout the paper that the cfChPs perform their novel functions autonomously, outside the genome, without being incorporated into the nucleus. This is clearly seen in all our chromatin fibre images, metaphase spreads, and our video abstract. Occasionally, when the cfChPs' fluorescent signals overlie the chromosomes, we have been careful to state that the cfChPs are associated with the chromosomes without implying that they have integrated.

      These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Again the reviewer makes the same mistake. We do not claim that the internalized cfChPs are incorporated into the chromosomes. We have addressed this issue above.

      We have a feeling that the reviewer has not understood our work, which is the discovery of "satellite genomes" that function autonomously outside the nuclear genome.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous "satellite genomes" that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed on Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer has raised a related issue below and we have responded to both of them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for taking my comments and those of the other reviewer into account and for adding new material to this new version of the manuscript. Among other modifications/additions, they now mention that they think that NIH3T3 cells treated with cfChPs die out after 250 passages because of genomic instability which might be caused by horizontal transfer of cfChPs DNA into the genome of treated cells (pp. 45-46, lines 725-731). However, no definitive formal proof of genomic instability and horizontal transfer is provided.

      We mention that the NIH3T3 cells treated with cfChPs die out after 250 passages in response to the reviewer's earlier comment "Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism".

      We agree with the reviewer and have simply speculated that the cells may die because of extreme genomic instability. We have left this as a speculation rather than diverting our paper in a different direction to formally prove genomic instability.

      The authors now refer to an earlier study they conducted in which they Illumina-sequenced NIH3T3 cells treated with cfChPs (pp. 48, lines 781-792). This study revealed the presence of human DNA in the mouse cell culture. However, it is unclear to me how the authors can conclude that the human DNA was inside mouse cells (rather than persisting in the culture medium as cfChPs), and it is also unclear how this supports horizontal transfer of human DNA into the genome of mouse cells. Horizontal transfer implies integration of human DNA into mouse DNA, through the formation of phosphodiester bonds between human nucleotides and mouse nucleotides. The previous Illumina-sequencing study and the current study do not show that such integration has occurred. I might be wrong, but I tend to think that DNA FISH signals showing that human DNA lies next to mouse DNA do not necessarily imply that human DNA has integrated into mouse DNA. Perhaps such signals could result from interactions at the protein level between human cfChPs and mouse chromatin?

      With due respect, our earlier genome sequencing study that the reviewer refers to was done on two single cell clones developed following treatment with cfChPs. So, the question of cfChPs lurking in the culture medium does not arise.

      The authors should be commended for doing so many FISH experiments. But in my opinion, and as already mentioned in my earlier review of this work, horizontal transfer of human DNA into mouse DNA should first be demonstrated by strong DNA sequencing evidence (multiple long and short reads supporting human/mouse breakpoints; discarding technical DNA chimeras) and only then eventually confirmed by FISH.

      As mentioned earlier, we disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous "satellite genomes" that we describe in our paper. To that extent, whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Regarding my comment on the quantity of human cfChPs that has been used for the experiments, the authors replied that they chose this quantity because it worked in a previous study. Could they perhaps explain why they chose this quantity in the earlier study? Is there any biological reason to choose 10 ng and not more or less? Is 10 ng realistic biologically? Could it be that 10 ng is orders of magnitude higher than the quantity of cfChPs normally circulating in multicellular organisms and that this could explain, at least in part, the results obtained in this study?

      The reviewer again raises the same issue, which we have already addressed in our revised manuscript. To quote: "We chose to use 10 ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and activation of apoptotic pathways using this concentration of cfChPs (Mittra I et al., 2015)".

      It is also mentioned in the response that RNA-seq has been performed on mouse cells treated with cfChPs, and that this confirms human-mouse fusion (genomic integration). Since these results are not included in the manuscript, I cannot judge how robust they are and whether they reflect a biological process rather than technical issues (technical chimeras formed during the RNA-seq protocol are a well-known artifact). In any case, I do not think that genomic integration can be demonstrated through RNA-seq, as junctions between human and mouse RNA could occur at the RNA level (i.e. after transcription). RNA-seq could however show whether human-mouse chimeras that have been validated by DNA sequencing are expressed or not.

      We did perform transcriptome sequencing as suggested earlier by the reviewer, but realized that the amount of material required to be incorporated into the manuscript, including "materials and methods", "results", "discussion", "figures", "legends to figures" and "supplementary figures and tables", would be so massive that it would detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript.

      Given these comments, I believe that most of the weaknesses I mentioned in my review of the first version of this work still hold true.

      An important modification is that the work has been repeated in other cell lines, hence I removed this criticism from my earlier review.

      Additional changes made

      (1) We have now rewritten the "Abstract" to 250 words to fit eLife's instructions. (It was not possible to reduce the word count further.)

      (2) We have provided Video 1 as a separate file instead of a link.

      (3) Some of the Figure Supplements (which were stand-alone) are now given as main figures. We have rearranged the Figures and Figure Supplements in accordance with eLife's instructions.

      (4) We have now provided a list of the various cell lines used in this study, their tissue origin and procurement source in Supplementary File 3.


      The following is the authorsโ€™ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell-free chromatin particles (cfChPs) that are massively released by dying cells are incorporated into the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers; they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting of releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are based on only one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. These include, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, and transposition in receiving cells), RNA-seq (to validate expression), and ChIP-seq (to validate chromatin state).

      We have responded to this criticism under "Reviewer #1 (Recommendations for the authors, item no. 1-4)".

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This begs the question of the relevance of this study for living organisms.

      We have responded to this criticism under "Reviewer #1 (Recommendations for the authors, item no. 6)".

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~250 passages the cfChP-treated NIH3T3 cells begin to die out, apparently because their genomes have become too unstable for survival. This point will be highlighted in the revised version (pp. 45-46, lines 725-731).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We agree. We have removed the term "function" wherever we felt we had used it inappropriately.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We agree with the reviewer's viewpoint. We have replaced the term "predatory genome" with a more realistic term, "satellite genome", in the title and throughout the manuscript. We have also thoroughly revised the discussion section and elaborated on the potential role of LINE-1 and Alu elements carried by the concatemers in mammalian evolution (pp. 46-47, lines 743-756).

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      As mentioned above, we have revised the "discussion" section taking into account the issues raised by the reviewer and highlighted the potential role of cfChPs in evolution by acting as vehicles of transposable elements.

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      As mentioned above, we have replaced the term "predatory genome" with "satellite genome" and revised the "discussion" section taking into account the issues raised by the reviewer.

      Reviewer #1 (Recommendations for the authors):

      (1) I strongly recommend validating the findings of this study using other approaches. Whole genome sequencing using both short and long reads should be used to validate the presence of human DNA in the mouse cell line, as well as its integration into the mouse genome and concatemerization. Breakpoints between mouse and human DNA can be searched in individual reads. Finding these breakpoints in multiple reads from two or more sequencing technologies would strengthen their biological origin. Illumina and ONT sequencing are now routinely performed by many labs, such that this validation should be straightforward. In addition to validating the findings of the current study, it would allow performance of an in-depth characterization of the rearrangements undergone by both human cfChPs and the mouse genome after internalization of cfChPs, including identification of human TE copies integrated through bona fide transposition events into the mouse genome. New copies of LINE and Alu TEs should be flanked by target site duplications. LINE copies should be frequently 5' truncated, as observed in many studies of somatic transposition in human cells.
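      The split-read breakpoint search recommended above can be illustrated with a toy sketch: a read is called chimeric when its prefix matches one reference and its suffix matches the other. All sequences and names here are invented for illustration; a real analysis would align reads with a split-read-aware mapper against a combined human+mouse reference rather than substring matching.

```python
# Toy sketch of the recommended breakpoint search: flag a read as chimeric
# when its prefix matches one reference ("human") and its suffix matches
# another ("mouse"). Sequences are invented; real data needs a proper aligner.

def find_breakpoint(read, ref_a, ref_b, min_match=4):
    """Return the breakpoint index if read = prefix-of-ref_a + suffix-of-ref_b."""
    for i in range(min_match, len(read) - min_match + 1):
        if read[:i] in ref_a and read[i:] in ref_b:
            return i
    return None

human_ref = "ACGTACGTTTGCA"   # hypothetical "human" fragment
mouse_ref = "GGATCCGGAATTC"   # hypothetical "mouse" fragment

chimeric = "ACGTACGT" + "GGAATTC"   # junction after 8 bases
normal   = "ACGTACGTTTGCA"          # wholly "human", no junction

print(find_breakpoint(chimeric, human_ref, mouse_ref))  # 8
print(find_breakpoint(normal, human_ref, mouse_ref))    # None
```

      Reads supporting the same breakpoint across two sequencing technologies, flanked by target site duplications in the case of TE insertions, would be the kind of orthogonal evidence this recommendation asks for.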

      (2) Furthermore, should the high level of cell-to-cell HGT detected in this study occur on a regular basis within multicellular organisms, validating it through a reanalysis of whole genome sequencing data available in public databases should be relatively easy. One would expect to find a high number of structural variants that for some reason have so far gone under the radar.

      (3) Short and long-read RNA-seq should be performed to validate the expression of human cfChPs in mouse cells. I would also recommend performing ChIP-seq on routinely targeted histone marks to validate the chromatin state of human cfChPs in mouse cells.

      (4) The claim that fused human proteins are produced in mouse cells after exposing them to human cfChPs should be validated using mass spectrometry.

      The reviewer has suggested a plethora of techniques to validate our findings. Clearly, it is neither possible to undertake all of them nor to incorporate them into the manuscript. However, as suggested by the reviewer, we did conduct transcriptome sequencing of cfChP-treated NIH3T3 cells and were able to detect the presence of human-human fusion sequences (representing concatemerisation) as well as human-mouse fusion sequences (representing genomic integration). However, we realized that the amount of material required to be incorporated into the manuscript, including "materials and methods", "results", "discussion", "figures", "legends to figures" and "supplementary figures and tables", would be so massive that it would detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript. However, to address the reviewer's concerns we have now referred to the results of our earlier whole genome sequencing study of NIH3T3 cells similarly treated with cfChPs, wherein we had conclusively detected the presence of human DNA and human Alu sequences in the treated mouse cells. These findings have now been added as an independent paragraph (pp. 48, lines 781-792).

      (5) It is unclear from what is shown in the paper (increase in FISH signal intensity using Alu and L1 probes) if the increase in TE copy number is due to bona fide transposition or to amplification of cfChPs as a whole, through mechanisms other than transposition. It is also unclear whether human TEs end up being integrated into the neighboring mouse genome. This should be validated by whole genome sequencing.

      Our results suggest that TEs amplify and increase their copy number owing to their association with DNA polymerase and their ability to synthesize DNA (Figure 14a and b). Our study design cannot demonstrate transposition, which would require real-time imaging.

      The possibility of incorporation of TEs into the mouse genome is supported by our earlier genome sequencing work, referred to above, wherein we detected multiple human Alu sequences in the mouse genome (pp. 48, lines 781-792).

      (6) In order to be able to generalize the findings of this study, I strongly encourage the authors to repeat their experiments using other cell types.

      We thank the reviewer for this suggestion. We have now used four different cell lines derived from four different species and demonstrated that horizontal transfer of cfChPs occurs in all of them, suggesting that it is a universal phenomenon (pp. 37, lines 560-572; Supplementary Fig. S14a-d).

      We have also mentioned this in the abstract (pp. 3, lines 52-54).

      (7) Since the results obtained when using cfChPs isolated from healthy individuals are identical to those shown when using cfChPs from cancer sera, I wonder why the authors chose to focus mainly on results from cancer-derived cfChPs and not on those from healthy sera.

      Most of the experiments were conducted using cfChPs isolated from cancer patients because of our special interest in cancer, and our earlier results (Mittra et al., 2015), which had shown that cfChPs isolated from cancer patients had significantly greater activity in terms of DNA damage and activation of apoptotic pathways than those isolated from healthy individuals. We have now incorporated the above justification on pp. 6, lines 124-128.

      (8) Line 125: how was the 10-ng quantity (of human cfChPs added to the mouse cell culture) chosen and how does it compare to the quantity of cfChPs normally circulating in multicellular organisms?

      We chose to use 10 ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and apoptotic pathways using this concentration of cfChPs (Mittra I et al., 2015). We have now incorporated the justification for using this dose in our manuscript (pp. 51-52, lines 867-870).

      (9) Could the authors explain why they repeated several of their experiments in metaphase spreads, in addition to interphase?

      We conducted experiments on metaphase spreads in addition to those on chromatin fibres because of the current heightened interest in extrachromosomal DNA in cancer, which has largely been based on metaphase spreads. We were interested to see how the cfChP concatemers might relate to the characteristics of cancer extrachromosomal DNA, and whether the latter in fact represent cfChP concatemers acquired from surrounding dying cancer cells. We have now mentioned this on pp. 7, lines 150-155.

      (10) Regarding negative controls consisting in checking whether human probes cross-react with mouse DNA or proteins, I suggest that the stringency of washes (temperature, reagents) should be clearly stated in the manuscript, such that the reader can easily see that it was identical for controls and positive experiments.

      We were fully aware of these issues and were careful to ensure that washing steps were conducted meticulously. The careful washing steps have been repeatedly emphasized under the section on "Immunofluorescence and FISH" (pp. 54-55, lines 922-944).

      (11) I am not an expert in Immuno-FISH and FISH with ribosomal probes but it can be expected that ribosomal RNA and RNA polymerase are quite conserved (and thus highly similar) between humans and mice. A more detailed explanation of how these probes were designed to avoid cross-reactivity would be welcome.

      We were aware of this issue and conducted negative control experiments to ensure that the human ribosomal RNA probe and RNA polymerase antibody did not cross-react with mouse targets. Please see Supplementary Fig. S4c.

      (12) Finally, I could not understand why the cfChPs internalized by neighboring cells are called predatory genomes. I could not find any justification for this term in the manuscript.

      We agree; this criticism has also been made by Reviewer #2. We have now replaced the term "predatory" genomes with "satellite" genomes.

      Reviewer #2 (Recommendations for the authors):

      (1) P2 L34: The term "role" seems to imply "what something is supposed to do" (similar to "function"). Perhaps "impact" would be more neutral. Additionally, "poorly defined" is vague; do you mean "unknown"?

      We thank the reviewer for this suggestion. We have now rephrased the sentence to read "Horizontal gene transfer (HGT) plays an important evolutionary role in prokaryotes, but it is thought to be less frequent in mammals." (pp. 2, lines 26-27).

      (2) P2 L35: It seems that the dash should come after "human blood."

      Thank you, we have changed the position of the dash (pp. 2, line 29).

      (3) P2 L37: Must we assume these structures have a function? Could they not simply be side effects of other processes?

      We think this is a matter of semantics, especially since we show that cfChPs, once inside the cell, perform many functions such as replication, DNA synthesis, RNA synthesis, protein synthesis, etc. We, therefore, think the word "function" is not inappropriate.

      (4) Abstract: After reading the abstract, I am unclear on the concept of a "predatory genome." Based on the summarized results, it seems one cannot conclude that these elements provide any adaptive value to the genome.

      We agree. We have now replaced the term "predatory" genomes with a more realistic term, viz. "satellite" genomes.

      (5) Video abstract: The video abstract does not currently stand on its own and needs more context to be self-explanatory.

      Thank you for pointing this out. We have now created a new and much more professional video with more context which we hope will meet with the reviewerโ€™s approval.

      (6) P4 L67: Again, I am uncertain that HGT should be said to have "a role" in mammals, although it clearly has implications and consequences. Perhaps "role" here is intended to mean "consequence"?

      We have now changed the sentence to read as follows: "However, defining the occurrence of HGT in mammals has been a challenge" (pp. 4, line 73).

      (7) P6 L111: The phrase "to obtain a new perspective about the process of evolution" is unclear. What exactly is meant by this statement?

      We have replaced this sentence altogether; it now reads "The results of these experiments are presented in this article which may help to throw new light on mammalian evolution, ageing and cancer" (pp. 5-6, lines 116-118).

      (8) P38 L588: The term "predatory genome" has not been defined, making it difficult to assess its relevance.

      This issue has been addressed above.

      (9) P39 L604: The statement "transposable elements are not inherent to the cell" suggests that some TEs could originate externally, but this does not rule out that others are intrinsic. In other words, TEs are still inherent to the cell.

      This part of the discussion section has been rewritten and the above sentence has been deleted.

      (10) P39 L609: The phrase "may have evolutionary functions by acting as transposable elements" is unclear. Perhaps it is meant that these structures may serve as vehicles for TEs?

      This sentence has disappeared altogether in the revised discussion section.

      (11) P41 L643: "Thus, we hypothesize ... extensively modified to act as foreign genetic elements." This sentence is unclear. Are the authors referring to evolutionary changes in mammals in general (which overlooks the role of standard mutational processes)? Or is it being proposed that structural mutations (including TE integrations) could be mediated by cfChPs in addition to other mutational mechanisms?

      We have replaced this sentence; it now reads "Thus, "within-self" HGT may occur in mammals on a massive scale via the medium of cfChP concatemers that have undergone extensive and complex modifications resulting in their behaviour as "foreign" genetic elements" (pp. 47, lines 763-766).

      (12) P41 L150: The paragraph beginning with "It has been proposed that extreme environmental..." transitions too abruptly from HGT to adaptation. Is it being proposed that cfChPs are evolutionary processes selected for their adaptive potential? This idea is far too speculative at this stage and requires clarification.

      We agree. This paragraph has been removed.

      (13) P43 L681: This summary appears overly speculative and unclear, particularly as the concept of a "predatory genome" remains undefined and thus cannot be justified. It suggests that cfChPs represent an alternative lifestyle for the entire genome, although alternative explanations seem far more plausible at this point.

      We have now replaced the term "predatory" genome with "satellite" genome. The relevant part of the summary section has also been partially revised (pp. 49-50, lines 817-831).

      Changes independent of reviewers' comments

      We have made the following additions / modifications.

      (1) The abstract has been modified and its "conclusion" section has been rewritten.

      (2) Section 1.14 has been newly added together with the accompanying Figures 15a, b and c.

      (3) The "Discussion" section has been greatly modified and parts of it have been rewritten.

    1. Strive to become bilingual and biliterate in English and another language

      I think that a lot of times, people just want students to be proficient in English, but knowing their first language is just as important. Learning a different language is hard and a lot of adults can't even do it, so it's very impressive that a kid is learning two languages at once.

    1. Local school boards protested characterizations of Washington, Jefferson, and Madison as unpatriotic owners of "forced labor camps."

      And yet notes that it was TAUGHT to kids in school history lessons -> seems to me to be cherry picking what is and isn't "history" here.

      Ah, but likens it to Conservatives' view that if THIS is the history being taught, just shouldn't teach history at all -> calls this "zero-sum game of heroes and villains" instead of exploring nuance, etc.

      "It was not an analysis of people's ideas in their own time, nor a process of change over time." Again -> the main issue here is its link to the present.

    1. A restaurant isn't just a place where people go to be physically fed. It's an emotional and spiritual hub, bringing people into a space where they can connect and find community.

      I agree with this statement that a restaurant isn't just a place to go eat; it's a space where you connect with your community, connect with the culture, and simply enjoy the service.

    2. Food brings people together. I think that's the source of life."

      Yes, food brings people together, but I don't think it's just the food; I think it's the cultures and traditions that bring communities together, and I feel like with that they can relate or speak to each other about those traditions.

    3. "neither completely American nor fully Mexican, and really just this hybrid of all these things I've learned about and experienced."

      This is also what I mean by the restaurant being very welcoming and open to anyone; it's not just a Mexican restaurant, it's a place to make you feel comfortable for who you are, not just your race. The article also talked about how the restaurant gives people food that's off the menu, so if they also serve food that isn't just Mexican upon your request, that would show how much they welcome anyone.

    1. For example, in the Logic & Communication column, we see many light-orange cells – the AI often thought papers were a bit clearer or better argued (by its judgment) than the human evaluators did.

      I wonder if we should normalize this in a few ways, at least as an alternative measure.

      I suspect the AI's distribution of ratings may differ from the human distribution of ratings overall, and the "bias" may also differ by category.

      Actually, that might be something to do first -- compare the distributions of (middle -- later more sophisticated) ratings for humans and for LLMs in an overall sense.

      One possible normalization would be to state these as percentiles relative to the other stated percentiles within that group (humans, LLMs), or even within categories of paper/field/cause area (I suspect there's some major difference between the more applied and niche-EA work and the standard academic work; the latter is also probably concentrated in GH&D and environmental econ). On the other hand, the systematic differences between LLM and human ratings on average might also tell us something interesting. So I wouldn't want to only use normalized measures.

      I think a more sophisticated version of this normalization just becomes a statistical (random effects?) model where you allow components of variation along several margins.

      It's true the ranks thing gets at this issue to some extent, as I guess Spearman also does? But I don't think it fully captures it.
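      A minimal version of the within-group percentile normalization suggested above can be sketched as follows; the rating numbers are invented purely for illustration, not taken from the actual evaluation data.

```python
# Minimal sketch of within-group percentile normalization: each rating is
# re-expressed as its percentile rank within its own rater group (human or
# LLM), so constant differences in the groups' rating scales drop out.
# All numbers below are invented for illustration.

def percentile_ranks(xs):
    """Percentile rank (0-100, midrank for ties) of each value within the list."""
    n = len(xs)
    out = []
    for x in xs:
        below = sum(1 for y in xs if y < x)
        equal = sum(1 for y in xs if y == x)
        out.append(100.0 * (below + 0.5 * equal) / n)
    return out

human_ratings = [55, 60, 70, 80, 90]
llm_ratings   = [70, 75, 80, 85, 95]   # shifted upward, like the "light-orange" bias

print(percentile_ranks(human_ratings))  # [10.0, 30.0, 50.0, 70.0, 90.0]
print(percentile_ranks(llm_ratings))    # [10.0, 30.0, 50.0, 70.0, 90.0]
```

      On the raw scale the hypothetical LLM ratings sit uniformly higher, but the percentile ranks coincide: this normalization removes a constant group-level bias while leaving relative orderings (and hence rank correlations like Spearman's) untouched, which is why keeping the raw measures alongside it is still informative.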

    1. Comparative diversity analysis of the geographic regions revealed that the genetic diversity was highest for cougars sampled in the Northern Rocky Mountains region (HE = 0.58) and lowest for cougars on the Olympic Peninsula (HE = 0.47) (Table 1), but differences between sites were not statistically significant (Kruskal Wallis Test, H = 2.34, df = 5, P = 0.800).

      If the Olympic Peninsula has the lowest genetic diversity but the test says it's not significant, could that just be because of a small sample size or are there any other factors causing this?
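      To see how sample size interacts with this test, the Kruskal-Wallis H statistic can be computed by hand; the heterozygosity values below are invented for illustration (they are not the study's data) and are assumed tie-free.

```python
# Pure-Python Kruskal-Wallis H statistic (no tie correction; values assumed
# distinct). Illustrates why small per-group samples give little power:
# even a visible trend across groups can fall short of significance.
# The per-individual heterozygosity values are invented for illustration.

def kruskal_wallis_h(groups):
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}   # ranks 1..N
    n = len(pooled)
    h = 0.0
    for g in groups:
        r = sum(rank[x] for x in g)
        h += r * r / len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

sites = [
    [0.44, 0.47, 0.50],   # lower-diversity site, "Olympic Peninsula"-like
    [0.49, 0.53, 0.56],
    [0.55, 0.58, 0.61],   # higher-diversity site, "Northern Rockies"-like
]
print(round(kruskal_wallis_h(sites), 2))  # 5.69
```

      With only three individuals per site, even this clear upward trend yields H of about 5.69, below the chi-square critical value of 5.99 for df = 2 at alpha = 0.05, so small per-site samples can indeed mask real differences in diversity; other factors, such as a genuine bottleneck effect being modest in size, could also contribute.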

    1. "When I first started Dencity, if I wasn't there, people didn't go out to skate," said Blessing. "Now, they're so comfortable they skate any day. Sometimes I'm just sitting at home going through Instagram stories then I'm like, 'Ah! There they are.'"

      This is a normal situation that many new communities experience: without a leader, people are just strangers to each other. But as time passes, people gradually become true partners for each other.


  7. learn-us-east-1-prod-fleet01-beaker-xythos.content.blackboardcdn.com
    1. This initial investment creates new jobs in its own right (both inside the company itself, and in the companies which produce capital equipment, raw materials, and other inputs). Even more crucially, this initial investment pushes the "Start" button on the whole process of production. Investment is the most important form of spending required for the successful functioning of capitalism. When investment is strong, capitalism is vibrant and growing. When investment is weak, capitalism stagnates.

      This passage highlights how investment acts as the engine that gets capitalism moving. Not only does it create jobs directly within a company and its suppliers, but it also kickstarts the broader cycle of production. Stanford stresses that investment is the key driver of capitalism's health: when businesses invest, the system grows and thrives, but when they hold back, the economy slows down and stagnates. In other words, investment isn't just about buying equipment or hiring workers; it's what keeps the whole system alive and moving forward.

    2. Competition – ruthless, unforgiving, to-the-death competition – is a crucial feature of capitalism. It opens up new opportunities for individual firms: they can expand revenues and profits by winning a larger share of sales from competitors. But competition also poses new challenges, since other companies are trying to do exactly the same thing: namely, grow their own market share at the expense of their competitors. Therefore, it's not just greed that motivates company efforts to minimize costs and maximize profits; with competition, it's also fear. If a company can't stand up to the competition, it's not just that they won't make quite as much profit as other companies. Far worse, eventually they will be destroyed by these competing firms producing better products at lower cost.

      This paragraph highlights that competition, not just greed, drives company behavior. Compared to the "little circle," which focuses on a single firm and its workers, this section shows how multiple firms interacting under competitive pressure create fear, innovation, and risk. Both agree that firms aim to survive and profit, but competition adds a complexity the simpler model doesn't capture.

    1. For course assessment purposes we use a point system. At the beginning of the semester you start off with zero points. As you complete more assignments and participate in the course, you earn experience points (referred to as XP). The more XP you earn, the higher your letter grade. Your final grade will be based out of 1000 XP as illustrated below.

      I really appreciate the XP (experience points) system for grading. It feels motivating. It's clear that active participation, especially in discussions and video contributions, is crucial for success. I'm particularly intrigued by the scenario-based discussions; they seem like a practical way to apply theory to realistic crises. I also like that the syllabus emphasizes timely engagement rather than just completing tasks at the last minute, which seems essential for retaining the material.

    1. This is not always malicious or deliberate. It could be an attempt to draw more clicks with an exciting or dramatic headline, or it could just be a reporter who is not an expert on that topic misunderstanding the research. Because news articles tend to be shorter and written for a general audience, their summaries of research studies will always be simplified.

      I notice this very often when reading articles: authors often exaggerate the information they write about. It is not always misinformation; sometimes it's just clickbait meant to attract readers, as the text states. I also like that the passage points out that news articles are short and written for a general audience, because this is very true and draws many readers, including me. In my high school career I would always use sources that were very short articles with very out-there headlines.

    1. Of course, these statements are by no means the only interpretive claims someone might make about Mustang. There can be many interpretations of any one film, because a film can develop more than one large idea or theme

      I didn't really realize that a film could have so many different interpretations. I usually just think about the story or the characters, but this makes me see that analyzing a movie is about looking for different themes and ideas. It's interesting to think that two people could watch the same film and see completely different meanings.

    1. Two original annotations: 1. The stakes of the storytelling get people to the point of the main story itself.

      1. The change I see in the storytelling is in how people explain themselves: when the story gets to a bad part, they stay on that topic.

      2. The main theme is about how they feel in the story, and what type of emotion triggers that feeling, whether it is sadness or joyfulness.

      3. I find that in the story they change emotion: when they are feeling sad or scared, they go from happy to worried.

      4. In the beginning of the story, they give the main purpose of what is going to happen next; after the introduction, the listener can be either scared or worried for the person telling the story.

      5. At the end, the storyteller can feel nervous about the whole story, or relieved that they got it out in public and don't have to imagine the story or tell it again.

      I would pick "What is Implicit Bias?" because it talks about Mexicans coming from their country to America, and border control telling them they are illegal in that country. That is not okay to say, since they are immigrants who came from their hometown just to find peace, not to be mocked by the person trying to send them back to where they came from. That means a lot to me and my parents; as an African myself, I have been kicked around for not fitting in with Americans. I know what that feels like.

    1. Author response:

      Notes to Editors

      We previously received comments from three reviewers at Biological Psychiatry, which we have addressed in detail below. The following is a summary of the reviewers' comments along with our responses.

      Reviewers 1 and 2 sought clearer justification for studying the cognition-mental health overlap (covariation) and its neuroimaging correlates. In the revised manuscripts, we expanded the Introduction and Discussion to explicitly outline the theoretical implications of investigating this overlap with machine learning. We also added nuance to the interpretation of the observed associations.

      Reviewer 1 raised concerns about the accessibility of the machine learning methodology for readers without expertise in this field. We revised the Methods section to provide a clearer, step-by-step explanation of our machine learning approach, particularly the two-level machine learning through stacking. We also enhanced the description of the overall machine learning design, including model training, validation, and testing.

      In response to Reviewer 2โ€™s request for deeper interpretation of our findings and stronger theoretical grounding, we have expanded our discussion by incorporating a thorough interpretation of how mental health indices relate to cognition, material that was previously included only in supplementary materials due to word limit constraints. We have further strengthened the theoretical justification for our study design, with particular emphasis on the importance of examining shared variance between cognition and mental health through the derivation of neural markers of cognition. Additionally, to enhance the biological interpretation of our results, we included new analyses of feature importance across neuroimaging modalities, providing clearer insights into which neural features contribute most to the observed relationships.

      Notably, Reviewer 3 acknowledged the strengths of our study, including its multimodal design, robust analytical approach, and clear visualization and interpretation of results. Their comments were exclusively methodological, underscoring the manuscript's quality.

      Reviewer 1:

      The authors try to bridge mental health characteristics, global cognition and various MRI-derived (structural, diffusion and resting state fMRI) measures using the large dataset of UK Biobank. Each MRI modality alone explained at most 25% of the cognition-mental health covariance, and when combined together 48% of the variance could be explained. As a peer reviewer not familiar with the methods used (machine learning, although familiar with imaging), I find the manuscript hard to read and I wonder what the message for the field might be. Although the end of the discussion states '... we provide potential targets for behavioural and physiological interventions that may affect cognition', the real relevance (and impact) of the findings is unclear to me.

      Thank you for your thorough review and practical recommendations. We appreciate your constructive comments and suggestions and hope our revisions adequately address your concerns.

      Major questions

      (1) The methods are hard to follow for people not in this specific subfield, and therefore, I expect that for readers it is hard to understand how valid and how useful the approach is.

      Thank you for your comment. To enhance accessibility for readers without a machine learning background, we revised the Methods section to clarify our analyses while retaining important technical details needed to understand our approach. Recognizing that some concepts may require prior knowledge, we provide detailed explanations of each analysis step, including the machine learning pipeline in the Supplementary Methods.

      Line 188: "We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point was used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities ("dwMRI Stacked", "rsMRI Stacked", "sMRI Stacked", and "All MRI Stacked", respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson's correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance."
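The pipeline in this quoted passage (nested cross-validation with five outer and ten inner folds, MSE-based tuning, then a bootstrap CI on the aggregated predictions) can be sketched roughly as follows. This is an illustrative toy with synthetic data, and ridge regression stands in for PLSR to keep it dependency-light; nothing here is the authors' actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 300 "participants", 10 predictors, one target
# (loosely analogous to predicting a g-factor).
n, p = 300, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=2.0, size=n)

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression coefficients."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def kfold(n_samples, k, rng):
    """Shuffled k-fold split, returned as a list of index arrays."""
    return np.array_split(rng.permutation(n_samples), k)

alphas = [0.1, 1.0, 10.0]  # illustrative hyperparameter grid

# Outer loop (5 folds): unbiased evaluation.
outer = kfold(n, 5, rng)
obs, pred = [], []
for i, test_idx in enumerate(outer):
    train_idx = np.concatenate([f for j, f in enumerate(outer) if j != i])
    Xtr, ytr = X[train_idx], y[train_idx]

    # Inner loop (10 folds): choose the alpha with minimal mean MSE.
    inner = kfold(len(train_idx), 10, rng)
    mse = {a: [] for a in alphas}
    for m, val_idx in enumerate(inner):
        fit_idx = np.concatenate([f for j, f in enumerate(inner) if j != m])
        for a in alphas:
            b = ridge_fit(Xtr[fit_idx], ytr[fit_idx], a)
            mse[a].append(np.mean((Xtr[val_idx] @ b - ytr[val_idx]) ** 2))
    best = min(alphas, key=lambda a: np.mean(mse[a]))

    # Retrain on the full outer-fold training set; predict the held-out fold.
    b = ridge_fit(Xtr, ytr, best)
    obs.append(y[test_idx])
    pred.append(X[test_idx] @ b)

obs, pred = np.concatenate(obs), np.concatenate(pred)
r = np.corrcoef(obs, pred)[0, 1]

# Bootstrap 95% CI for Pearson's r: resample (observed, predicted) pairs
# with replacement 5 000 times; significant if the CI excludes zero.
idx = rng.integers(0, len(obs), size=(5000, len(obs)))
rs = np.array([np.corrcoef(obs[i], pred[i])[0, 1] for i in idx])
lo, hi = np.quantile(rs, [0.025, 0.975])
print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The key property of the nested design, as the passage stresses, is that the test fold never influences either hyperparameter selection or model fitting.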

      (2) If only 40% of the cognition-mental health covariation can be explained by the MRI variables, how to explain the other 60% of the variance? And related to this %: why do the author think that 'this provides us confidence in using MRI to derive quantitative neuromarkers of cognition'?

      Thank you for this insightful observation. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health. The remaining 52% of unexplained variance may arise from several sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank.

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the Research Domain Criteria (RDoC) framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. We have now incorporated these considerations into the Discussion section.

      Line 658: "Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition."

      Regarding our confidence in using MRI to derive neural markers for cognition, we base this on the predictive performance of MRI-based models. As we note in the Discussion (Line 554: "Consistent with previous studies, we show that MRI data predict individual differences in cognition with a medium-size performance (r ≈ 0.4) [15–17, 28, 61, 67, 68]."), the medium effect size we observed (r ≈ 0.4) agrees with the existing literature on brain-cognition relationships, confirming that machine learning leads to replicable results. This effect size represents a moderate yet meaningful association in neuroimaging studies of aging, consistent with reports linking brain to behaviour in adults (Krämer et al., 2024; Tetereva et al., 2022). For example, a recent meta-analysis by Vieira and colleagues (2022) reported a similar effect size (r = 0.42, 95% CI [0.35; 0.50]). Our study includes over 15,000 participants, comparable to or more than typical meta-analyses, allowing us to characterise our work as a "mega-analysis". On top of this predictive performance, we found our neural markers for cognition to capture half of the cognition-mental health covariation, boosting our confidence in our approach.

      Krämer C, Stumme J, da Costa Campos L, Dellani P, Rubbert C, Caspers J, et al. Prediction of cognitive performance differences in older age from multimodal neuroimaging data. GeroScience. 2024;46:283–308.

      Tetereva A, Li J, Deng JD, Stringaris A, Pat N. Capturing brain-cognition relationship: Integrating task-based fMRI across tasks markedly boosts prediction and test-retest reliability. NeuroImage. 2022;263:119588.

      (3) Imagine that we can increase the explained variance using multimodal MRI measures: why is it useful? What does it teach us? What might be the implications?

      We assume that by variance, Reviewer 1 referred to the cognition-mental health covariation mentioned in point 2) above.

      If we can increase the explained cognition-mental health covariation using multimodal MRI measures, it would mean that we have developed a reasonable neuromarker that is close to RDoC's neurobiological unit of analysis for cognition. RDoC treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. This means RDoC aims to discover neural markers of cognition that explain the covariation between cognition and mental health. We approach the development of such neural markers using multimodal neuroimaging. We have now explained the motivation of our study in the first paragraph of the Introduction.

      Line 43: "Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. The National Institute of Mental Health's Research Domain Criteria (RDoC) [13, 14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging."

      More specific issues:

      Introduction

      (4) In the intro the sentence 'in some cases, altered cognitive functioning is directly related to psychiatric symptom severity' is in contrast to the next sentence '... are often stable and persist upon alleviation of psychiatric symptoms'.

      Thank you for pointing this out. The first sentence refers to cases where cognitive deficits fluctuate with symptom severity, while the second emphasizes that core cognitive impairments often remain stable even during symptom remission. To avoid this confusion, we have removed these sentences.

      (5) In the intro the text on the methods (various MRI modalities) is not needed for the Biol Psych readers audience.

      We appreciate your comment. While some members of our target audience may have backgrounds in neuroimaging, machine learning, or psychiatry, we recognize that not all readers will be familiar with all three areas. To ensure accessibility for those who are not familiar with neuroimaging, we included a brief overview of the MRI modalities and quantification methods used in our study to provide context for the specific neuroimaging phenotypes. Additionally, we provided background information on the machine learning techniques employed, so that readers without a strong background in machine learning can still follow our methodology.

      (6) Regarding the age of the study sample: I understand that at recruitment the subjects' age ranges from 40 to 69 years. At MRI scanning the age ranges between about 46 and 82. How is that possible? And related to the age of the population: how did the authors deal with age in the analyses, since age affects both cognition and the brain measures?

      Thank you for noticing this. In the Methods section, we first outline the characteristics of the UK Biobank cohort, including the age at first recruitment (40-69 years). Table 1 then shows the characteristics of participant subsamples included in each analysis. Since our study used data from Instance 2 (the second in-person visit), participants were approximately 5-13 years older at scanning, resulting in the age range of 46 to 82 years. We clarified the Table 1 caption as follows:

      Line 113: "Table 1. Demographics for each subsample analysed: number, age, and sex of participants who completed all cognitive tests, mental health questionnaires, and MRI scanning"

      We acknowledge that age may influence cognitive and neuroimaging measures. In our analyses, we intentionally preserved age-related variance in brain-cognition relationships across mid and late adulthood, as regressing out age completely would artificially remove biologically meaningful associations. At the same time, we rigorously addressed the effects of age and sex through additional commonality analyses quantifying age and sex contributions to the relationship between cognition and mental health.

      As noted by Reviewer 1 and illustrated in Figure 8, age and sex shared substantial overlapping variance with both mental health and neuroimaging phenotypes in explaining cognitive outcomes. For example, in Figure 8i, age and sex together accounted for 43% of the variance in the cognition-mental health relationship:

      (2.76 + 1.03) / (2.76 + 1.03 + 3.52 + 1.45) โ‰ˆ 0.43

      Furthermore, neuromarkers from the all-MRI stacked model explained 72% of this age/sex-related variance:

      2.76 / (2.76 + 1.03) โ‰ˆ 0.72

      This indicates that our neuromarkers captured a substantial portion of the cognition-mental health covariation that varied with age and sex, highlighting their relevance in age/sex-sensitive cognitive modeling.
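As a quick check, the two percentages follow directly from the four commonality coefficients quoted from Figure 8i (the variable names below are my own labels for those components):

```python
# Commonality coefficients read off Figure 8i (percentage points).
shared_mri_agesex = 2.76  # age/sex variance also captured by the neuromarkers
unique_agesex = 1.03      # age/sex variance not captured by the neuromarkers
other = 3.52 + 1.45       # remaining components of the covariation

total = shared_mri_agesex + unique_agesex + other
agesex_share = (shared_mri_agesex + unique_agesex) / total           # ~0.43
mri_share = shared_mri_agesex / (shared_mri_agesex + unique_agesex)  # ~0.728, reported as 72%

print(f"{agesex_share:.3f} {mri_share:.3f}")  # prints 0.433 0.728
```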

      In the Methods, Results, and Discussion, we say:

      Methods

      Line 263: "To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age², age×sex, and age²×sex as an additional set of explanatory variables (Fig. 1)."

      Results

      Line 445: "Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship. The multimodal neural marker of cognition based on three MRI modalities ("All MRI Stacked") explained 72% of this age- and sex-related variance (Fig. 8i–l and Table S21)."

      Discussion

      Line 660: "We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata."

      (7) Regarding the mental health variables: were characteristics with a positive value (e.g. happiness and subjective wellbeing) reverse-scored (compared to the negative items, such as anxiety, addiction, etc.)?

      We appreciate you noting this. These composite scores primarily represent standard clinical measures such as the GAD-7 anxiety scale and the N-12 neuroticism scale. We did not reverse-score them; keeping their original directionality makes interpretation consistent with the studies from which the scores were derived (e.g., Davis et al., 2020; Dutt et al., 2022). Complete descriptive statistics for all mental health indices and detailed derivation procedures are provided in the Supplementary Materials (S2). On Page 6 of the Supplementary Methods, we say:

      Line 92: "Composite mental health scores included the Generalized Anxiety Disorder (GAD-7), the Posttraumatic Stress Disorder (PTSD) Checklist (PCL-6), the Alcohol Use Disorders Identification Test (AUDIT), the Patient Health Questionnaire (PHQ-9) [12], the Eysenck Neuroticism (N-12), Probable Depression Status (PDS), and the Recent Depressive Symptoms (RDS-4) scores [13, 14]. To calculate the GAD-7, PCL-6, AUDIT, and PHQ-9, we used questions introduced at the online follow-up [12]. To obtain the N-12, PDS, and RDS-4 scores [14], we used data collected during the baseline assessment [13, 14].

      We subcategorized depression and GAD based on frequency, current status (ever had depression or anxiety and current status of depression or anxiety), severity, and clinical diagnosis (depression or anxiety confirmed by a healthcare practitioner). Additionally, we differentiated between different depression statuses, such as recurrent depression, depression triggered by loss, etc. Variables related to self-harm were subdivided based on whether a person has ever self-harmed with the intent to die.

      To make response scales more intuitive, we recoded responses within the well-being domain such that the lower score corresponded to a lesser extent of satisfaction ("Extremely unhappy") and the higher score indicated a higher level of happiness ("Extremely happy"). For all questions, we assigned the median values to "Prefer not to answer" (-818 for in-person assessment and -3 for online questionnaire) and "Do not know" (-121 for in-person assessment and -1 for online questionnaire) responses. We excluded the "Work/job satisfaction" question from the mental health derivatives list because it included a "Not employed" response option, which could not be reasonably coded.

      To calculate the risk of PTSD, we used questions from the PCL-6 questionnaire. Following Davis and colleagues [12], PCL-6 scores ranged from 6 to 29. A PCL-6 score of 12 or below corresponds to a low risk of meeting the Clinician-Administered PTSD Scale diagnostic criteria. PCL-6 scores between 13 and 16 and between 17 and 25 are indicative of an increased risk and high risk of PTSD, respectively. A score of above 26 is interpreted as a very high risk of PTSD [12, 15]. PTSD status was set to positive if the PCL-6 score exceeded or was equal to 14 and encompassed stressful events instead of catastrophic trauma alone [12].

      To assess alcohol consumption, alcohol dependence, and harm associated with drinking, we calculated the sum of the ten questions from the AUDIT questionnaire [16]. We additionally subdivided the AUDIT score into the alcohol consumption score (questions 1-3, AUDIT-C) and the score reflecting problems caused by alcohol (questions 4-10, AUDIT-P) [17]. In questions 2-10, which followed the first trigger question ("Frequency of drinking alcohol"), we replaced missing values with 0, as they would correspond to a "Never" response to the first question.

AUDIT scores below the cut-off of 8 suggest moderate or low-risk alcohol consumption, while scores of 8 to 15 and above 15 indicate hazardous/harmful drinking and probable alcohol dependence (moderate-severe alcohol use disorder), respectively [16, 18]. Accordingly, hazardous alcohol use and alcohol dependence status correspond to AUDIT scores of ≥ 8 and ≥ 15, respectively. The “Alcohol dependence ever” status was set to positive if a participant had ever been physically dependent on alcohol. To reduce skewness, we log(x+1)-transformed the AUDIT, AUDIT-C, and AUDIT-P scores [17].”
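To make the recoding and scoring rules described above concrete, here is a minimal Python sketch of the cut-offs as we read them from the quoted Methods text. The function names, toy data, and return structure are ours for illustration, not the authors' actual pipeline code:

```python
import math
from statistics import median

# Special UK Biobank codes (online questionnaire): -3 = "Prefer not to
# answer", -1 = "Do not know"; the in-person assessment uses -818 / -121.
SPECIAL_CODES = {-3, -1}

def recode_special(responses):
    """Replace special codes with the median of the valid responses."""
    valid = [r for r in responses if r not in SPECIAL_CODES]
    return [median(valid) if r in SPECIAL_CODES else r for r in responses]

def pcl6_risk(score):
    """Map a PCL-6 total (range 6-29) to a risk band per Davis et al. [12]."""
    if score <= 12:
        return "low"
    if score <= 16:
        return "increased"
    if score <= 25:
        return "high"
    return "very high"          # a score of 26 or above

def ptsd_positive(score):
    """Binary PTSD status used in the paper: PCL-6 score of 14 or higher."""
    return score >= 14

def audit_scores(items):
    """AUDIT totals from the ten item scores (0-4 each). Missing items
    2-10 are treated as 0, mirroring a "Never" trigger response."""
    items = [items[0]] + [s if s is not None else 0 for s in items[1:]]
    audit = sum(items)
    return {
        "AUDIT": audit,
        "AUDIT-C": sum(items[:3]),          # questions 1-3: consumption
        "AUDIT-P": sum(items[3:]),          # questions 4-10: problems
        "hazardous": audit >= 8,
        "dependence": audit >= 15,
        "log_AUDIT": math.log(audit + 1),   # the log(x+1) transform
    }

print(recode_special([1, 3, 5, -3, 4, -1, 2]))  # [1, 3, 5, 3, 4, 3, 2]
print(pcl6_risk(13), ptsd_positive(13))         # increased False
```

This is only a reading aid for the thresholds; the actual derivation operates on the UK Biobank field encodings.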

      Davis KAS, Coleman JRI, Adams M, Allen N, Breen G, Cullen B, et al. Mental health in UK Biobank โ€“ development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis. BJPsych Open. 2020;6:e18.

Dutt RK, Hannon K, Easley TO, Griffis JC, Zhang W, Bijsterbosch JD. Mental health in the UK Biobank: A roadmap to self-report measures and neuroimaging correlates. Hum Brain Mapp. 2022;43:816–832.

      (8) In the discussion section (page 23, line 416-421), the authors refer to specific findings that are not described in the results section > I would add these findings to the main manuscript (including the discussion / interpretation).

      We appreciate your careful reading. We agree that our original Results section did not explicitly describe the factor loadings for mental health in the PLSR model, despite discussing their implications later in the paper. We needed to include this part of the discussion in the Supplementary Materials to meet the word limit of the original submission. However, in response to your suggestion, we have now added the results regarding factor loadings to the Results section. We also moved the discussion of the association between mental health features and general cognition from the Supplementary Material to the manuscriptโ€™s Discussion.

      Results

Line 298: “On average, information about mental health predicted the g-factor at R²mean = 0.10 and rmean = 0.31 (95% CI [0.291, 0.315]; Fig. 2b and 2c and Supplementary Materials, S9, Table S12). The magnitude and direction of factor loadings for mental health in the PLSR model allowed us to quantify the contribution of individual mental health indices to cognition. Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition.”

      Discussion

      Line 492: โ€œFactor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of non-suicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79,80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionary novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90โ€“93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94โ€“96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97โ€“100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107โ€“109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].โ€

      (9) In the discussion section (page 24, line 440-449), the authors give an explanation on why the diffusion measure have limited utility, but the arguments put forward also concern structural and rsfMRI measures.

Thank you for this important observation. Indeed, the argument about voxel-averaged diffusion components (“… these metrics are less specific to the properties of individual white matter axons or bundles, and instead represent a composite of multiple diffusion components averaged within a voxel and across major fibre pathways”) could theoretically apply across other MRI modalities. We have therefore removed this point from the discussion to avoid overgeneralization. However, we maintain our central argument about the biological specificity of conventional tractography-derived diffusion metrics: their particular sensitivity to white matter microstructure (e.g., axonal integrity, myelin content) may make them better suited for detecting neuropathological changes than for tracking dynamic cognitive processes. This interpretation aligns with the mixed evidence linking these metrics to cognitive performance, despite their established utility in detecting white matter abnormalities in clinical populations (e.g., Bergamino et al., 2021; Silk et al., 2009). We clarify this distinction in the manuscript.

      Line 572: โ€œThe somewhat limited utility of diffusion metrics derived specifically from probabilistic tractography in serving as robust quantitative neuromarkers of cognition and its shared variance with mental health may stem from their greater sensitivity and specificity to neuronal integrity and white matter microstructure rather than to dynamic cognitive processes. Critically, probabilistic tractography may be less effective at capturing relationships between white matter microstructure and behavioural scores cross-sectionally, as this method is more sensitive to pathological changes or dynamic microstructural alterations like those occurring during maturation. While these indices can capture abnormal white matter microstructure in clinical populations such as Alzheimerโ€™s disease, schizophrenia, or attention deficit hyperactivity disorder (ADHD) [117โ€“119], the empirical evidence on their associations with cognitive performance is controversial [114, 120โ€“126].โ€

      Bergamino M, Walsh RR, Stokes AM. Free-water diffusion tensor imaging improves the accuracy and sensitivity of white matter analysis in Alzheimerโ€™s disease. Sci Rep. 2021;11:6990.

      Silk TJ, Vance A, Rinehart N, Bradshaw JL, Cunnington R. White-matter abnormalities in attention deficit hyperactivity disorder: a diffusion tensor imaging study. Hum Brain Mapp. 2009;30:2757โ€“2765.

      Reviewer 2:

      This is an interesting study combining a lot of data to investigate the link between cognition and mental health. The description of the study is very clear, it's easy to read for someone like me who does not have a lot of expertise in machine learning.

      We thank you for your thorough review and constructive feedback. Your insightful comments have helped us identify conceptual and methodological aspects that required improvement in the manuscript. We have incorporated relevant changes throughout the paper, and below, we address each of your points in detail.

      Comment 1: My main concern with this manuscript is that it is not yet clear to me what it exactly means to look at the overlap between cognition and mental health. This relation is r=0.3 which is not that high, so why is it then necessary to explain this overlap with neuroimaging measures? And, could it be that the relation between cognition and mental health is explained by third variables (environment? opportunities?). In the introduction I miss an explanation of why it is important to study this and what it will tell us, and in the discussion I would like to read some kind of 'answer' to these questions.

Thank you. We agree that it is important to clarify why we investigated the relationship between cognition and mental health, and what we found using data from the UK Biobank.

Conceptually, our work is grounded in the Research Domain Criteria (RDoC; Insel et al., 2010) framework. RDoC conceptualizes mental health not through traditional diagnostic categories, but through core functional domains that span the full spectrum from normal to abnormal functioning. These domains include cognition, negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Within this framework, cognition is considered a fundamental domain that contributes to mental health across diagnostic boundaries. Meta-analytic evidence supports a link between cognitive functioning and mental health (Abramovitch et al., 2021; East-Richard et al., 2020). In the context of a large, population-based dataset like the UK Biobank, this implies that cognitive performance – as measured by various cognitive tasks – should be meaningfully associated with available mental health indicators.

      However, because cognition is only one of several functional domains implicated in mental health, we do not expect the covariation between cognition and mental health to be very high. Other domains, such as negative and positive valence systems, arousal and regulatory systems, or social processing, may also play significant roles. Theoretically, this places an upper bound on the strength of the cognition-mental health relationship, especially in normative, nonclinical samples.

Our current findings from the UK Biobank reflect this. Most of the 133 mental health variables showed relatively weak individual correlations with cognition (mean r = 0.01, SD = 0.05, min r = –0.08, max r = 0.17; see Figure 2). However, using a PLS-based machine learning approach, we were able to integrate information across all mental health variables to predict cognition, yielding an out-of-sample correlation of r = 0.31 [95% CI: 0.29, 0.32].

We believe this estimate approximates the true strength of the cognition-mental health relationship in normative samples, consistent with both theoretical expectations and prior empirical findings. Theoretically, this aligns with the RDoC view that cognition is one of several contributing domains. Empirically, our results are consistent with findings from our previous mega-analysis in children (Wang et al., 2025). Moreover, in the field of gerontology, an effect size of r = 0.31 is not considered small. According to Brydges (2019), it falls around the 70th percentile of effect sizes reported in gerontological studies and approaches the threshold for a large effect (r = 0.32). Given that most studies report within-sample associations, our out-of-sample results are likely more robust and generalizable (Yarkoni & Westfall, 2017).

      To answer, โ€œwhy is it then necessary to explain this overlap with neuroimaging measuresโ€, we again draw on the conceptual foundation of the RDoC framework. RDoC emphasizes that each functional domain, such as cognition, should be studied not only at the behavioural level but also across multiple neurobiological units of analysis, including genes, molecules, cells, circuits, physiology, and behaviour.

      MRI-based neural markers represent one such level of analysis. While other biological systems (e.g., genetic, molecular, or physiological) also contribute to the cognition-mental health relationship, neuroimaging provides unique insights into the brain mechanisms underlying this association โ€“ insights that cannot be obtained from behavioural data alone.

      In response to the related question, โ€œCould the relationship between cognition and mental health be explained by third variables (e.g., environment, opportunities)?โ€, we note that developing a neural marker of cognition capable of capturing its relationship with mental health is the central aim of this study. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health.

      The remaining 52% of unexplained variance may stem from several sources. According to the RDoC framework, neuromarkers could be further refined by incorporating additional neuroimaging modalities (e.g., task-based fMRI, PET, ASL, MEG/EEG, fNIRS) and integrating other units of analysis such as genetic, molecular, cellular, and physiological data.

Once more comprehensive neuromarkers are developed, capturing a greater proportion of the cognition-mental health covariation, they may also open a new research direction: investigating how environmental factors and life opportunities influence these markers. However, exploring those environmental contributions lies beyond the scope of the current study.

      We discuss these considerations and explain the motivation of our study in the revised Introduction and Discussion.

Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition, demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: โ€œAlthough recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap โ€“ insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.โ€

      Introduction

Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. The National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      Discussion

Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition, demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: โ€œAlthough recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap โ€“ insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.โ€

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748โ€“751.

      Abramovitch, A., Short, T., & Schweiger, A. (2021). The C Factor: Cognitive dysfunction as a transdiagnostic dimension in psychopathology. Clinical Psychology Review, 86, 102007.

      East-Richard, C., R. -Mercier, A., Nadeau, D., & Cellard, C. (2020). Transdiagnostic neurocognitive deficits in psychiatry: A review of meta-analyses. Canadian Psychology / Psychologie Canadienne, 61(3), 190โ€“214.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025.14:RP105537.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Yarkoni T, Westfall J. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspect Psychol Sci. 2017;12(6):1100-1122.

      Comment 2 Title: - Shouldn't it be "MRI markers" (plural)?

      We used the singular form (โ€œmarkerโ€) intentionally, as it refers to the composite neuroimaging marker derived from all three MRI modalities in our stacked model. This multimodal marker represents the combined predictive power of all modalities and captures the highest proportion of the mental health-cognition relationship in our analyses.

      Comment 3: Introduction - I miss an explanation of why it is useful to look at cognition-mental health covariation

      We believe we have sufficiently addressed this comment in our response to Reviewer 2, comment 1 above.

      Comment 4: - "Demonstrating that MRI-based neural indicators of cognition capture the covariation between cognition and mental health will thereby support the utility of such indicators for understanding the etiology of mental health" (page 4, line 56-58) - how/why?

      Previous research has largely focused on developing MRI-based neural indicators that accurately predict cognitive performance (Marek et al., 2022; Vieira et al., 2020). Building on this foundation, our findings further demonstrate that the predictive performance of a neural indicator for cognition is closely tied to its ability to explain the covariation between cognition and mental health. In other words, the robustness of a neural indicator โ€“ its capacity to capture individual differences in cognition โ€“ is strongly associated with how well it reflects the shared variance between cognition and mental health.

      This insight is particularly important within the context of the RDoC framework, which seeks to understand the etiology of mental health through functional domains (such as cognition) and their underlying neurobiological units of analysis (Insel et al., 2010). According to RDoC, for a neural indicator of cognition to be informative for mental health research, it must not only predict cognitive performance but also capture its relationship with mental health.

      Furthermore, RDoC emphasizes the integration of neurobiological measures to investigate the influence of environmental and developmental factors on mental health. In line with this, our neural indicators of cognition may serve as valuable tools in future research aimed at understanding how environmental exposures and developmental trajectories shape mental health outcomes. We discuss this in more detail in the revised Discussion.

Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition, demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: โ€œAlthough recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022;603:654–660.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Comment 5: - The explanation about the stacking approach is not yet completely clear to me. I don't understand how the target variable can be the dependent variable in both step one and two. Or are those different variables? It would be helpful to also give an example of the target variable in line 88 on page 5

      Thank you for this excellent question. In our stacking approach, the same target variable, the g-factor, is indeed used across both modeling stages, but with a key distinction in how predictions are generated and integrated.

In the first-level models, we trained separate Partial Least Squares Regression (PLSR) models for each of the 72 neuroimaging phenotypes, each predicting the g-factor independently. The predicted values from these 72 models were then used as input features for the second-level stacked model, which combined them to generate a final prediction of the g-factor. This two-stage framework enables us to integrate information across multiple imaging modalities while maintaining a consistent prediction target.

      To avoid data leakage, both modeling stages were conducted entirely within the training set for each cross-validation fold. Only after the second-level model was trained was it applied to the outer-fold test participants who were not involved in any part of the model training process.
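As an illustration of this two-level logic, the sketch below mimics the stacking scheme on synthetic data, with ordinary least squares standing in for the PLSR base and stacked models; all data, dimensions, and names here are hypothetical, not the study's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: a latent "g-factor" target and three noisy
# "modalities" (the actual study uses 72 neuroimaging phenotypes and PLSR).
n = 600
g = rng.normal(size=n)
modalities = [g[:, None] + rng.normal(scale=2.0, size=(n, 5)) for _ in range(3)]

train, test = np.arange(0, 400), np.arange(400, 600)

def fit_ols(X, y):
    # Least squares with an intercept, standing in for a first-level PLSR model.
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Level 1: one base model per modality, trained on the training set only;
# each predicts the same target, the g-factor.
base = [fit_ols(X[train], g[train]) for X in modalities]

# The base models' predictions become the FEATURES of the second-level model.
Z_train = np.column_stack([predict_ols(c, X[train]) for c, X in zip(base, modalities)])
Z_test = np.column_stack([predict_ols(c, X[test]) for c, X in zip(base, modalities)])

# Level 2: the stacked model again predicts the g-factor, now from the base
# predictions; held-out test participants never enter any training step.
stack = fit_ols(Z_train, g[train])
g_hat = predict_ols(stack, Z_test)
r = np.corrcoef(g_hat, g[test])[0, 1]
```

The key point the snippet makes concrete is that the target is identical at both levels; only the feature space changes, from raw phenotypes to base-model predictions.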

      To improve accessibility, we have revised the Methods section (see Page 10) to clarify this approach, ensuring that the description remains technically accurate while being easier to follow.

Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
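The leakage-safe standardization step described in the quoted Methods (training-set mean and standard deviation applied unchanged to the test set) can be sketched as follows; the data here are synthetic and the snippet is only a minimal illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 4))  # outer-fold training set
X_test = rng.normal(loc=5.0, scale=2.0, size=(40, 4))    # outer-fold test set

# Scaling parameters are estimated on the training fold only...
mu = X_train.mean(axis=0)
sd = X_train.std(axis=0)

# ...and the SAME parameters are applied to both folds, so no information
# from the test fold leaks into preprocessing.
X_train_z = (X_train - mu) / sd
X_test_z = (X_test - mu) / sd
```

Note that the standardized test fold will not have exactly zero mean or unit variance; that asymmetry is precisely what prevents leakage.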

      Comment 6: Methods - It's not clear from the text and Figure 1 which 12 scores from 11 tests are being used to derive the g-factor. Figure 1 shows only 8 bullet points with 10 scores in A and 13 tests under 'Cognitive tests' in B. Moreover, Supplement S1 describes 12 tests and 14 measures (Prospective Memory test is in the text but not in Supplementary Table 1).

Thank you for identifying this discrepancy. In the original Figure 1b and in the Supplementary Methods (S1), the “Prospective Memory” test was accidentally duplicated, while it was present in the Supplementary Table 1 (Line 53, Supplementary Table 1). We have now corrected both figures for consistency. To clarify: Figure 1a presents the global mental health and cognitive domains studied, while Figure 1b now accurately lists 1) the 12 cognitive scores from 11 tests used to derive the g-factor (with the Trail Making Test contributing two measures – numeric and alphabetic trails) and 2) the three main categories of mental health indices used as machine learning features.

We also corrected the Supplementary Materials to remove the duplicate test from the first paragraph. In Supplementary Table 1, there were 11 tests listed, and for the Trail Making test, we specified in the “Core measures” column that this test had 2 derivative scores: duration to complete the numeric path (Trail 1) and duration to complete the alphabetic path (Trail 2).

Supplementary Materials, Line 46: “We used twelve scores from the eleven cognitive tests that represented the following cognitive domains: reaction time and processing speed (Reaction Time test), working memory (Numeric Memory test), verbal and numerical reasoning (Fluid Intelligence test), executive function (Trail Making Test), non-verbal fluid reasoning (Matrix Pattern Completion test), processing speed (Symbol Digit Substitution test), vocabulary (Picture Vocabulary test), planning abilities (Tower Rearranging test), verbal declarative memory (Paired Associate Learning test), prospective memory (Prospective Memory test), and visual memory (Pairs Matching test) [1].”

Comment 7: - For the mental health measures: If I understand correctly, the questionnaire items were used individually, but also to create composite scores. This seems counterintuitive, because I would assume that if the raw data is used, the composite scores would not add additional information to that. When reading the Supplement, it seems like I'm not correct… It would be helpful to clarify the text on page 7 in the main text.

      You raise an excellent observation regarding the use of both individual questionnaire items and composite scores. This dual approach was methodologically justified by the properties of Partial Least Squares Regression (PLSR), our chosen first-level machine learning algorithm, which benefits from rich feature sets and can handle multicollinearity through dimensionality reduction. PLSR transforms correlated features into latent variables, meaning both individual items and composite scores can contribute unique information to the model. We elaborate on PLSR's mathematical principles in Supplementary Materials (S5).

To directly address this concern, we conducted comparative analyses showing that the PLSR model (a single 80/20% training/test split), incorporating all 133 mental health features (both items and composites), outperformed models using either type alone. The full model achieved superior performance (MSE = 0.458, MAE = 0.537, R<sup>2</sup> = 0.112, Pearson r = 0.336, p-value = 6.936e-112) compared to using only composite scores (93 features; MSE = 0.461, MAE = 0.538, R<sup>2</sup> = 0.107, Pearson r = 0.328, p-value = 5.8e-106) or only questionnaire items (40 features; MSE = 0.499, MAE = 0.561, R<sup>2</sup> = 0.033, Pearson r = 0.184, p-value = 2.53e-33). These results confirm that including both data types provides complementary predictive value. We expand on these considerations in the revised Methods section.

Line 123: “Mental health measures encompassed 133 variables from twelve groups: mental distress, depression, clinical diagnoses related to the nervous system and mental health, mania (including bipolar disorder), neuroticism, anxiety, addictions, alcohol and cannabis use, unusual/psychotic experiences, traumatic events, self-harm behaviours, and happiness and subjective well-being (Fig. 1 and Tables S4 and S5). We included both self-report questionnaire items from all participants and composite diagnostic scores computed following Davis et al. and Dutt et al. [35,36] as features in our first-level (for explanation, see Data analysis section) Partial Least Squares Regression (PLSR) model. This approach leverages PLSR’s ability to handle multicollinearity through dimensionality reduction, enabling simultaneous use of granular symptom-level information and robust composite measures (for mental health scoring details, see Supplementary Materials, S2). We assess the contribution of each mental health index to general cognition by examining the direction and magnitude of its PLSR-derived loadings on the identified latent variables.”

      Comment 8: - Results - The colors in Figure 4 B are a bit hard to differentiate.

      We have updated Figure 4 to enhance colour differentiation by adjusting saturation and brightness levels, improving visual distinction. For further clarity, we split the original figure into two separate figures.

Comment 9: - Discussion - "Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition," - this seems counterintuitive, that some symptoms relate to better cognition and others relate to worse cognition. Could you elaborate on this finding and what it could mean?

      We appreciate you highlighting this important observation. While some associations between mental health indices and cognition may appear counterintuitive at first glance, these patterns are robust (emerging consistently across both univariate correlations and PLSR loadings) and align with previous literature (e.g., Karpinski et al., 2018; Ogueji et al., 2022). For instance, the positive relationship between cognitive ability and certain mental health indicators like help-seeking behaviour has been documented in other population studies (Karpinski et al., 2018; Ogueji et al., 2022), potentially reflecting greater health literacy and access to care among cognitively advantaged individuals. Conversely, the negative associations with conditions like psychotic experiences mirror established neurocognitive deficits in these domains.

      As was initially detailed in Supplementary Materials (S12) and now expanded in our Discussion, these findings likely reflect complex multidimensional interactions. The positive loadings for mental distress indicators may capture: (1) greater help-seeking behaviour among those with higher cognition and socioeconomic resources, and/or (2) psychological overexcitability and rumination tendencies in high-functioning individuals. These interpretations are particularly relevant to the UK Biobank's assessment methods, where mental distress items focused on medical help-seeking rather than symptom severity per se (e.g., as a measure of mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress).

Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of non-suicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79,80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionary novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

Karpinski RI, Kinase Kolb AM, Tetreault NA, Borowski TB. High intelligence: A risk factor for psychological and physiological overexcitabilities. Intelligence. 2018;66:8–23.

Ogueji IA, Okoloba MM. Seeking Professional Help for Mental Illness: A Mixed-Methods Study of Black Family Members in the UK and Nigeria. Psychol Stud. 2022;67:164–177.

Comment 10: - All neuroimaging factors together explain 48% of the variance in the cognition-mental health relationship. However, this relationship is only r=0.3 - so then the effect of neuroimaging factors seems a lot smaller… What does it mean?

Thank you for raising this critical point. We have addressed this point in our responses to Reviewer 1, comments 2 and 3, and to Reviewer 2, comment 1.

Briefly, cognition is related to mental health at around r = 0.3 and to neuroimaging phenotypes at around r = 0.4. These levels of relationship strength are consistent with what has been shown in the literature (e.g., Wang et al., 2025 and Vieira et al., 2020). We discussed the relationship between cognition and mental health in our response to Reviewer 2, comment 1 above. In short, this relationship reflects just one functional domain – mental health may also be associated with other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Moreover, in the context of gerontology research, this effect size is considered relatively large (Brydges et al., 2019).

We conducted a commonality analysis to investigate the unique and shared variance of mental health and neuroimaging phenotypes in explaining cognition. As we discussed in our response to Reviewer 1, comment 2, we were able to account for 48% of the covariation between cognition and mental health using the MRI modalities available in the UK Biobank. The remaining 52% of unexplained variance may arise from several sources.
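For readers unfamiliar with commonality analysis, the partition of explained variance into unique and shared components for two predictor sets can be sketched as follows; the data and variable names are synthetic stand-ins for illustration, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic example: an outcome y shares variance with two predictor sets
# A and B (think "mental health" and "brain"; names are illustrative only).
n = 2000
shared = rng.normal(size=n)
A = shared[:, None] + rng.normal(size=(n, 3))
B = shared[:, None] + rng.normal(size=(n, 3))
y = shared + rng.normal(size=n)

def r2(X, y):
    # R-squared of an ordinary least-squares fit with intercept.
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ coef
    return 1.0 - resid.var() / y.var()

r2_a, r2_b, r2_ab = r2(A, y), r2(B, y), r2(np.column_stack([A, B]), y)

unique_a = r2_ab - r2_b        # variance in y explained only by A
unique_b = r2_ab - r2_a        # variance in y explained only by B
common = r2_a + r2_b - r2_ab   # variance in y shared by A and B
```

The "48% of the covariation" figure corresponds, in this decomposition, to the share of the mental-health-explained variance that the brain-based predictors also capture.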

      One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank (Tetereva et al., 2025).

Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      We have now incorporated these considerations into the Discussion section.

Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition, demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Tetereva A, Knodt AR, Melzer TR, et al. Improving Predictability, Reliability and Generalisability of Brain-Wide Associations for Cognitive Abilities via Multimodal Stacking. Preprint. bioRxiv. 2025;2024.05.03.589404.

      Reviewer 3:

Buianova et al. present a comprehensive analysis examining the predictive value of multimodal neuroimaging data for general cognitive ability, operationalized as a derived g-factor. The study demonstrates that functional MRI holds the strongest predictive power among the modalities, while integrating multiple MRI modalities through stacking further enhances prediction performance. The inclusion of a commonality analysis provides valuable insight into the extent to which shared and unique variance across mental health features and neuroimaging modalities contributes to the observed associations with cognition. The results are clearly presented and supported by high-quality visualizations. Limitations of the sample are stated clearly.

      Thank you once more for your constructive and encouraging feedback. We appreciate your careful reading and valuable methodological insights. Your expertise has helped us clarify key methodological concepts and improve the overall rigour of our study.

      Suggestions for improvement:

      (1) The manuscript would benefit from the inclusion of permutation testing to evaluate the statistical significance of the predictive models. This is particularly important given that some of the reported performance metrics are relatively modest, and permutation testing could help ensure that results are not driven by chance.

      Thank you, this is an excellent point. We agree that evaluating the statistical significance of our predictive models is essential.

In our original analysis, we assessed model performance by generating a bootstrap distribution of Pearson’s r, resampling the data with replacement 5,000 times (see Figure 3b). In response to your feedback, we have made the following updates:

(1) Improved Figure 3b to explicitly display the 95% confidence intervals.

(2) Supplemented the results by reporting the exact confidence interval values.

(3) Clarified our significance testing procedure in the Methods section.

      We considered model performance statistically significant when the 95% confidence interval did not include zero, indicating that the observed associations are unlikely to have occurred by chance.

      We chose bootstrapping over permutation testing because, while both can assess statistical significance, bootstrapping additionally provides uncertainty estimates in the form of confidence intervals. Given the large sample size in our study, significance testing can be less informative, as even small effects may reach statistical significance. Bootstrapping offers a more nuanced understanding of model uncertainty.
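As a minimal illustration of this bootstrapping procedure (synthetic observed and predicted values, with 5,000 resamples as in the manuscript; the numbers below are not the study's results):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy observed and predicted g-factor values (synthetic; in the study these
# are aggregated across the five outer-fold test sets).
n = 500
observed = rng.normal(size=n)
predicted = 0.3 * observed + rng.normal(size=n)

boot = np.empty(5000)
for b in range(5000):
    idx = rng.integers(0, n, size=n)      # resample (observed, predicted) pairs
    boot[b] = np.corrcoef(observed[idx], predicted[idx])[0, 1]

ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
significant = not (ci_low <= 0.0 <= ci_high)  # CI excluding zero
```

Unlike a permutation test, which yields only a p-value against a null of no association, the bootstrap distribution directly quantifies the uncertainty of the point estimate.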

Line 233: “To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”

      (2) Applying and testing the trained models on an external validation set would increase confidence in generalisability of the model.

      We appreciate this excellent suggestion. While we considered this approach, implementing it would require identifying an appropriate external dataset with comparable neuroimaging and behavioural measures, along with careful matching of acquisition protocols and variable definitions across sites. These challenges extend beyond the scope of the current study, though we fully agree that this represents an important direction for future research.

      Our findings, obtained from one of the largest neuroimaging datasets to date, with training and test samples exceeding those of most previous studies, align closely with the existing literature: the predictive accuracy of each neuroimaging phenotype and modality for cognition matches the effect size reported in meta-analyses (r ≈ 0.4; e.g., Vieira et al., 2020). The ability of dwMRI, rsMRI and sMRI to capture the cognition-mental health relationship is, in turn, consistent with our previous work in pediatric populations (Wang et al., 2025; Pat et al., 2022).

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Pat N, Wang Y, Anney R, Riglin L, Thapar A, Stringaris A. Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors. Hum Brain Mapp. 2022;43:5520โ€“5542.

      (3) The rationale for selecting a 5-by-10-fold cross-validation scheme is not clearly explained. Clarifying why this structure was preferred over more commonly used alternatives, such as 10-by-10 or 5-by-5 cross-validation, would strengthen the methodological transparency.

      Thank you for this important methodological question. Our choice of a 5-by-10-fold cross-validation scheme was motivated by the need to balance robust hyperparameter tuning with computational efficiency, particularly memory and processing time. Retaining five outer folds allowed us to rigorously assess model performance across multiple data partitions, yielding an outer-fold test set of at least n = 4 000 while keeping a substantial amount of neuroimaging data in model training. Employing ten inner folds, in turn, ensured robust and stable hyperparameter tuning that maximizes the reliability of model selection. Thus, five outer folds combined with our large sample provided a sufficiently large out-of-sample test set for reliable model evaluation at reasonable computational cost, while ten inner folds enabled robust hyperparameter tuning. We now provide additional rationale for this design decision on Page 10.

      Line 188: "We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation, this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point was used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection."

      (4) A more detailed discussion of which specific brain regions or features within each neuroimaging modality contributed most strongly to the prediction of cognition would enhance neurobiological relevance of the findings.

      Thank you for this thoughtful suggestion. To address this point, we have included feature importance plots for the top-performing neuroimaging phenotypes within each modality (Figure 5 and Figures S2โ€“S4), demonstrating the relative contributions of individual features to the predictive models. While we maintain our primary focus on cross-modality performance comparisons in the main text, as this aligns with our central aim of evaluating multimodal MRI markers at the integrated level, we outline the contribution of neuroimaging features with the highest predictive performance for cognition in the revised Results and Discussion.

      Methods

      Line 255: "To determine which neuroimaging features contribute most to the predictive performance of top-performing phenotypes within each modality, while accounting for the potential latent components derived from neuroimaging, we assessed feature importance using the Haufe transformation [62]. Specifically, we calculated Pearson correlations between the predicted g-factor and scaled and centred neuroimaging features across five outer-fold test sets. We also examined whether the performance of neuroimaging phenotypes in predicting cognition per se is related to their ability to explain the link between cognition and mental health. Here, we computed the correlation between the predictive performance of each neuroimaging phenotype and the proportion of the cognition-mental health relationship it captures. To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age<sup>2</sup>, age×sex, and age<sup>2</sup>×sex as an additional set of explanatory variables (Fig. 1)."
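The correlation-based feature-importance step described in this passage can be sketched as follows. This is a schematic illustration with synthetic data, not the authors' code; it relies on the fact that, for scaled and centred features, the per-feature Pearson correlation with the model's predictions is proportional to a Haufe-style activation pattern (the covariance between features and predicted outcome).

```python
import numpy as np

def feature_importance(X, y_pred):
    """Correlate each scaled-and-centred feature with the model's predictions."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)      # scale and centre features
    yz = (y_pred - y_pred.mean()) / y_pred.std()   # standardise predictions
    return Xz.T @ yz / len(y_pred)                 # per-feature Pearson r

# Toy example: predictions driven almost entirely by the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y_pred = X[:, 0] + 0.1 * rng.normal(size=300)
imp = feature_importance(X, y_pred)  # imp[0] should dominate
```

A feature map like `imp` is what would be plotted per phenotype in figures such as Figure 5 and Figures S2-S4.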

      Results

      dwMRI

      Line 331: "Overall, models based on structural connectivity metrics performed better than TBSS and probabilistic tractography (Fig. 3). TBSS, in turn, performed better than probabilistic tractography (Fig. 3 and Table S13). The number of streamlines connecting brain areas parcellated with aparc MSA-I had the best predictive performance among all dwMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.052, r<sub>mean</sub> = 0.227, 95% CI [0.212, 0.235]). To identify features driving predictions, we correlated streamline counts in the aparc MSA-I parcellation with the predicted g-factor values from the PLSR model. Positive associations with the predicted g-factor were strongest for left superior parietal-left caudal anterior cingulate, left caudate-right amygdala, and left putamen-left hippocampus connections. The most marked negative correlations involved left putamen-right posterior thalamus and right pars opercularis-right caudal anterior cingulate pathways (Fig. 5 and Supplementary Fig. S2)."

      rsMRI

      Line 353: "Among RSFC metrics for 55 and 21 ICs, tangent parameterization matrices yielded the highest performance in the training set compared to full and partial correlation, as indicated by the cross-validation score. Functional connections between the limbic (IC10) and dorsal attention (IC18) networks, as well as between the ventral attention (IC15) and default mode (IC11) networks, displayed the highest positive association with cognition. In contrast, functional connectivity between the limbic (IC43, the highest activation within the network) and default mode (IC11) networks, between the limbic (IC45) and frontoparietal (IC40) networks, between the dorsal attention (IC18) and frontoparietal (IC25) networks, and between the ventral attention (IC15) and frontoparietal (IC40) networks showed the highest negative association with cognition (Fig. 5 and Supplementary Figs. S3 and S4)."

      sMRI

      Line 373: "FreeSurfer subcortical volumetric subsegmentation and ASEG had the highest performance among all sMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.068, r<sub>mean</sub> = 0.244, 95% CI [0.237, 0.259] and R<sup>2</sup><sub>mean</sub> = 0.059, r<sub>mean</sub> = 0.235, 95% CI [0.221, 0.243], respectively). In FreeSurfer subcortical volumetric subsegmentation, volumes of all subcortical structures, except for left and right hippocampal fissures, showed positive associations with cognition. The strongest relations were observed for the volumes of bilateral whole hippocampal head and whole hippocampus (Fig. 5 and Supplementary Fig. S5 for feature importance maps). Grey matter morphological characteristics from ex vivo Brodmann Area Maps showed the lowest predictive performance (R<sup>2</sup><sub>mean</sub> = 0.008, r<sub>mean</sub> = 0.089, 95% CI [0.075, 0.098]; Fig. 3 and Table S15)."

      Discussion

      dwMRI

      Line 562: "Among dwMRI-derived neuroimaging phenotypes, models based on structural connectivity between brain areas parcellated with aparc MSA-I (streamline count), particularly connections with bilateral caudal anterior cingulate (left superior parietal-left caudal anterior cingulate, right pars opercularis-right caudal anterior cingulate), left putamen (left putamen-left hippocampus, left putamen-right posterior thalamus), and amygdala (left caudate-right amygdala), result in a neural indicator that best reflects microstructural resources associated with cognition, as indicated by predictive modeling, and more importantly, shares the highest proportion of the variance with mental health-g, as indicated by commonality analysis."

      rsMRI

      Line 583: "We extend findings on the superior performance of rsMRI in predicting cognition, which aligns with the literature [15, 28], by showing that it also explains almost a third of the variance in cognition that mental health captures. At the rsMRI neuroimaging phenotype level, this performance is mostly driven by RSFC patterns among 55 ICA-derived networks quantified using tangent space parameterization. At a feature level, these associations are best captured by the strength of functional connections among limbic, dorsal attention and ventral attention, frontoparietal and default mode networks. These functional networks have been consistently linked to cognitive processes in prior research [127-130]."

      sMRI

      Line 608: "Integrating information about brain anatomy by stacking sMRI neuroimaging phenotypes allowed us to explain a third of the link between cognition and mental health. Among all sMRI neuroimaging phenotypes, those that quantified the morphology of subcortical structures, particularly volumes of bilateral hippocampus and hippocampal head, explain the highest portion of the variance in cognition captured by mental health. Our findings show that, at least in older adults, volumetric properties of subcortical structures are not only more predictive of individual variations in cognition but also explain a greater portion of cognitive variance shared with mental health than structural characteristics of more distributed cortical grey and white matter. This aligns with the Scaffolding Theory that proposes stronger compensatory engagement of subcortical structures in cognitive processing in older adults [138-140]."

      (5) The formatting of some figure legends could be improved for clarity - for example, some subheadings were not formatted in bold (e.g., Figure 2c).

      Thank you for noticing this. We have updated the figures to enhance clarity, keeping subheadings plain while bolding figure numbers and MRI modality names.

    1. In my beginner's guide to Roam, I completely left out the Daily Notes section to keep things simple. Let's now have a look together. This is what a daily note with interstitial journaling looks like. [Screenshot: a daily note with interstitial journaling in Roam] Track time. Type /time to insert the current time, then type whatever you are thinking about. Track tasks. Type /todo to create to-do items. Check off these to-do items when done. Track content. When you stumble upon something interesting that would disturb your workflow, add it to master lists such as [[To read]]. You can see I have done it in this screenshot with an interesting-looking article that had nothing to do with the essay I was trying to write. Track ideas. Similarly, if you think of something else you'd like to do today, just add it as a to-do where and when you think about it. For people using the [[Today]], [[Tomorrow]], [[Someday]] system, you can also add that to the to-do items, or add a specific date, as I have done with "call Morgane." Track well-being. I like to start my work day with a quick note checking in on how I feel, anything that's been sometimes literally keeping me up at night, any major roadblock I'm anticipating for the day. It's rarely longer than one bullet point, but it's a great way to take care of my general well-being. I also finish the work day with a similar quick closing note.

      I find this part interesting because it shows how interstitial journaling turns even the simplest things, such as the time, a task, or an idea, into something organized. Without realizing it, you end up with a complete record of your day without having to do much, and that practicality is what most of us are looking for.

    1. white-upturned

      "Unto the white-upturned wondering eyes / Of mortals that fall back to gaze on him." The dictionary defines "upturned" as an adjective meaning "turned upwards." Its etymology formed within English by compounding "up" and "turned," with its earliest recorded use attributed to Shakespeare himself in 1597. It's about people looking up in awe, showing the whites of their eyes. This made Romeo's speech feel super dreamy, like Juliet's an angel. It helped me understand why the balcony scene is so special: it's not just love talk but almost magical for Romeo.

    1. "The beer's nice and cool," the man said. "It's lovely," the girl said. "It's really an awfully simple operation, Jig," the man said. "It's not really an operation at all." The girl looked at the ground the table legs rested on. "I know you wouldn't mind it, Jig. It's really not anything. It's just to let the air in." The girl did not say anything. "I'll go with you and I'll stay with you all the time. They just let the air in and then it's all perfectly natural." "Then what will we do afterward?" "We'll be fine afterward. Just like we were before." "What makes you think so?" "That's the only thing that bothers us. It's the only thing that's made us unhappy."
      1. This is when I believe the conversation shifted from calm small talk to a deeper level.
    1. Another organization attempted to validate its hypothesized cause-and-effect relationships in the balanced scorecard by measuring the strength of the linkages among measures in the different perspectives.

      This example shows how a company used the Balanced Scorecard to test whether its strategy really worked as predicted. By tracking data across perspectives (employee morale, customer satisfaction, and operational efficiency), it found clear chains of impact: happier employees led to happier customers, which sped up payments and boosted returns. Such evidence strengthens confidence in the strategy. The key insight is that the scorecard isn't just about measuring results; it's about proving or questioning the logic behind them. If those links fail to appear over time, leaders know it's time to challenge their assumptions and possibly rethink the strategy itself, making learning and adaptation part of everyday management.

    2. The problem is that most organizations have separate procedures and organizational units for strategic planning and for resource allocation and budgeting. To formulate their strategic plans, senior executives go off-site annually and engage for several days in active discussions facilitated by senior planning and development managers or external consultants. The outcome of this exercise is a strategic plan articulating where the company expects (or hopes or prays) to be in three, five, and ten years. Typically, such plans then sit on executives' bookshelves for the next 12 months.

      This part shows how many companies create long, impressive strategic plans that end up collecting dust, while budgets and resources are set in a completely separate process. The Balanced Scorecard closes that gap by linking strategy to budgets and daily operations, so money, time, and effort go to initiatives that truly drive long-term goals. What stands out is the realism of the critique: it's common for plans to sound great at annual retreats but never shape day-to-day decisions. By forcing strategy and budgeting into one process, the scorecard makes strategy real, ensuring that ambitious objectives don't just stay on paper but guide actual spending, staffing, and priorities over the year.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the ADGF gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the ADGF mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the ADGF mutant such as increased mound size, altered cAMP signalling, and abnormal cell type differentiation. It appears that the ADGF mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signalling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      Weaknesses

      (1) The key weakness is understanding why the cells bother to use a diffusible gas like ammonia as a signal to form a tip and continue development.

      Ammonia can come from a variety of sources both within and outside the cells, including dead cells. Ammonia, by increasing cAMP levels, triggers collective cell movement, thereby establishing a tip in Dictyostelium. A gaseous signal can act over long distances in a short time; for instance, ammonia promotes synchronous development in a colony of yeast cells (Palkova et al., 1997; Palkova and Forstova, 2000). The slug tip is known to release ammonia, probably favouring synchronized development of the entire colony of Dictyostelium. However, after the tips are established, ammonia exerts negative chemotaxis, probably helping the slugs to move away from each other, ensuring equal spacing of the fruiting bodies (Feit and Sollitto, 1987).

      It is well known that ammonia serves as a signalling molecule influencing both multicellular organization and differentiation in Dictyostelium (Francis, 1964; Bonner et al., 1989; Bradbury and Gross, 1989). By raising the pH of the intracellular acidic vesicles of prestalk cells (Poole and Ohkuma, 1981; Gross et al., 1983) and of the cytoplasm, ammonia is known to increase the speed of chemotaxing amoebae (Siegert and Weijer, 1989; Van Duijn and Inouye, 1991), inducing collective cell movement (Bonner et al., 1988, 1989) and favoring tipped mound development.

      Ammonia produced in millimolar concentrations during tip formation (Schindler and Sussman, 1977) could ward off other predators in soil. For instance, ammonia released by Streptomyces symbionts of leaf-cutting ants is known to inhibit fungal pathogens (Dhodary and Spiteller, 2021). Additionally, ammonia may be recycled back into amino acids, as observed during breast cancer proliferation (Spinelli et al., 2017). Such a process may also occur in starving Dictyostelium cells, supporting survival and differentiation. These findings suggest that ammonia acts as both a local and long-range regulatory signal, integrating environmental and cellular cues to coordinate multicellular development.

      (2) The rescue of the mutant by adding ammonia gas to the entire culture indicates that ammonia conveys no positional information within the mound.

      Ammonia reinforces or maintains the positional information by elevating cAMP levels, favoring prespore differentiation (Bradbury and Gross, 1989; Riley and Barclay, 1990; Hopper et al., 1993). Ammonia is known to influence rapid patterning of Dictyostelium cells confined in a restricted environment (Sawai et al., 2002). In adgf mutants that have low ammonia levels, both neutral red staining (a marker for prestalk and ALCs) (Figure. S3) and the prestalk marker ecmA/ecmB expression (Figure. 7D) are higher than in the WT, and the mound arrest phenotype can be reversed by exposing the adgf mutant mounds to ammonia.

      Prestalk cells are enriched in acidic vesicles, and ammonia, by raising the pH of these vesicles and the cytoplasm (Davies et al 1993; Van Duijn and Inouye 1991), plays an active role in collective cell movement during tip formation (Bonner et al., 1989).

      (3) By the time the cells have formed a mound, the cells have been starving for several hours, and desperately need to form a fruiting body to disperse some of themselves as spores, and thus need to form a tip no matter what.

      Exposure of adgf mounds to ammonia led to tip development within 4 h (Figure. 5). In contrast, adgf controls remained at the mound stage for at least 30 h. This demonstrates that starvation alone is not the trigger for tip development and that ammonia promotes the transition from mound to tipped mound formation.

      Many mound arrest mutants are blocked in development and do not proceed to form fruiting bodies (Carrin et al., 1994). Further, not all the mound arrest mutants tested in this study were rescued by ADA enzyme (Figure. S4A), and they continue to stay as mounds.

      (4) One can envision that the local ammonia concentration is possibly informing the mound that some minimal number of cells are present (assuming that the ammonia concentration is proportional to the number of cells), but probably even a minuscule fruiting body would be preferable to the cells compared to a mound. This latter idea could be easily explored by examining the fate of the ADGF cells in the mound - do they all form spores? Do some form spores?

      Or perhaps the ADGF is secreted by only one cell type, and the resulting ammonia tells the mound that for some reason that cell type is not present in the mound, allowing some of the cells to transdifferentiate into the needed cell type. Thus, elucidating if all or some cells produce ADGF would greatly strengthen this puzzling story.

      A fraction of adgf mounds form bulkier spore heads by the end of 36 h as shown in Figure. 2H. This late recovery may be due to the expression of other ADA isoforms. Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) and suggests that WT ADGF favours prespore differentiation. However, it is not clear if ADGF is secreted by a particular cell type, as adenosine can be produced by both cell types, and the activity of three other intracellular ADAs may vary between the cell types. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      Reviewer #1 (Recommendations for the authors):

      (1) Lines: 47,48 - "The gradient of these morphogens along the slug axis determines the cell fate, either as prestalk (pst) or as prespore (psp) cells." - many workers have shown that this is not true - intrinsic factors such as cell cycle phase drive cell fate.

      Thank you for pointing this out. We have removed the line and rephrased it as "Based on cell cycle phases, there exists a dichotomy of cell types that biases cell fate as prestalk or prespore (Weeks and Weijer, 1994; Jang and Gomer, 2011)."

      (2) Line 48 - PKA - please explain acronyms at first use.

      Corrected

      (3) Line 56 - The relationship between adenosine deaminase and ADGF is a bit unclear, please clarify this more.

      Adenosine deaminase (ADA) is intracellular, whereas adenosine deaminase-related growth factor (ADGF) is an extracellular ADA and has growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008).

      (4) Figure 1 - where are these primers, and the bsr cassette, located with respect to the coding region start and stop sites?

      The primer sequences are mentioned in the supplementary table S2. The figure legend is updated to provide a detailed description.

      (5) Line 104 - 37.47% may be too many significant figures.

      Corrected

      (6) Line 123 - 1.003 Å may be too many significant figures.

      Corrected

      (7) Line 128 - Since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected

      (8) Figure 3G - did the DCF also increase mound size? It sort of looks like it did.

      Yes, the addition of DCF increases the mound size (now Figure. 2G).

      (9) Figure 3I - the spore mass shown here for ADGF - looks like there are 3 stalks protruding from it; this can happen if a plate is handled roughly and the spore masses bang into each other and then merge.

      Thank you for pointing this out. Figure 3I (now Figure. 2I) has been replaced.

      (10) Lines 160-162 - since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected.

      (11) Line 165 - ' ... that are involved in adenosine formation' needs a reference.

      Reference is included.

      (12) Line 205 - 'Addition of ADA to the CM of the mutant in one compartment.' - might clarify that the mutant is the ADGF mutant

      Yes, revised to 'Addition of ADA to the CM of the adgf mutant in one compartment.'

      (13) Lines 222-223 need a reference for caffeine acting as an adenosine antagonist.

      Reference is included.

      (14) Figure 8B - left - use a 0-4 or so scale so the bars are more visible.

      Thank you for the suggestion. The scale of the y-axis is adjusted to 0-4 in Figure. 7B to enhance the visibility of the bars.

      Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (ADGF), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The ADGF null mutant has a pre-tip mound arrest phenotype, which can be rescued by the external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signalling, possibly involving the histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an ADGF mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterization of significant changes in cAMP signalling components, suggesting low cAMP signalling in the mutant, and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell type differentiation towards prestalk fate.

      (1) Weaknesses: Lack of details on the developmental time course of ADGF activity and cell type-specific differences in ADGF expression.

      adgf expression was examined at 0, 8, 12, and 16 h (Figure. 1), and the total ADA activity was assayed at 12 and 16 h (Figure. 3). Previously, the 12 h data was not included, and it has now been added (Figure. 3A). The adgf expression was found to be highest at 16 h and hence, the ADA assay was carried out at that time point. Since the ADA assay also reports the activity of the three other isoforms, it will not exclusively reflect ADGF activity.

      Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) suggesting that WT adgf favours prespore differentiation. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (2) The absence of measurements to show that ammonia addition to the null mutant can rescue the proposed defects in cAMP signalling.

      The adgf mutant in comparison to WT has diminished acaA expression (Fig. 6B) and reduced cAMP levels (Fig. 6A) both at 12 and 16 h of development. The cAMP levels were measured at 8 h and 12 h in the mutant.

      We would like to add that ammonia is known to increase cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) in Dictyostelium. Exposure to ammonia increases acaA expression in WT (Figure. 7B) and is likely to increase acaA expression/ cAMP levels in the mutant also (Riley and Barclay, 1990; Feit et al., 2001) thereby rescuing the defects in cAMP signalling. Based on the comments, cAMP levels will also be measured in the mutant after the rescue with ammonia.

      (3) No direct measurements in the dhkD mutant to show that it acts upstream of adgf in the control of changes in cAMP signalling and tip formation.

      cAMP levels will be quantified in the dhkD mutant after treatment with ammonia. The histidine kinases dhkD and dhkC are reported to modulate phosphodiesterase RegA activity, thereby maintaining cAMP levels (Singleton et al., 1998; Singleton and Xiong, 2013). By activating RegA, dhkD ensures proper cAMP distribution within the mound, which is essential for the patterning of prestalk and prespore cells, as well as for tip formation (Singleton and Xiong, 2013). Therefore, ammonia exposure to dhkD mutants is likely to regulate cAMP signalling and thereby tip formation.

      Reviewer #2 (Recommendations for the authors):

      The paper describes new insights into the role of ADGF, an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation in Dictyostelium development.

      A knockout of the gene results in a tipless mound stage arrest and the mounds formed are somewhat larger in size. Synergy experiments show that the effect of the mutation is non-cell autonomous and further experiments show that the mound arrest phenotype can be rescued by the provision of ammonia vapour. These observations are well documented. Furthermore, the paper contains a wide variety of experiments attempting to place the observed effects in known signalling pathways. It is suggested that ADGF may function downstream of DhkD, a histidine kinase previously implicated in ammonia signalling. Ammonia has long been described to affect different aspects, including differentiation of slug and culmination stages of Dictyostelium development, possibly through modulating cAMP signalling, but the exact mechanisms of action have not yet been resolved. The experiments reported here to resolve the mechanistic basis of the mutant phenotype need focusing and further work.

      (1) The paper needs streamlining and editing to concentrate on the main findings and implications.

      The manuscript will be revised extensively.

      Below is a list of some more specific comments and suggestions.

      (2) Introduction: Focus on what is relevant to understanding tip formation and the role of nucleotide metabolism and ammonia (see https://doi.org/10.1016/j.gde.2016.05.014). This could lead to the rationale for investigating ADGF.

      The manuscript will be revised extensively.

      (3) Lines 36-38 are not relevant. Lines 55-63 need shortening and to focus on ADGF, cellular localization, and substrate specificity.

      The manuscript will be revised accordingly. Lines 36-38 will be removed, and lines 55-63 will be shortened.

      In humans, two isoforms of ADA are known, ADA1 and ADA2, and the Dictyostelium homolog of ADA2 is adenosine deaminase-related growth factor (ADGF). Unlike ADA, which is intracellular, ADGF is extracellular and also has growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008). Loss-of-function mutations in ada2 are linked to lymphopenia, severe combined immunodeficiency (SCID) (Gaspar, 2010), and vascular inflammation due to accumulation of toxic metabolites like dATP (Notarangelo, 2016; Zhou et al., 2014).

      (4) Results: This section would benefit from better streamlining by a separation of results that provide more mechanistic insight from more peripheral observations.

      The manuscript will be revised, and the peripheral observations (Figure 2) will be moved to the supplementary information.

      (5) Line 84 needs to start with a description of the goal, to produce a knockout.

      Details on the knockout will be elaborated in the revised manuscript at line 84 (now 75). Dictyostelium cell lines carrying mutations in the gene adgf were obtained from the genome-wide Dictyostelium insertion (GWDI) bank and were subjected to further analysis to determine the role of adgf during Dictyostelium development.

      (6) Knockout data (Figure 1) can be simplified and combined with a description of the expression profile and phenotype Figure 3 F, G, and Figure 5. Higher magnification and better resolution photographs of the mutants would be desirable.

      Thank you. As suggested, the data will be simplified (section E will be removed) and combined with a description of the expression profile, and the phenotype images of Figure 3F, G and Figure 5 (now Figure 2F, G and Figure 4) will be replaced with higher-resolution images.

      (7) It would also be relevant to know which cells actually express ADGF during development, using in-situ hybridisation or promoter-reporter constructs.

      To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence-activated cell sorting (FACS), and thereafter adgf expression will be examined in each population.

      (8) Figure 2 - Information is less directly relevant to the topic of the paper and can be omitted (or possibly in Supplementary Materials).

      Figure 2 will be moved to the supplementary materials.

      (9) Figures 4A, B - It is shown that, as could be expected, ada activity is somewhat reduced and adenosine levels are slightly elevated. However, the fact that ada levels are low at 16 hrs could just imply that differentiation of the ADGF- cells is blocked/delayed at an earlier time point. To interpret these data, it would be necessary to see an ada activity and adenosine time course comparison of wt and mutant, or to see that expression is regulated in a cell type-specific manner that could explain this (see above). It would be good to combine this with the observation that ammonia levels are lower in the ADGF- mutant than wildtype and that the mutant phenotype, mound arrest, can be rescued by an external supply of ammonia (Figure 6).

      In Dictyostelium, four isoforms of ADA including ADGF are present, and thus a time course of total ADA activity would also report the function of the other isoforms. Further, a number of pathways generate adenosine (Dunwiddie et al., 1997; Boison and Yegutkin, 2019). ADGF expression was examined at 0, 8, 12 and 16 h (Fig 1), and ADA activity was assayed at 12 h, the time point where expression gradually increases before reaching a peak at 16 h. We had not previously shown the 12 h activity data, which will be included in the revised version. ADGF expression was found to be highly elevated at 16 h, and adenosine/ammonia levels were measured at the two indicated time points in the mutant.

      (10) Panel 4C could be combined with other measurements trying to arrive at more insight in the mechanisms by which ammonia controls tip formation.

      Panel 4C (now 3C) illustrates the genes involved in the conversion of cAMP to adenosine. Since Figure 3 focuses on adenosine levels and ADA activity in both WT and adgf mutants, we have retained Panel 3C in Figure 3 for its relevance to the experiment.

      (11) There is a large variety of experiments attempting to link the mutant phenotype and its rescue by ammonia to cAMP signalling; however, the data do not yet provide a clear answer.

      It is well known that ammonia increases cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) and adenylate cyclase activity (Cotter et al., 1999) in D. discoideum, and exposure to ammonia increases acaA expression (Fig 7B), suggesting that ammonia regulates cAMP signaling. To address the concerns, cAMP levels will be quantified in the mutant after ammonia treatment.

      (12) The mutant is shown to have lower cAMP levels at the mound stage, which ties in with low levels of acaA expression (Figures 7A and B); various phosphodiesterases, the extracellular phosphodiesterase pdsA and the intracellular phosphodiesterase regA, also show increased expression. Suggesting a functional role for cAMP signalling is that the addition of di-cGMP, a known activator of acaA, can also rescue the mound phenotype (Figure 7E). There appears to be a partial rescue of the mound arrest phenotype by the addition of 8Br-cAMP (Fig 7C), suggesting that intracellular cAMP levels rather than extracellular cAMP signalling can rescue some of the defects in the ADGF- mutant. Better images and a time course would be helpful.

      The relevant images will be replaced, and a developmental time course after 8-Br-cAMP treatment will be included in the revised manuscript (Figure 6D).

      (13) There is also the somewhat surprising observation that low levels of caffeine, an inhibitor of acaA activation, also rescue the phenotype (Figure 7F).

      With respect to caffeine action on cAMP levels, the reports are contradictory. Caffeine has been reported to increase adenylate cyclase expression, thereby increasing cAMP levels (Hagmann, 1986), whereas Alvarez-Curto et al. (2007) found that caffeine reduced intracellular cAMP levels in Dictyostelium. Caffeine, although a known inhibitor of ACA, is also known to inhibit PDEs (Nehlig et al., 1992; Rosenfeld et al., 2014). Therefore, if caffeine differentially affects ACA and PDE activity, it may potentially counterbalance the effects and rescue the phenotype.
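The counterbalancing argument can be illustrated with a toy steady-state model of cAMP turnover; the linear degradation term and all rate constants below are illustrative assumptions, not measured values from this or any cited study.

```python
# Toy model: d[cAMP]/dt = k_aca - k_pde * [cAMP], giving steady state
# k_aca / k_pde. Rate constants are arbitrary illustrative numbers.
def camp_steady_state(k_aca, k_pde):
    """Steady-state cAMP for synthesis rate k_aca and degradation rate k_pde."""
    return k_aca / k_pde

baseline = camp_steady_state(k_aca=10.0, k_pde=2.0)
# Caffeine inhibiting both ACA (synthesis) and PDE (degradation) by the same
# fraction leaves the steady state unchanged: the two effects counterbalance.
caffeine = camp_steady_state(k_aca=10.0 * 0.5, k_pde=2.0 * 0.5)
```

In this sketch, equal fractional inhibition of synthesis and degradation cancels exactly; unequal inhibition would shift steady-state cAMP up or down, which is the sense in which differential effects could rescue or worsen the phenotype.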

      (14) The data attempting to assess cAMP wave propagation in mounds (Fig 7H) are of low quality and inconclusive in the absence of further analysis. It remains unresolved how this links to the rescue of the ADGF- phenotype by ammonia. There are no experiments that measure any of the effects in the mutant stimulated with ammonia or di-cGMP.

      The relevant images will be replaced (now Figure 6H). Ammonia, by increasing acaA expression (Figure 7B) and cAMP levels (Figure 7C), may restore spiral wave propagation, thereby rescuing the mutant.

      (15) A possible way forward could also come from the observation that ammonia can rescue the wobbling mound arrest phenotype of the histidine kinase dhkD null mutant, which has regA as its direct target, linking ammonia and cAMP signalling. This is in line with other work that had suggested that another histidine kinase, dhkC, transduces an ammonia signal to regA activation. A dhkC null mutant was reported to have a rapid development phenotype and skip slug migration (Dev. Biol. (1998) 203, 345). There is no direct evidence to show that dhkD acts upstream of ADGF and changes in cAMP signalling, for instance, measurements of changes in ADA activity in the mutant.

      cAMP levels will be quantified in the dhkD mutant after ammonia treatment and accordingly, the results will be revised.

      (16) The paper makes several further observations on the mutant. After 16 hrs of development the adgf- mutant shows increased expression of the prestalk cell markers ecmA and ecmB and reduced expression of the prespore marker pspA. In synergy experiments with a majority of wildtype, these cells will sort to the tip of the forming slug, showing that the differentiation defect is cell autonomous (Fig 9). This is interesting but needs further work to obtain more mechanistic insight into why a mutant with a strong tip/stalk differentiation tendency fails to make a tip. Here again, knowing which cells express ADGF would be helpful.

      The adgf mutant shows increased prestalk marker expression in the mound but does not form a tip. It is well known that several mound arrest mutants form differentiated cells but are blocked in development with no tips (Carrin et al., 1994). This is addressed in the Discussion (line 539). To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence-activated cell sorting (FACS), and thereafter adgf expression will be examined in each population.

      (17) The observed large mound phenotype could as suggested possibly be explained by the low ctn, smlA, and high cadA and csA expression observed in the mutant (Figure 3). The expression of some of these genes (csA) is known to require extracellular cAMP signalling. The reported low level of acaA expression and high level of pdsA expression could suggest low levels of cAMP signalling, but there are no actual measurements of the dynamics of cAMP signalling in this mutant to confirm this.

      The acaA expression was examined at 8 and 12 h (Figure 6B), and cAMP levels were measured at 12 and 16 h in the adgf mutants (Figure 6A). Both acaA expression and cAMP levels were reduced, suggesting that cells expressing adgf regulate acaA expression and cAMP levels. This regulation, in turn, is likely to influence cAMP signaling and collective cell movement within mounds, ultimately driving tip development. Exposure to ammonia led to increased acaA expression (Figure 7B) in WT. Based on the comments above, cAMP levels will be measured in the mutant before and after rescue with ammonia.

      (18) Furthermore, it would be useful to quantify whether ammonia addition to the mutant reverses mound size and restores any of the gene expression defects observed.

      Ammonia treatment, whether soon after plating or six hours after plating, had no effect on the mound size (Figure 5G).

      (19) There are many experimental data in the supplementary data that appear less relevant and could be omitted Figure S1, S3, S4, S7, S8, S9, S10.

      Figures S8, S9, and S10 are omitted. We would like to retain the other figures.

      Figure S1 (now Figure S2): It is widely believed that ammonia comes from protein (White and Sussman, 1961; Hames and Ashworth, 1974; Schindler and Sussman, 1977) and RNA (Walsh and Wright, 1978) catabolism. Figure S2 shows no significant difference in protein and RNA levels between WT and adgf mutant strains, suggesting that adenosine deaminase-related growth factor (ADGF) activity serves as a major source of ammonia and plays a crucial role in tip organizer development in Dictyostelium. Thus, it is important to retain this figure.

      Figure S3 (now Figure S4): The figure shows the treatment of various mound arrest mutants and multiple tip mutants with ADA enzyme and DCF, respectively, to investigate the pathway through which adgf functions. Additionally, it includes the rescue of the histidine kinase mutant dhkD with ammonia, indicating that dhkD acts upstream of adgf via ammonia signalling. Therefore, it is important to retain this figure.

      Figure S4 (now Figure S5): This figure represents the developmental phenotype of other deaminase mutants. Unlike adgf mutants, mutations in other deaminases do not result in complete mound arrest, despite some of these genes exhibiting strong expression during development. This underscores the critical role of adenosine deamination in tip formation. Therefore, we would like to retain this figure.

      Figure S7 (now Figure S8): Figure S8 presents the transcriptomic profile of ADGF during gastrulation and pre-gastrulation stages across different organisms, indicating that ADA/ADGF is consistently expressed during gastrulation in several vertebrates (Pijuan-Sala et al., 2019; Tyser et al., 2021). Notably, the process of gastrulation in higher organisms shares remarkable similarities with collective cell movement within the Dictyostelium mound (Weijer, 2009), suggesting a previously overlooked role of ammonia in organizer development. This implies that ADA may play a fundamental role in regulating morphogenesis across species, including Dictyostelium and vertebrates. Therefore, we would like to retain this figure.

      (20) Given the current state of knowledge, speculation about the possible role of ADGF in organiser function in amniotes seems far-fetched. It is worth noting that the streak is not equivalent to the organiser. The discussion would benefit from limiting itself to the key results and implications.

      The discussion is revised accordingly by removing the speculative role of ADGF in organizer function in amniotes. The lines "It is likely that ADA plays a conserved, fundamental role in regulating morphogenesis in Dictyostelium and other organisms including vertebrates" have been removed.

    1. Is it just a matter of time before computers take over the world? It's not hard to envision a dystopian future where robots roam the earth and outsmart human beings (think of movies like 2001: A Space Odyssey, The Matrix, or The Terminator series). Indeed, the physicist Stephen Hawking warned that, "The development of full artificial intelligence could spell the end of the human race"

      This part makes me think about how movies shape our fears of AI. It also connects to the Barnum Effect video since both show how easily people can believe dramatic ideas, even if they're not fully based on reality.

    2. horoscopes that give generic feedback, which could apply to just about anyone.

      the horoscope/personality test effect mirrors universality of expression: we map vague, generic input onto our own representative cases. the trick is that representativeness feels diagnostic even when it's just broad enough to fit anyone.

    3. One version said, "People who know him consider him to be a very warm person, industrious, critical, practical, and determined." The other version was identical except that the phrase "a very warm person" was replaced with "a rather cold person." The students received one of these personality descriptions at random.

      schemas aren't just useful. They can actually override direct evidence. Mehrabian & Ferris (1967) found that when faces and voices conflict, people weight the face 1.5× more. it's the same logic as Kelley's warm/cold professor study: a single cue can set the frame for interpreting everything else.

    1. One underlying issue is that videoconferencing is not virtual reality. It is interactive but not immersive, and there is no common virtual world

      I definitely agree. I live on the other side of the world from my family, so we use video calls all the time. It's great to hear their voices and see their faces, but it still does not feel real. There's no real sense of being together; it's just a screen. No matter how clear the video is, it's more like watching each other than being with each other. Video calls are interactive, but they're not immersive, and that makes a big difference.

    2. When the next pandemic arrives in a decade or two, it's likely that many people will hang out in immersive virtual worlds designed for social interaction.

      My question here is: why is virtual reality interaction so appealing to us? Sure, using it in a pandemic could be nice, but think of the downsides. People could start interacting over virtual reality and never go back to in-person conversation, just like a lot of people continued to work from home after COVID. We could lose the genuineness behind a conversation if we're only looking at a virtual version of someone.

    3. The third question, raised by Plato's cave, concerns value. I'll call it the Value Question. Can you lead a good life in a virtual world?

      Chalmers' main claim in this book is that virtual realities are real, that they are just as real as life right now, and that there is no way of knowing that the life we are living right now is not a simulation. When I ask myself "Can you lead a good life in a virtual world?" I automatically think no. To me, in a virtual world, there is no real good you can do. If you help other people in the virtual world, you are only helping what I would think to be pixels. Anything you do in the virtual reality exists only in the simulation; if you were to step out of it, anything you did is lost. It reminds me of having a high score in a game: if you deleted the game, the high score goes away and it's like you never achieved it.

    1. Author response:

      The following is the authors' response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given the fact that later you describe how this is your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along the same order for all sections, so that it's very clear for comparative purposes what the new data are and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the catshark. (It is ok if you keep the order of "skate, ratfish, then shark," but I think it undersells the new data.)

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) to add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stages 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitively based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stages 32 and 33. Supporting our argument that ratfish have features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig 1). Also, it was significantly higher than stage 32 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra. While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.
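The stage-wise TMD comparisons discussed here come down to standard two-sample significance tests. As a minimal sketch, a Welch's t-test on hypothetical TMD values (the numbers below are invented for illustration, not the paper's measurements) could look like:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two
    independent samples with unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical TMD values (mg HA/cm^3) for skate centra at two stages.
stage32 = [310, 295, 322, 301, 315]
stage33 = [355, 342, 360, 349, 351]
t, df = welch_t(stage32, stage33)  # a large negative t suggests stage 32 < stage 33
```

With only a handful of specimens per group, as the reviewer notes, such tests have limited power, which is why overlapping ranges between adjacent stages are hard to rule out.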

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more of the data summarized in the results sub-headings in the abstract, as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies, and I don't think your list is exhaustive. You need to expand this list and history, which will help with your ultimate comparative analysis without you needing to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with the inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and a modified Fig 7 and re-written Fig 7 legend.

      Generally, holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      While it is a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      Many thanks for the kind words.

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression that almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and citing also previous studies on chondrichthyan skeletal tissues; this includes studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different micro-CT scanners were used for three different species? SkyScan 117 for the skate samples. Catshark: a different scanner, please provide full details. Chimera: synchrotron scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so, please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods (lines 486-488).
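For readers unfamiliar with phantom-based TMD calibration, the usual approach is a linear fit from CT attenuation to the known mineral densities of hydroxyapatite phantoms scanned in the same session. A sketch with invented attenuation and density values (none of these numbers come from the study):

```python
# Illustrative two-point phantom calibration: map grayscale attenuation to
# tissue mineral density (TMD, mg HA/cm^3). All values are made up.
def fit_calibration(att_low, att_high, rho_low=250.0, rho_high=750.0):
    """Linear map from attenuation to density using two hydroxyapatite phantoms
    of known densities rho_low and rho_high."""
    slope = (rho_high - rho_low) / (att_high - att_low)
    intercept = rho_low - slope * att_low
    return slope, intercept

def to_tmd(attenuation, slope, intercept):
    """Convert a measured attenuation value to TMD via the fitted line."""
    return slope * attenuation + intercept

slope, intercept = fit_calibration(att_low=0.8, att_high=2.4)
tmd = to_tmd(1.6, slope, intercept)  # midpoint attenuation maps midway between phantoms
```

Because the fitted line depends on the scanner and session, TMD values are only comparable across samples when the same scanner and the same phantoms are used, which is the point being made in the response above.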

      Please complete L494 ff: The tissue embedding medium and embedding protocol are missing. Were specimens decalcified? If yes, how? Were specimens sectioned non-decalcified or decalcified?

      Please complete L506 ff: The tissue embedding medium and embedding protocol are missing. Descriptions of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). First requirement: all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extension of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters all influence TMD values. Likewise the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting, but I suggest not linking this observation to numerical values.

      These are very good points, but for the following reasons we feel that they were not sufficiently relevant to our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward. Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session. Finally, while the exact age of each adult was not specified, we note for Fig 1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate. Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, which actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper's Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.

      Parts of the results are mixed with discussion. Sometimes a results chapter also needs a few references, but this results chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characteristics of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on; see also L237, L289, L291.

      We included more details throughout the Results, upon each dye's first mention, about what is generally reflected by the specific dyes of the staining protocols (lines 178, 180, 184, 223, 227, and 243-244).

      Discussion

      Please completely remove figure 7, and please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents the entire large group, all sharks, rays or chimeras? Are the three selected species basal representatives with a generalist life style?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades. In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed "representative" from this revision. We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled. However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding. We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates; Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebra centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development; that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group. In our paper, we are examining paedomorphosis in a holocephalan relative to elasmobranch fishes in the same group (Chondrichthyes), so this is an appropriate application of paedomorphosis. In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In the times of Gadow & Abbott (1895), science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that controls Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva, and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures, but these structures are neither bone nor mineralised cartilage. Catfish (6 % of all vertebrate species), on the other hand, have bone but do not have SCPP genes (Liu et al. 2016). Other calcium-binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor; at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally. The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)! In response to the reviewer's comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439). We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsections and figures are nice snapshots of your interpretations of the results, and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try to state an interpretation of results as the heading title in a results section and the figures, so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order, I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications. No changes were made.

      You allude to the fossil record and that is great. That said, historical literature is more abundant than what you've listed. Your first sentence describes a long fascination yet only goes back to 1990. But there are authors that have had this fascination for centuries, so I think you'll benefit from looking back, especially because several of them have looked into the histology of these fishes. You even have one sentence citing Coates et al. 2018, Frey et al. 2019, and Ørvig 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lede, and it may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We've added older REFs, as pointed out above. Regarding fossil evidence for trabecular mineralization: no, those studies did not lead to our research question. But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done, because it can be done using newer technologies. That said, there is a lot more work from Mason Dean's lab, starting in 2010, that you should take a look at related to tesserae structure... they are looking at additional taxa beyond what you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info you present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason's work, citing 9 of his papers, and where possible we have incorporated their data on different species into our Discussion and Fig7. Thanks for the de Beer REFs. While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so they were not included in our Intro/Discussion.

      Most sections within the results read more like a discussion than a presentation of the new data, and you jump directly into using an argument of those data too early. Go back in and remove the references, or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little more brevity and conciseness. For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about how other elasmobranchs mineralize their neural arches, etc., for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that the first sentence in the paragraph describes why you are doing a particular method and comparison, because it shows me (the reader) where you're sampling from. Something else: as part of the first figure, rather than having just the graph, maybe add a small sketch of the little skate and catshark to show where you sampled from for comparative purposes. That would then relate back to clarifying other figures as well.

      done (also adding a phylogenetic tree).

      The second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all; it looks like a discussion. However, the trabecular mineralization is one of the most interesting aspects of this paper, and how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results. We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up that entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7, so it's hard to describe any state specifically as ancestral or derived. Generally, holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place! We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species. For the Fig7 branching and catshark inclusion, please see above.

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends. We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript's abstract, figures, and figure legends. That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question, although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests, for instance, that trabecular structures are not common; however, this may be due to sampling (bring up the fossil record). We expand our understanding by characterizing the skate, catshark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively. For example, we started with "Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution". We also stated an objective for the experiments presented in the paper: "To clarify the distribution of specific endoskeletal features among extant chondrichthyans".

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of both of those to the catshark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here. Put them into the methods and discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow, either in the caption or on the figure directly. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome-unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue. We detailed how spokes were identified in the main text (lines 241-243; 252-254) and figure legend (lines 597-603).

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      no change; see above

      L53: down tune language, remove "severely" and "major"

      done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      no change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      no change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      changed to "neural arches and centra of segmented vertebrae" (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      thanks for pointing out this issue; we changed the wording, instead contrasting "non-continuous" and "continuous" mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      all regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also, only one chimera is analysed, not several species.

      sentence removed

      302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abbott (1895)

      added Arratia and Kölliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      references to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      sorry, the line numbering for the reviewer and our original manuscript have been a little off for some reason, and we were unclear exactly which line of text this comment referred to. Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers on other species.

      346: "selected representative". Selection criteria are missing

      "selected representative" removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Does this mean there are fossil stem chondrichthyans with fully mineralised centra?

      Re-worded to "Stem chondrichthyans did not appear to mineralize their centra" (line 379)

      L379: down tune and change to: "we propose the term 'non-tesseral trabecular mineralization'. Possibly a plesiomorphic (ancestral) character of chondrichthyans"

      no change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests so far palaeontologists have not been "careful" enough?

      apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      sentence re-worded and "we propose" removed (lines 412-415)

      L420: remove paragraph

      no action; see above

      L436: remove paragraph

      no action; see above

      L450: perhaps add a summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      yes, in response to the reviewer's comment, we finished the discussion with a summary of the current study (lines 440-453).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Summary: The authors have previously published Mass-spectrometry data that demonstrate a physical interaction between Sall4 and the BAF chromatin complex in iPSC-derived neurectodermal cells, a precursor cell state to neural crest cells. The authors sought to understand the basis of this interaction and investigate the role of Sall4 and the BAF chromatin remodelling complex during neural crest cell specification. The authors first validate this interaction with a co-IP between the ARID1B subunit and Sall4, confirming the mass spec data. The authors then utilise in silico modelling to identify the specific interaction between the BAF complex and Sall4, suggesting that this contact is mediated through the BAF complex member DPF2. To functionally validate the role of Sall4 during neural crest specification, the authors utilise CRISPR-Cas9 to introduce a premature stop codon on one allele of Sall4 to generate iPSCs that are haploinsufficient for Sall4. Due to the reports of Sall4's role in pluripotency, the authors confirm that this model doesn't disrupt pluripotent stem cells and is viable to model the role of Sall4 during neural crest induction. The authors expand this assessment of Sall4 function further during their differentiation model to cranial neural crest cells, assessing Sall4 binding with CUT&RUN sequencing, revealing that Sall4 binds to motifs that correspond to key genes in neural crest differentiation. Moreover, reduction in Sall4 expression also reduces the binding of the BAF complex, as shown by CUT&RUN for BRG1. Overall, the authors then propose a model by which Sall4 and BRG1 bind to and open enhancer regions in neurectodermal cells that enable complete differentiation to cranial neural crest cells.

      Overall, the data is clear and reproducible and offers a unique insight into the role of chromatin remodellers during cell fate specification.

      We thank the Reviewer for the nice words of appreciation of our manuscript.

      However, I have some minor comments.

      1- Using AlphaFold in silico modelling, the authors propose that the interaction between the BAF complex and Sall4 is mediated by DPF2, but don't test it. Does a knockout or knockdown of DPF2 prevent the interaction?

      We agree with the Reviewer that we have not functionally validated our computational prediction that DPF2 is the specific BAF subunit directly linking SALL4 with BAF. We chose not to perform the validation experiment for two main reasons:

      1) This would be outside the scope of the paper. In fact, from a mechanistic point of view, we have confirmed via both Mass-spectrometry and co-IP with ARID1B that SALL4 and BAF interact in our system. Moreover, we also extensively demonstrate that the interaction with SALL4 is required to recruit BAF at the neural crest induction enhancers, and we further demonstrate that depletion of SALL4 impairs this. In our view, this was the focus of the manuscript. On the other hand, determining with certainty which BAF subunit mediates the interaction with SALL4 would be outside the scope of the paper.

      2) Moreover, after careful consideration, we don't think that even a knockout of DPF2 would provide a definitive answer as to which exact BAF subunit mediates the interaction with SALL4. In fact, knockout of DPF2 could potentially disrupt BAF assembly or stability, and this could result in a disruption of the interaction with SALL4 even if DPF2 is not the subunit mediating it (in other words, the experiment could provide a false-positive result). In our opinion, the only effective experiment would be mutating the DPF2 residues that we computationally predicted as responsible for the interaction with SALL4, but again this would be very laborious and outside the scope.

      That being said, we agree with the Reviewer that while the SALL4-BAF interaction was experimentally validated with robust approaches, the role of DPF2 in the interaction was only computationally predicted, which is a limitation of the study. We have now added a dedicated paragraph in the discussion to acknowledge this limitation.

      2- OPTIONAL: Does knockout of DPF2 phenocopy the Sall4 ko? This would be very interesting to include in the manuscript, but it would perhaps be a larger body of work.

      See point-1.

      3- Figure 1: the day of IP is not clearly described until later in the text. Please outline it in the figure.

      We thank the Reviewer for pointing this out. This has been fixed.

      3- What is the expression of Sall1 (and other Sall paralogs) during differentiation? The same with the protein levels of Sall4: do these remain below 50%, or is this just during pluripotency?

      As recommended by the Reviewer, we have performed time-course WB of SALL1 and SALL4. These experiments revealed that SALL1 remains very lowly expressed in wild-type conditions across time points, all the way through differentiation to CNCC (see updated Supplementary Fig. S9). This is consistent with previous studies demonstrating that SALL4, but not SALL1, is required for early mammalian development (see for example Miller et al. 2016, Development, and Koulle et al. 2025, bioRxiv). We performed the same time-course WB for SALL4, which revealed that SALL4 expression progressively decreases after day-5 (as expected) and is very low at the CNCC stage (day-14); therefore, we would expect the KO to remain at an even lower level at this stage.

      4- The authors hypothesise that Sall4 binds to enhancers - with the criterion for an enhancer being that peaks > 1 kb from the TSS are enhancers. Can this be reinforced by overlaying other ChIP tracks that would give more confidence in this? There are several datasets from Joanna Wysocka's lab that also utilise this protocol, which can give you more evidence to reinforce the claim and provide further detail as to the role of Sall4.

      We thank the Reviewer for this great suggestion. As recommended, we have used publicly available ChIP-seq data generated by the Wysocka lab (H3K4me1, H3K4me3) and also generated new H3K27ac ChIP-seq data. These experiments and analyses confirmed that these regions are putative CNCC enhancers (and a minority of them putative promoters), decorated with H3K4me1 and with a progressive increase in H3K27ac after CNCC induction (day-5). See new Supplementary Figure S6.

      5- The authors state that cells fail to become cranial neural crest cells; however, they do not propose what the cells do instead. Do they become neural? Or do they stay pluripotent, which is one option given the higher expression of Nanog, OCT4 and OTX2, which are all expressed in pluripotent stem cells.

      We think that it is likely a mix of both. There is a mixed bag of expression of pluripotency markers, but also high expression of neuroectodermal markers. This suggests that most cells safely reach the neuroectodermal stage but fail to go beyond it, while some of the cells simply do not differentiate or regress back to pluripotency. We would rather refrain from overinterpreting what the KO cells become, as it is likely an aberrant cell type, but following the Reviewer's indication we have added a paragraph in the discussion to speculate on this.

      6- In general, I would like to see the gating strategy and controls for the flow cytometry in a supplemental figure.

      As recommended by the Reviewer, we have added the gating strategy in Supplementary Fig. S4.

      7- For supplementary figure 1, please include the gene names in the main image panels rather than just the germ layer.

      Done. The figure is now Supplementary Figure S3 since two supplementary figures were added before.


      Reviewer #2

      Summary: In this manuscript, the authors build on their previous work (Pagliaroli et al., 2021), where they identified an interaction between the transcription factor SALL4 and the BAF chromatin remodeling complex at Day-5 of an iPSC to CNCC differentiation protocol. In their current work, the authors begin by exploring this interaction further, leveraging AlphaFold to predict interaction surfaces between SALL4 and BAF complex members, considering both SALL4 splice isoforms: a longer SALL4A (associated with developmental processes) and a shorter SALL4B (associated with pluripotency). They propose that SALL4A may interact with DPF2, a BAF complex member, in an isoform-dependent manner. The authors next explore the role of SALL4 in craniofacial development, motivated by patient heterozygous loss-of-function mutations, leveraging iPSCs with an engineered SALL4 frameshift mutation (SALL4-het-KO). Using this model, the authors first demonstrate that reduced expression of SALL4 does not impact the iPSC identity, perhaps due to compensation via upregulation of SALL1. Upon differentiation to neuroectoderm, SALL4 haploinsufficiency causes a reduction in newly accessible sites, which is associated with a reduction in SALL4 binding and therefore a loss of BAF complex recruitment. Interestingly, however, there were few transcriptional changes at this stage. Later in the CNCC differentiation, at Day-14, when the wildtype cells have switched on expression of CNCC markers, the SALL4-het-KO cells fail to switch cadherin expression associated with a transition from an epithelial to a mesenchymal state, and fail to induce CNCC specification and post-migratory markers. Together the authors propose that SALL4 recruits BAF to CNCC enhancers as early as the neuroectodermal stage, and failure of BAF recruitment in SALL4-het-KO lines results in a loss of open chromatin at regulatory regions required later for induction of the CNCC programme.
The failure of the later differentiation is compelling in the light of the early stages of the differentiation progressing normally, and the authors outline an interesting proposed mechanism whereby SALL4 recruits BAF to remodel chromatin ahead of CNCC enhancer activation, a model that can be tested further in future work. The link between SALL4 DNA binding and BAF recruitment is nicely argued, and very interesting as altered chromatin accessibility at Day 5 in the neuroectodermal stage is associated with only few changes in gene expression, while gene expression is greatly impacted later in the CNCC stage at Day 14. The in silico predictions of SALL4-BAF interaction interfaces are perhaps less convincing, requiring experimental follow-up outside the scope of this paper. Some of the associated figures could perhaps be moved to the supplement to enhance the focus on the later functional genomics experiments.

      We thank the Reviewer for the nice words of appreciation of our manuscript.

      Major comments

      1. A lot of emphasis is placed on the AlphaFold predictions in Figure 1; however, the predictions in Figure 1B appear to have mostly low or very low confidence scores (coloured yellow and orange). It is unclear how much weight can be placed on these predictions without functional follow-up, e.g. mutating certain residues and showing the impact on the interaction by co-IP. The latter parts of the manuscript are much better supported experimentally, and therefore perhaps some of Figure 1 could move to a Supplemental Figure (e.g. the right-hand part of 1B, and the lower part of Figure 1C showing SALL4B predicted interactions). The limitations of AlphaFold predictions should be acknowledged, and the authors should discuss how these predicted interactions could be experimentally explored further in the future.

      As recommended by the Reviewer, we have moved part of the AlphaFold predictions to Supplementary Figure S1, and we added a paragraph in the discussion to acknowledge the limitations of AlphaFold.

      The authors only show data for one heterozygous knockout clone for SALL4. It is usual to have more than one clone to mitigate potential clonal effects. The authors should comment on why they only have one clone and include any data for a second clone for key experiments if they already have this. Alternatively, the authors could provide any quality-control information generated during production of this line, for example whether any additional genotyping was performed.

      We apologize for the confusion and for our lack of clarity on this. We have used two clones (one generated with an 11 bp deletion, one with a 19 bp deletion, both in exon-1; see also point 6 of your minor points). The two clones were used as biological replicates: for example, the two ATAC-seq replicates performed at each time point were performed with the two different clones, and the three RNA-seq replicates were performed with two technical replicates of the clone with the 11 bp deletion and one replicate of the clone with the 19 bp deletion. We have clarified this in the methods section of the manuscript and added a Supplementary Figure (S2) showing the editing strategy for the two clones. Thank you for catching it.

      The authors show all genomics data (ATAC-seq, CUT&RUN and ChIP-seq) as heatmaps and average profiles. It would be valuable to see the ATAC-seq data (perhaps along with SALL4 and BRG1 recruitment) at some representative and interesting loci.

      As recommended by the Reviewer, we have added Genome Browser screenshots of representative loci in Fig. 6.

      Figure 4A. The schematic could be improved by including brightfield or immunofluorescence images at the three stages of the differentiation. Are the iPS cells seeded as single cells, or passaged as colonies before starting the differentiation? Further details are required in the methods to clarify how the differentiation is performed; for example, at what day are the differentiating cells passaged? This is not shown on the schematic in Figure 4A.

      As recommended, we added IF images in the Fig. 4A schematic, and added more details in the methods.

      There is likely some heterogeneity of cell types in the differentiation at Day 5 and Day 14. Can the authors comment on this from previous publications, or perhaps conduct some IF for markers to demonstrate what proportions of cells are neuroectoderm at Day 5 and CNCCs at Day 14?

      The differentiation starts with single cells that aggregate to form neuroectodermal clusters, as per the original protocol. The CNCCs that we obtain with this protocol homogeneously express CNCC markers, as shown by IF of SOX9 in Fig. 4A. For day-5, as recommended, we have added IF for PAX6, also showing homogeneous expression (Fig. 4A).

      For the motif analysis of Day 5-specific SALL4 binding sites (Figure 4E), was de novo motif calling performed? Were any binding sites reminiscent of a SALL4 binding site observed (e.g. an AT-rich motif)? Could the authors comment on this in the text - if there is no SALL4 binding motif, does this suggest SALL4 is recruited indirectly to these sites via interaction with another transcription factor, for example?

      Similar to SALL4, SALL1 also recognizes AT-rich motifs. However, while we found AT-rich motifs enriched in our day-5 motif analysis (in the regions that gain SALL4 binding upon differentiation), the enrichment is not particularly strong, and several other motifs are significantly more enriched, suggesting that, as the Reviewer mentioned, SALL4 might be recruited indirectly to these sites by other factors. We have added a paragraph on this in the discussion.

      Does SALL1 remain upregulated at Day-5 and Day-14 of the differentiation for the SALL4-het-KO line? Are binding sites known for this TF and were they detected in the motif analysis performed? Further discussion of the impact of the overexpression of SALL1 on the phenotypes observed is warranted - e.g. for Figure 5F, could the sites associated with a gain of BRG1 peaks upon loss of SALL4 be associated with SALL1 being upregulated and 'hijacking' BAF recruitment to distinct sites associated with nervous system development? Is SALL1 still upregulated at Day 5?

      As mentioned above, SALL1 also recognizes AT-rich motifs but, similar to SALL4, also binds unspecifically, likely in cooperation with other TFs. As the Reviewer suggested, it is certainly possible that some of the sites associated with a gain of BRG1 peaks upon loss of SALL4 could be associated with SALL1 being upregulated and 'hijacking' BAF recruitment to distinct sites. While this is speculative, we have added a paragraph on this in the discussion.

      Related to the point above, SALL4A is proposed to have an isoform-specific interaction with the BAF complex. It would be valuable to plot SALL4A and SALL4B expression from the available RNA-seq data at Day 0, 5 and 14 to explore whether stage-specific isoform expression matches with the proposed role of SALL4A to interact with BAF at Day 5. It could be valuable to also look at expression of SALL1, 2 and 3 across the time course to see whether additional compensation mechanisms are at play during the differentiation.

      Thanks for suggesting this. We performed a time-course analysis of isoform-specific gene expression, which showed that SALL4B expression remains low throughout differentiation, while SALL4A expression increases upon differentiation cues and remains high until the end. We have added this to Supplementary Fig. S9. Moreover, we have performed an additional experiment using pomalidomide, a thalidomide derivative that selectively degrades SALL4A but not SALL4B. Notably, SALL4A degradation recapitulated the main findings obtained with the CRISPR-KO of SALL4, further supporting that SALL4A is the isoform involved in CNCC induction (see new Fig. 8).

      At line 264, the authors state "SALL4 recruits the BAF complex at CNCC developmental enhancers to increase chromatin accessibility". Given that this analysis is performed at Day 5 of the differentiation, which is labelled as neuroectoderm, what evidence do the authors have that these are specifically CNCC enhancers? Statements relating to enhancers should generally be re-phrased to putative enhancers (as no functional evidence is provided for enhancer activity), and further evidence could be provided to support that these are CNCC-specific regulatory elements, e.g. showing representative gene loci from CNCC-specific genes. Discussion of the RNA-seq presented in Supplementary Figure 2B may also be appropriate to introduce here, given that large numbers of accessible chromatin sites are detected while the expression of very few genes is impacted, suggesting these sites may become active enhancers at a later developmental stage.

      As also recommended by the other Reviewer, to further characterize these sites we have used publicly available histone modification ChIP-seq data (H3K4me1, H3K4me3) generated by the Wysocka lab, and have also generated new H3K27ac ChIP-seq data. These experiments and analyses confirmed that these regions are putative CNCC enhancers (and a minority of them putative promoters), all decorated with H3K4me1 and all showing a progressive increase in H3K27ac after CNCC induction (day 5). See new Supplementary Figure S6.

      1. Do any of the putative CNCC enhancers detected at Day 5 as being sensitive to SALL4 downregulation and loss of BAF recruitment overlap with previously tested VISTA enhancers (https://enhancer.lbl.gov/vista/)?

      Yes, we have found examples of overlap and have included two of them in the updated Figure 6 as Genome Browser screenshots.

      Minor comments

      1. The authors are missing references in the introduction "a subpopulation of neural crest cells that migrate dorsolaterally to give rise to the cartilage and bones of the face and anterior skull, as well as cranial neurons and glia".

      Fixed, thank you.

      The discussion of congenital malformations associated with SALL4 haploinsufficiency is brief in the introduction. From OMIM, SALL4 heterozygous mutations are implicated with the condition Duane-radial ray syndrome (DRRS) with "upper limb anomalies, ocular anomalies, and, in some cases, renal anomalies... The ocular anomalies usually include Duane anomaly". That Duane anomaly is one phenotype among a number for patients with SALL4 haploinsufficiency could be clarified in the introduction. Of note, this is stated more clearly in the discussion but needs re-wording in the introduction.

      Done, thank you.

      The statements "show that the SALL4A isoform directly interacts with the BAF complex subunit DPF2 through its zinc-finger-3 domain" and "this interaction occurs between the zinc-finger-cluster-3 (ZFC3) domain of SALL4A and the plant homeodomains (PHDs) of DPF2" in the introduction appear overstated and should be toned down. To show this the authors would need to mutate or delete the proposed important zinc-finger domains from SALL4A, which is outside the scope of this work. Notably, this is less strongly-stated elsewhere in the manuscript, e.g "predict that this interaction is mediated by the BAF subunit DPF2", Line 162.

      Done, thank you.

      Could the authors clarify why 3 Alphafold output models are shown for SALL4B in Figure 1C, and only one output model for SALL4A?

      AlphaFold3 produces five separate predicted models per protein combination (Model_0 … Model_4), each derived from slightly different network parameters or initializations. The final output prioritizes the model with the highest confidence score. This multi-model strategy enables the identification of the most robust conformation while providing a measure of structural uncertainty (as per the GitHub documentation for AlphaFold3). We have conducted the same analysis for SALL4A as we did for SALL4B. Specifically, SALL4A interacts with the AT-rich DNA in models 0, 1, and 2; therefore, models 3 and 4 were excluded. When analysing models 1 and 2, we found a higher number of residues involved in the interaction (>800 instead of 396). As for model 0, only the interactions between residues belonging to an annotated functional domain (ZFs and PHDs) were considered.

      In Model 1: SALL4A and DPF2 interact mainly through ZF6 and ZF7, and not ZF5 as in Model 0.

      In Model 2: SALL4A and DPF2 interact mainly through ZF5 and ZF6, and not ZF7 as in Model 0. In contrast, this model shows an interaction with ZF1 not seen in the other two models, but with a higher PAE (average of 31, compared to averages of 25 to 27 for the other two ZFs).

      Therefore, we considered Model 0, as it is the model with the highest confidence and is representative of all significant models (it includes ZF5, ZF6, and ZF7).
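For clarity, the selection logic described above can be sketched in a few lines; the data structures, confidence scores, and PAE cutoff below are illustrative placeholders, not the actual AlphaFold3 output format or our pipeline:

```python
# Hypothetical sketch: among the five AlphaFold3 outputs, keep models where
# SALL4A contacts the AT-rich DNA, drop residue contacts with poor PAE, then
# report the DNA-binding model with the highest confidence score.
# All names and numbers are illustrative.

def select_representative_model(models, pae_cutoff=30.0):
    """models: list of dicts with keys 'name', 'confidence',
    'binds_dna' (bool), and 'contacts' (list of (residue, pae))."""
    # Discard models lacking the DNA interaction (models 3 and 4 in the text)
    candidates = [m for m in models if m["binds_dna"]]
    # Keep only contacts with acceptable predicted aligned error (PAE)
    for m in candidates:
        m["filtered"] = [(r, p) for r, p in m["contacts"] if p <= pae_cutoff]
    # Return the DNA-binding model with the highest confidence score
    return max(candidates, key=lambda m: m["confidence"])

models = [
    {"name": "model_0", "confidence": 0.82, "binds_dna": True,
     "contacts": [("ZF5", 25.0), ("ZF6", 26.0), ("ZF7", 27.0)]},
    {"name": "model_1", "confidence": 0.79, "binds_dna": True,
     "contacts": [("ZF6", 26.0), ("ZF7", 27.0)]},
    {"name": "model_2", "confidence": 0.76, "binds_dna": True,
     "contacts": [("ZF5", 26.0), ("ZF6", 27.0), ("ZF1", 31.0)]},
    {"name": "model_3", "confidence": 0.70, "binds_dna": False, "contacts": []},
    {"name": "model_4", "confidence": 0.65, "binds_dna": False, "contacts": []},
]

best = select_representative_model(models)
print(best["name"])  # model_0
```

Note how the high-PAE ZF1 contact of model 2 is filtered out, mirroring the reasoning above.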

      Line 121. The authors state "DPF2, a broadly expressed BAF subunit,", but don't show expression during their CNCC differentiation. It would be good to include expression of DPF2 in Figure 1E.

      Done, thank you.

      The text states "a 11 bp deletion within the 3'-terminus of exon 1 of SALL4", while the figure legend states, "Sanger sequencing confirming the 19 bp deletion in one allele of SALL4 is displayed". The authors should clarify this disparity and experimentally confirm the deletion, e.g. by TA-cloning the two alleles and sequencing these separately to show that one allele is wildtype and the other has a frameshift deletion.

      We apologize for the confusion. As stated above (point 2 of the major comments), we used two clones (one generated with an 11 bp deletion, one with a 19 bp deletion, both in exon 1; see also point 6 of your minor comments). The two clones were used as biological replicates (see response above for details). The deletion in both clones was experimentally confirmed by Sanger sequencing by the company that generated the lines for us (Synthego). The strategy for the two clones is now also shown in Supplementary Fig. S2.

      The authors generate an 11-bp (or 19-bp?) deletion in exon-1 - it would be valuable to include a discussion whether patients have been identified with deletions and frame-shift mutations in this region of SALL4 exon-1. And also clarify, if not clearly stated in the text, that both SALL4A and SALL4B will be impacted by this mutation. Are there examples of patient mutations which only impact SALL4A?

      As requested, we have added a discussion paragraph on this. And, yes, both SALL4A and SALL4B are impacted by the deletion in both clones (11 bp and 19 bp).

      Regarding patient variants in exon 1 and patient variants that only impact SALL4A: we could only find one published pathogenic 170 bp deletion in exon 1 (VCV000642045.7). The majority of the pathogenic or likely pathogenic variants are located in exon 2. In particular, of the 63 reported pathogenic (or likely pathogenic) clinical variants, 42 were located in exon 2. Among these, 28 are located in the portion shared by both SALL4A and SALL4B, while the remaining 14 are SALL4A-specific.

      For the SALL4 blots in Figure 2B, is the antibody expected to detect both isoforms (SALL4A and SALL4B), and which isoform is shown? If two isoforms are detected, they should both be presented in the figure.

      Yes, the antibody detects both isoforms, and we now present both in Figure 2, as recommended.

      SALL4 expression should be shown for Figure 2C to see whether the >50% down-regulation of SALL4 at the protein level may be partially driven by transcriptional changes.

      Done, thank you. As expected, we observed that SALL4 mRNA expression in the KO line is comparable to wild-type conditions, yet this still results in a significant decrease in SALL4 protein level, likely because of autoregulatory mechanisms coupled with nonsense-mediated decay of the mutated allele. Also, we note that SALL4 usually forms homodimers; therefore, an insufficient amount of protein could also lead to degradation of the monomers.

      The number of experimental replicates should be indicated in all figure legends where relevant. Raw data points should be plotted visibly over the violin plots (e.g. Figure 2C).

      Done, thank you.

      For Figure 3A, the images of the DAPI and NANOG/OCT4 staining should be shown separately in addition to the overlay.

      Done, thank you.

      The metric 'Corrected Total Cell Fluorescence (CTCF)' should be described in the methods. The number of images used for the quantification in Figure 3A should be indicated.

      Done, thank you.

      Figure 3C - what are the 114 differentially expressed genes? Some interesting genes could be labelled on the plot and the data used to generate this plot should be included as a Supplementary Table. Supplementary Tables should similarly be provided for Figure 6C, Day 14 and Supplementary Figure 2B, Day 5.

      As recommended, we have highlighted some interesting genes in the volcano plot and also included all the expression data for all genes in Supplementary Table S3.

      Figure 4B. The shared peaks are not shown. For completeness, it would be ideal to show these sites also.

      Done, thank you.

      Figure 4C is difficult to interpret. Why is the plot asymmetric to the left versus right? What does the axis represent - % of binding sites?

      The asymmetry reflects the larger number of peaks downstream of the TSS compared with upstream of the TSS. This is consistent with the fact that many SALL4 peaks are in introns, likely representing intronic enhancers.

      Line 224-225. What do n= 3,729 and n= 6,860 refer to? There appear to be many more binding sites indicated in Figure 4B, therefore these numbers cannot represent 86% and 97% of sites?

      Thank you for pointing this out; we should have specified this in the text. Those numbers refer to the genes whose TSS is closest to each SALL4 peak. Notably, multiple peaks can share the same closest TSS, hence the discrepancy between the number of peaks and the number of nearest genes.

      Raw numbers:

      • Day-0 RAW = 6,104 (peaks = 6,114);
      • Day-5 RAW = 17,131 (peaks = 17,137). The raw data are now reported in Supplementary Table 4.
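To illustrate why the number of nearest genes is smaller than the number of peaks, here is a minimal single-chromosome sketch of the nearest-TSS assignment (toy coordinates and gene names, not our actual data or pipeline):

```python
from bisect import bisect_left

def nearest_tss(peak_center, tss_sorted):
    """Return the (position, gene) TSS closest to a peak center.
    tss_sorted must be sorted by position."""
    positions = [p for p, _ in tss_sorted]
    i = bisect_left(positions, peak_center)
    # Compare the flanking TSSs and keep the closer one
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t[0] - peak_center))

# Toy example: 5 peaks collapse onto only 3 unique nearest genes,
# because several peaks share the same closest TSS.
tss = sorted([(1_000, "GENE_A"), (50_000, "GENE_B"), (120_000, "GENE_C")])
peaks = [900, 1_200, 49_000, 52_000, 119_500]

genes = {nearest_tss(p, tss)[1] for p in peaks}
print(len(peaks), len(genes))  # 5 peaks, 3 unique nearest genes
```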

      Figure 4E. Several TFs mentioned in the text (Line 243) are not shown in the figure, it would be good to show all TFs motifs mentioned in the text in this figure. Again, there is no mention of whether a sequence-specific motif is detected for SALL4 (e.g. an AT-rich sequence) from this motif analysis.

      Done, thank you. An AT-rich sequence, resembling the SALL4 motif, was detected in a small minority of sites (this is now shown in Supplementary Figure S5), suggesting that SALL4 engages chromatin broadly, going beyond its preferred motif, possibly in cooperation with other TFs. This is consistent with many studies in mESCs showing that SALL4 binds at OCT4/NANOG/SOX2 target motifs. This is now discussed in a dedicated paragraph of the discussion.

      Figure 4G. How was the ATAC-seq data normalized for the WT and SALL4-het-KO lines for this comparison? The background levels of accessibility seem quite different in Replicate 1.

      The bigwigs used to make the heatmaps are normalized by sequencing depth using the deepTools suite (RPKM normalization).
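For reference, RPKM as used by deepTools (e.g., via `bamCoverage --normalizeUsing RPKM`) divides the read count by the region length in kilobases and the library size in millions of mapped reads; a minimal sketch of that formula, assuming the standard definition (toy numbers, not our data):

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads:
    count / (length in kb) / (library size in millions)."""
    kb = region_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kb / millions

# The same underlying signal yields comparable values across libraries
# of different sequencing depth once RPKM-normalized.
a = rpkm(200, 1_000, 10_000_000)   # 20.0
b = rpkm(400, 1_000, 20_000_000)   # 20.0 (twice the reads, twice the depth)
print(a, b)
```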

      Figures 5B-C could be exchanged to flow better with the text. A Venn diagram could be included to show the overlap between the sites losing BRG1 in SALL4-het-KO (13,505 sites) and the Day5-specific SALL4 sites (17,137 sites).

      Done, thank you.

      At Day 5, the authors suggest a shift towards neural differentiation. It could be interesting for the authors to perform qRT-PCR at Day 5 for some neural markers or look in the Day 14 data for markers of neural differentiation at the expense of CNCC markers.

      See updated Supplementary Fig. S8, where we show timecourse expression of several genes, including neural markers.

      Is the data used to plot Figure 5D the same as in Figure 4G? If so, why is only one replicate shown in Figure 5D?

      Only one replicate was shown in the main figure purely for lack of space, but the experiment was replicated twice (with the two different clones), and the results were exactly the same. See plots below for your convenience:

      Figure 6A. How many replicates are shown? If n=2, boxplots are not appropriate for representing the distribution of the data. Please include n = X in the figure legend and plot the raw data points also.

      Done, thank you, and as suggested we are no longer using boxplots for this panel.

      Figure 6B. What is the significance of CD99 for CNCC differentiation?

      Figure 6F. No error bars are shown; how many replicates were performed for this time course? The linear regression line does not appear to add much value and could be removed.

      As suggested, we have removed these plots and replaced them with individual genes plots, which include error bars. See updated Supplementary Figure S8.

      At line 304, the authors state "while SALL4-het-KO showed a significant downregulation of these genes". Perhaps 'failed to induce these genes' may be more accurate unless they were expressed at Day 5 and downregulated at Day 14.

      Done, thank you.

      Lines 332-335. The genes selected for pluripotency, neural plate border, CNCC specification could be plotted separately in the Supplement to show individual gene expression dynamics.

      Done, thank you, see point 24.

    1. Sitting alone in my room and looking through that box of books, it was crazy to think about just how much reading had positively impacted my life. I'm curious to know if other people have had the same kind of experiences as me. What kind of impact does not just reading but also developing many different kinds of multiliteracies actually have on people long-term? It would be interesting to study whether there is a correlation between developing various multiliteracies early in childhood and success later in life, just as I believe there has been in my life. Would my grades have been the same without all of my sponsors? Would I still have been accepted to UCF without the many literacies I have acquired? Would I still have been that same kid, sitting in my room alone and scared as all hell of leaving home?

       What life and literacy have shown me so far is that you can't abandon hope. I've learned that the world can be a very confusing place, especially if you're not versed in all of its literacies. I've also learned to keep that in mind, and when life throws me in a new direction, I try to embrace that. Life and literacy have taught me that when your walls are painted blank, you should let them represent a new page in your life. When it's three a.m., and you've been stuck on the same sentence for the past three hours, and your paper is due in the morning, you can't abandon hope. And when your adversaries drive you into a corner, when you feel like everyone around you is speaking a foreign language, when everything is going wrong, and especially when you're going to a new place with sure to be alien literacies, I've learned the best thing you can do is to take it all in, remember to pick up your towel, and never, never ever forget that motto: don't panic.

      This conclusion reflects on how literacy shaped the writer's life and future. They wonder if their success in school and acceptance to UCF would have been possible without their literacy sponsors and multiliteracies. The final message is hopeful: no matter how confusing life gets, literacy has taught them to adapt, keep going, and "don't panic."

    2. Ah, debate. Like modern day linguistic gladiator fights. This is where literacies come to battle it out and, in some cases, even die. I joined the debate team really early in my high school career, and if I had not held a wide range of multiliteracies by then, I would have developed them at that time. Obviously, I needed very clear communication skills just to be able to compete. The ability to write ten minute speeches, or, for that matter, even four minute speeches, is not something everyone possesses. But the intricacies, the "kill words," the strategies that would upstage Sun Tzu: it is in those skills where the real literacy of debate lies. While there is no instruction manual on winning a debate, doing so requires a very clear understanding of what your judges want to hear, what your adversary is actually communicating (and not just what he wants to communicate), and much, much more.

       "Always" (just to name one from the dozens) was a kill word. Since it's not often something is "always" true, using that word by accident or on purpose usually meant that an adversary could "kill" you on that claim. But that's just where it starts. Sometimes people would bait others with kill words just to pull attention from other weaker claims they were using. Or better yet, cite untrustworthy sources just so their opponents could waste the rest of the remaining time citing the dozens they had to back them up. Mirabelli speaks about the struggle for control in the interactions between waiter and customer. As can be seen, this struggle for control is manifested in the debate world in a much more tangible way. Hand signals, changes in pitch, even moments of silence are all used to gain control of the debate, just as a soccer player fights for control of a ball.

       The best debaters were also literate in the signals someone made when they were bluffing on a claim, or better yet when they were about to break down. Losing your cool in a debate, screaming, or using language that was a little too passionate usually resulted in that person losing. One important strategy in any debate is spotting a weak point and then striking that weak point until your opponent is frantic, all the while making sure it still appears you are amicable to the judge. Being literate in this kind of knowledge actually prepared me for watching the presidential debates. I knew exactly what Biden was doing when he was laughing at Paul Ryan's arguments. When Obama stayed quiet while Romney was arguing with him, I knew he was just baiting him to look foolish. On a much larger scale, the literacy of the private struggle for power in communication has also allowed me to spot those kinds of situations in my own life.

       A lot of what I learned from debate has also gone into my writing. When I was writing claims for debate, I had to have all these strategic elements in mind. Not supporting any one claim was a failure of biblical proportions, a failure that would undoubtedly crucify me in front of the judges. It was that serious. Now, whenever I write, there is always a little voice inside my head asking for evidence, checking for loopholes in my arguments, and really just being a general nuisance.

      This section shows how being on the debate team helped the writer build new literacies. They learned not just to write and speak clearly, but also to use strategies, read signals, and control arguments, similar to Mirabelli's idea of power struggles in communication. Debate even shaped how the writer approaches writing today, always thinking about evidence and possible weaknesses in their arguments.

    1. "Our data would be used by the very people that beat that language out of our mouths to sell it back to us as a service," Jones says. "It's just like taking our land and selling it back to us," Mahelona adds.

      US tech companies have little interest in letting Maori people reconnect with their native customs, but instead offer this "privilege" at a steep cost.

    1. Not only has the workload increased exponentially, so has micromanagement, says Nicholas Cream, a social studies teacher in Holyoke, Mass. "It's always just one more thing added to our plate. And w

      All of those things in the graph are so valid. There is so much that goes into teaching that people don't understand, so it makes sense that it causes so much stress.

    1. Always imagine who your hypothetical audience is (what type of publication would the content of your essay fit into?) and that will help you determine the specifics of your writing style.

      Tone is so important to the purpose of your writing, and knowing when casual language is or isn't appropriate is just as important.

    1. Reviewer #2 (Public review):

      Summary:

      This manuscript proposes that the use of a latent cause model for assessment of memory-based tasks may provide improved early detection in Alzheimer's Disease as well as more differentiated mapping of behavior to underlying causes. To test the validity of this model, the authors use a previously described knock-in mouse model of AD and subject the mice to several behaviors to determine whether the latent cause model may provide informative predictions regarding changes in the observed behaviors. They include a well-established fear learning paradigm in which distinct memories are believed to compete for control of behavior. More specifically, it's been observed that animals undergoing fear learning and subsequent fear extinction develop two separate memories for the acquisition phase and the extinction phase, such that the extinction does not simply 'erase' the previously acquired memory. Many models of learning require the addition of a separate context or state to be added during the extinction phase and are typically modeled by assuming the existence of a new state at the time of extinction. The Niv research group, Gershman et al. 2017, have shown that the use of a latent cause model applied to this behavior can elegantly predict the formation of latent states based on a Bayesian approach, and that these latent states can facilitate the persistence of the acquisition and extinction memory independently. The authors of this manuscript leverage this approach to test whether deficits in production of the internal states, or the inference and learning of those states, may be disrupted in knock-in mice that show both a build-up of amyloid-beta plaques and a deterioration in memory as the mice age.

      Strengths:

      I think the authors' proposal to leverage the latent cause model and test whether it can lead to improved assessments in an animal model of AD is a promising approach for bridging the gap between clinical and basic research. The authors use a promising mouse model and apply this to a paradigm in which the behavior and neurobiology are relatively well understood - an ideal situation for assessing how a disease state may impact both the neurobiology and behavior. The latent cause model has the potential to better connect observed behavior to underlying causes and may pave a road for improved mapping of changes in behavior to neurobiological mechanisms in diseases such as AD.<br /> The authors also compare the latent cause model to the Rescorla-Wagner model and a latent state model allowing for better assessment of the latent cause model as a strong model for assessing reinstatement.

      Weaknesses:

      I have several substantial concerns which I've detailed below. These include important details on how the behavior was analyzed, how the model was used to assess the behavior, and the interpretations that have been made based on the model.<br /> (1) There is substantial data to suggest that during fear learning in mice separate memories develop for the acquisition and extinction phases, with the acquisition memory becoming more strongly retrieved during spontaneous recovery and reinstatement. The Gershman paper, cited by the authors, shows how the latent causal model can predict this shift in latent causes by allowing the priors to decay over time, thereby increasing the posterior of the acquisition memory at the time of spontaneous recovery. In this manuscript, the authors suggest a similar mechanism of action for reinstatement, yet the model does not appear to return to the acquisition memory after reinstatement, at least based on the simulation and examples shown in figures 1 and 3. More specifically, in figure 1, the authors indicate that the posterior probability of the latent cause, z<sub>A</sub> (the putative acquisition memory), increases, partially leading to reinstatement. This does not appear to be the case, as test 3 (day 36) appears to have similar posterior probabilities for z<sub>A</sub> as well as similar weights for the CS as compared to the last days of extinction. Rather, the model appears to mainly modify the weights in the most recent latent cause, z<sub>B</sub>, the putative 'extinction state', during reinstatement. The authors suggest that previous experimental data have indicated that spontaneous recovery or reinstatement effects are due to an interaction of the acquisition and extinction memory.
These studies have shown that conditioned responding at a later time point after extinction is likely due to a balance between the acquisition memory and the extinction memory, and that this balance can shift towards the acquisition memory naturally during spontaneous recovery, or through artificial activation of the acquisition memory or inhibition of the extinction memory (see Lacagnina et al. for example). Here the authors show that the same latent cause learned during extinction, z<sub>B</sub>, appears to dominate during the learning phase of reinstatement, with rapid learning to the context - the weight for the context goes up substantially on day 35 - in z<sub>B</sub>. This latent cause, z<sub>B</sub>, dominates at the reinstatement test, and due to the increased associative strength between the context and shock, there is a strong CR. For the simulation shown in figure 1, it's not clear why a latent cause model is necessary for this behavior. This leads to the next point.

      (2) The authors compared the latent cause model to the Rescorla-Wagner model. This is very commendable, particularly since the latent cause model builds upon the RW model, so it can serve as an ideal test for whether a more simplified model can adequately predict the behavior. The authors show that the RW model cannot successfully predict the increased CR during reinstatement (Appendix figure 1). Yet there are some issues with the way the authors have implemented this comparison:<br /> (2A) The RW model is a simplified version of the latent cause model and so should be treated as a nested model when testing, or at a minimum, the number of parameters should be taken into account when comparing the models using a method such as the Bayesian Information Criterion, BIC.<br /> (2B) The RW model provides the associative strength between stimuli and does not necessarily require a linear relationship between V and the CR. This is the case in the original RW model as well as in the LCM. To allow for better comparison between the models, the authors should be modeling the CR in the same manner (using the same probit function) in both models. In fact, there are many instances in which a sigmoid has been applied to RW associative strengths to predict CRs. I would recommend modeling CRs in the RW as if there is just one latent cause. Or perhaps run the analysis for the LCM with just one latent cause - this would effectively reduce the LCM to RW and keep any other assumptions identical across the models.<br /> (2C) In the paper, the model fits for the alphas in the RW model are the same across the groups. Were the alphas for the two models kept as free variables? This is an important question as it gets back to the first point raised. Because the modeling of the reinstatement behavior with the LCM appears to be mainly driven by latent cause z<sub>B</sub>, the extinction memory, it may be possible to replicate the pattern of results without requiring a latent cause model. 
For example, the 12-month-old App NL-G-F mice may have a deficit in learning about the context. Within the RW model, if the alpha for context is set to zero for those mice, but kept higher for the other groups, say alpha_context = 0.8, the authors could potentially observe the same pattern of discrimination indices in figure 2G and 2H at test. Because the authors don't explicitly state which parameters might be driving the change in the DI, the authors should show in some way that their results cannot simply be due to poor contextual learning in the 12-month-old App NL-G-F mice, as this can presumably be predicted by the RW model. The authors' model fits using RW don't show this, but this is because they don't consider the possibility that the alpha for context might be disrupted in the 12-month-old App NL-G-F mice. Of course, using the RW model with these alphas won't lead to as nice fits of the behavior across acquisition, extinction, and reinstatement as the authors' LCM, but the number of parameters is substantially reduced in the RW model. Yet the important pattern of the DI would be replicated with the RW model (if I'm not mistaken), which is the important test for assessment of reinstatement.
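The alternative raised in point (2C) can be illustrated with a minimal Rescorla-Wagner sketch in which the salience (alpha) of the context is zeroed; all parameter values below are hypothetical and chosen only to show the qualitative pattern, not fitted to any data:

```python
def rw_trial(V, present, alphas, beta=0.3, lam=1.0):
    """One Rescorla-Wagner update: each present stimulus s gains
    alpha_s * beta * (lambda - total prediction)."""
    total = sum(V[s] for s in present)
    for s in present:
        V[s] += alphas[s] * beta * (lam - total)
    return V

def simulate(a_ctx, trials=5):
    """US re-exposure in the context alone (the reinstatement phase):
    returns the associative strength of the context after `trials`."""
    V = {"context": 0.0, "CS": 0.0}
    alphas = {"context": a_ctx, "CS": 0.8}
    for _ in range(trials):
        rw_trial(V, ["context"], alphas)
    return V["context"]

# With alpha_context = 0 the context acquires no associative strength,
# so no reinstatement is predicted for that group.
print("intact:", round(simulate(0.8), 3))
print("impaired:", round(simulate(0.0), 3))
```

This is only a toy demonstration of how a per-stimulus alpha could reproduce the group difference, not a claim about which parameters the manuscript actually fit.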

      (3) As stated by the authors in the introduction, the advantage of the fear learning approach is that the memory is modified across the acquisition-extinction-reinstatement phases. Although perhaps not explicitly stated by the authors, the post-reinstatement test (test 3) is the crucial test for whether there is reactivation of a previously stored memory, with the general argument being that the reinvigorated response to the CS can't simply be explained by relearning the CS-US pairing, because re-exposure to the US alone leads to an increased response to the CS at test. Of course, there are several explanations for why this may occur, particularly when also considering the context as a stimulus. This is what I understood to be the justification for the use of a model, such as the latent cause model, that may better capture and compare these possibilities within a single framework. As such, it is critical to look at the level of responding to both the context alone and to the CS. It appears that the authors only look at the percent freezing during the CS, and it is not clear whether this is due to the contextual-US learning during the US re-exposure or to increased responding to the CS - presumably caused by reactivation of the acquisition memory. The authors do perform a comparison between the preCS and CS period, but it is not clear whether this is taken into account in the LCM. For example, the instance of the model shown in figure 1 indicates that the 'extinction cause', or cause z6, develops a strong weight for the context during the reinstatement phase of presenting the shock alone. This state then leads to increased freezing during the final CS probe test as shown in the figure. If they haven't already, I think the authors must somehow incorporate these different phases (CS vs ITI) into their model, particularly since this type of memory retrieval that depends on assessing latent states is specifically why the authors justified using the latent causal model.
In more precise terms, it's not clear whether the authors incorporate a preCS/ITI period on each day the cue is presented, as a vector of just the context, in addition to the CS period in which the vector contains both the context and the CS. Based on the description, it seemed to me that they only model the CRs during the CS period on days when the CS is presented, and thereby the context is only ever modeled on its own (as just the context by itself in the vector) on extinction days when the CS is not presented. If they are modeling both timepoints each day that the CS is presented, then I would recommend explicitly stating this in the methods section.
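Concretely, the encoding being asked about might look like this (a hypothetical feature scheme, not necessarily the authors' implementation):

```python
import numpy as np

# feature order: [context, CS]
pre_cs = np.array([1.0, 0.0])  # preCS/ITI period: context alone
cs     = np.array([1.0, 1.0])  # CS period: context plus CS

# modelling both periods on every cue day gives the model a context-only
# observation each day, not just on extinction days without the CS
cue_day = np.stack([pre_cs, cs])
print(cue_day)
```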

      (4) The authors fit the model using all data points across acquisition and learning. As one of the other reviewers has highlighted, it appears that there is a high chance for overfitting the data with the LCM. Of course, this would result in much better fits than models with substantially fewer free parameters, such as the RW model. As mentioned above, the authors should use a method that takes into account the number of parameters, such as the BIC.

      (5) The authors have stated that they do not think the Barnes maze task can be modeled with the LCM. Whether or not this is the case, if the authors do not model these data with the LCM, the Barnes maze data do not appear valuable to the main hypothesis. The authors suggest that more sophisticated models such as the LCM may be beneficial for early detection of diseases such as Alzheimer's, so the Barnes maze data do not provide evidence for this hypothesis. Rather, the authors argue that the memory deficits in the Barnes maze mimic the reinstatement effects, providing support that memory is disrupted similarly in these mice. However, although the authors state that the deficits in memory retrieval are similar across the two tasks, they are not explicit as to the precise deficits in memory retrieval in the reinstatement task - is it a combination of overgeneralizing latent causes during acquisition, a poor learning rate, or over-differentiation of the stimuli?

    1. We have accepted the idea that a television program is now a tangible object that can be purchased, collected, and cataloged, just like books, musical albums, and films. This is a comparatively new concept, and I'd argue a transformative one for television scholars, producers, and viewers.

      The idea that a TV program can be treated like a book or album, and hold significant value for viewers, was a huge development. It's not just watching anymore but also collecting, comparing, and studying older shows, which changes not only the way people experience TV but also influences how it's made.

    1. ccording to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible. Yellow, black. Yellow, black. Yellow, black. Yellow, black. Ooh, black and yellow! Let's shake it up a little. Barry! Breakfast is ready! Coming! Hang on a second. Hello? - Barry? - Adam? - Can you believe this is happening? - I can't. I'll pick you up. Looking sharp. Use the stairs. Your father paid good money for those. Sorry. I'm excited. Here's the graduate. We're very proud of you, son. A perfect report card, all B's. Very proud. Ma! I got a thing going here. - You got lint on your fuzz. - Ow! That's me! - Wave to us! We'll be in row 118,000. - Bye! Barry, I told you, stop flying in the house! - Hey, Adam. - Hey, Barry. - Is that fuzz gel? - A little. Special day, graduation. Never thought I'd make it. Three days grade school, three days high school. Those were awkward. Three days college. I'm glad I took a day and hitchhiked around the hive. You did come back different. - Hi, Barry. - Artie, growing a mustache? Looks good. - Hear about Frankie? - Yeah. - You going to the funeral? - No, I'm not going. Everybody knows, sting someone, you die. Don't waste it on a squirrel. Such a hothead. I guess he could have just gotten out of the way. I love this incorporating an amusement park into our day. That's why we don't need vacations. Boy, quite a bit of pomp... under the circumstances. - Well, Adam, today we are men. - We are! - Bee-men. - Amen! Hallelujah! Students, faculty, distinguished bees, please welcome Dean Buzzwell. Welcome, New Hive City graduating class of... ...9:15. That concludes our ceremonies. And begins your career at Honex Industries! Will we pick our job today? I heard it's just orientation. Heads up! Here we go. Keep your hands and antennas inside the tram at all times. - Wonder what it'll be like? 
- A little scary. Welcome to Honex, a division of Honesco and a part of the Hexagon Group. This is it! Wow. Wow. We know that you, as a bee, have worked your whole life to get to the point where you can work for your whole life. Honey begins when our valiant Pollen Jocks bring the nectar to the hive. Our top-secret formula is automatically color-corrected, scent-adjusted and bubble-contoured into this soothing sweet syrup with its distinctive golden glow you know as... Honey! - That girl was hot. - She's my cousin! - She is? - Yes, we're all cousins. - Right. You're right. - At Honex, we constantly strive to improve every aspect of bee existence. These bees are stress-testing a new helmet technology. - What do you think he makes? - Not enough. Here we have our latest advancement, the Krelman. - What does that do? - Catches that little strand of honey that hangs after you pour it. Saves us millions. Can anyone work on the Krelman? Of course. Most bee jobs are small ones. But bees know that every small job, if it's done well, means a lot. But choose carefully because you'll stay in the job you pick for the rest of your life. The same job the rest of your life? I didn't know that. What's the difference? You'll be happy to know that bees, as a species, haven't had one day off in 27 million years. So you'll just work us to death? We'll sure try. Wow! That blew my mind! "What's the difference?" How can you say that? One job forever? That's an insane choice to have to make. I'm relieved. Now we only have to make one decision in life. But, Adam, how could they never have told us that? Why would you question anything? We're bees. We're the most perfectly functioning society on Earth. You ever think maybe things work a little too well here? Like what? Give me one example. I don't know. But you know what I'm talking about. Please clear the gate. Royal Nectar Force on approach. Wait a second. Check it out. - Hey, those are Pollen Jocks! - Wow. I've never seen them this close. 
They know what it's like outside the hive. Yeah, but some don't come back. - Hey, Jocks! - Hi, Jocks! You guys did great! You're monsters! You're sky freaks! I love it! I love it! - I wonder where they were. - I don't know. Their day's not planned. Outside the hive, flying who knows where, doing who knows what. You can't just decide to be a Pollen Jock. You have to be bred for that. Right. Look. That's more pollen than you and I will see in a lifetime. It's just a status symbol. Bees make too much of it. Perhaps. Unless you're wearing it and the ladies see you wearing it.

    1. The usual metaphor is that "mental energy" is like a battery that is drained through the day, in greater and lesser quantities, and is replenished by sleep. To me, energy is less like a battery and more like voltage. Some machines require a threshold voltage to operate. Below that voltage they don't just operate slower, they don't operate at all. Analogously, different categories of activity have different threshold voltages. For me, it's like this: Things I am averse to, the things I intuitively want to put off because they bring up painful emotions, are high-voltage. Creative, open-ended work is high-voltage to start, but once you get started, keeping it going is medium-voltage. Simple chores like cleaning, throwing clothes in the washing machine, etc. are low-voltage. And when I wake up I have the highest possible voltage, and throughout the course of the day the voltage declines. And that's the key difference from spoon theory: spoons are fungible across time, voltage is not. For each category of activity, there is a span of the day when I can action it.

      How do I model my second wind after the office empties out...

    1. Author response:

      The following is the authors' response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, Lรณpez-Jimรฉnez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration - using cell biology-based approaches- that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      We thank the Reviewer for their enthusiasm on technical aspects of this paper, regarding both the automated microscopy pipeline coupled with artificial intelligence and the click-chemistry based approaches to dissect DNA replication and protein synthesis by microscopy.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are observed with an epithelial cell line of intestinal, preferably colonic origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The immortalized cell line HeLa is widely regarded as a paradigm to study infection by Shigella and other intracellular pathogens. However, we agree that future studies beyond the scope of this work should include other cell lines (eg. epithelial cells of colonic origin, macrophages, primary cells).

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

      We appreciate the Reviewer's concern about the lack of follow-up work on observations of host DNA and protein synthesis arrest upon Shigella infection, which will be the focus of future studies. We acknowledge the recent work of Zhang et al. (Cell Reports, 2024) considering their similar results on protein translation arrest, and this reference has been more fully discussed in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacteria motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      We thank the Reviewer for their positive comments, and for highlighting the strength of our imaging and analysis pipeline to analyse Shigella-septin interactions.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

      The main objective of this manuscript is the development of imaging and analysis tools to study Shigella infection, and in particular, Shigella interactions with the septin cytoskeleton. In future work we will provide more mechanistic insight with novel experiments and broader applicability, using different cell lines (in agreement with Reviewer 1), mutants or clinical isolates of Shigella and different bacteria species (eg. Listeria, Salmonella, mycobacteria).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. They perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, but I have a couple of reservations; please see below for more details. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for the issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized into the later parts) - that could help improve the flow of the manuscript.

      Strengths:

      The high-content analysis including the innovative analytical workflows are very promising and could be used by a large number of scientists working on intracellular bacteria. The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      We thank the Reviewer for their constructive feedback and excitement for our results, including our findings on T3SS activity and Shigella-septin interactions. In accordance with the Reviewer's comments, we avoid overselling our data in the revised version of the manuscript.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      We agree that the manuscript is mostly technical and therefore some of our experimental observations would benefit from follow up mechanistic studies in the future. We highlight our vision for broader applicability in response to weaknesses raised by Reviewer 2.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

      We agree with the Reviewer that characterizing 3D data using 2D projected images has limitations.

      We observe an increase in cell and nuclear surface that does not strictly imply a change in volume. This is why we measure Hoechst intensity in the nucleus using SUM-projection (as it can be used as a proxy of DNA content of the cell). However, we agree that future use of other markers (such as fluorescently labelled histones) would make our conclusions more robust.

      Regarding the different orientation of intracellular bacteria, we agree that investigation of septin recruitment is more challenging when bacteria are placed perpendicular to the acquisition plane. In a first step, we trained a Convolutional Neural Network (CNN) using 2D data, as it is easier/faster to train and requires fewer annotated images. In doing so, we already managed to correctly identify 80% of Shigella interacting with septins, which enabled us to observe higher T3SS activity in this population. In future studies, we will maximize the 3D potential of our data and retrain a CNN that will allow more precise identification of Shigella-septin interactions and in depth characterization of volumetric parameters.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To conclude that cell volume is indeed increased, the investigators should consider staining the cells with markers that demarcate cell boundaries and/or are confined to the cytosol, i.e., a cell tracker dye.

      Staining using our SEPT7 antibody enables us to define cell boundaries for cellular area measurements (new Figure 1 - figure supplement 1A). However, we agree with the Reviewer that staining cells with additional markers (such as a cell tracker dye) would be required to conclude that cell volume is increased. We therefore adjust our claims in the main text (lines 107-115 and 235-246).

      (2) Line 27: I understand what is meant by "recruited to actively pathogenic bacteria with increased T3SS activation." However, one could argue that the intracytosolic bacteria play many different roles in terms of pathogenesis, not just actively secreting effectors.

      T3SS secretion by cytosolic bacteria is tightly regulated and both T3SS states (active, inactive) likely contribute to the pathogenic lifestyle of S. flexneri. In agreement with this, we removed this statement from the manuscript (lines 27, 225 and 274).

      (3) Line 88: Please clarify in the text that HeLa cells are being studied.

      We explicitly mention that the epithelial cell line we study is HeLa in the main text (line 93), in addition to the Materials and methods (line 328).

      (4) Line 97: is it possible to quantify the average distance of the nuclei from the cell perimeter? This would help provide some context as to what it means to be a certain distance from the nucleus, i.e., is there another way to point out that distance from nuclei correlates with movement inward post-invasion at the periphery?

      To provide more context to the inward movement of bacteria to the cell centre, we offer calculations based on measurements in Figure 1G, I. If we approximate the geometric shape of both the cell and the nucleus as a circle, the median radius of a HeLa cell is 31.1 µm (uninfected cell) and 36.3 µm (infected cell). Similarly, the median radius of the nucleus is 22.2 µm (uninfected cell) and 24.57 µm (infected cell).

      However, we note that Figure 1F shows distance of bacteria to the centroid of the cell, which is the geometric centre of the cell, and which does not necessarily coincide with the geometric centre of the nucleus. We also note that nuclear area increases with infection (in a bacterial dose dependent manner). Finally, we note that these measurements are performed on max projections of 3D Z-stacks. In this case we cannot fully appreciate distance to the nucleus for bacteria located above it.
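For reference, the circle approximation above converts a measured area to a radius as r = √(A/π); a minimal sketch (the example area is back-computed from the 31.1 µm radius for illustration, not taken from Figure 1G):

```python
import math

def radius_from_area(area_um2):
    """Radius (in µm) of a circle with the given area (in µm²)."""
    return math.sqrt(area_um2 / math.pi)

# a median radius of 31.1 µm corresponds to an area of about 3,039 µm²
area = math.pi * 31.1 ** 2
print(round(radius_from_area(area), 1))  # → 31.1
```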

      (5) Lines 212-213 - there is no Figure 9A, B - I think this should be Figure 7A, B.

      Text has been updated (lines 216-217).

      Reviewer #2 (Recommendations For The Authors):

      Testing the analysis pipeline on a proof-of-concept question, such as comparing caging around the laboratory strain with one or a few clinical isolates or mutants of interest, would help stress the relevance of this new, remarkable tool.

      We thank the Reviewer for their enthusiasm.

      Future research in the Mostowy lab will capitalise on the high-content tools generated here to explore the frequency and heterogeneity of septin cage entrapment for a wide variety of S. flexneri mutants and Shigella clinical isolates.

      The sentence in line 215 ends with "in agreement with" followed by a reference.

      Text has been updated (line 219).

      The sentence in line 217 on the correlation between caging and T3SS is not very clear.

      Text has been clarified (lines 221-223).

      There is a typo in line 219 : "protrusSions"

      Text has been updated (line 223).

      Reviewer #3 (Recommendations For The Authors):

      Major points

      The quantitative analysis approach in Figure 1 has multiple issues. Some examples:<br /> (1) How was the cell area estimated? Normally, a marker for the whole cell (CellMask or similar) or cells expressing GFP would be good indicators. Here it is not clear to me what was done.

      The cell area was estimated using SEPT7 antibody staining, which is enriched under the cell cortex. CellProfiler was used to segment cells based on SEPT7 staining, using a propagation method from the identified nuclei, which were detected by Otsu thresholding. To provide more clarity on how this was performed, we now include a new figure (Figure 1 - figure supplement 1A) showing a representative image of HeLa cells stained with SEPT7 and the corresponding cell segmentation performed with CellProfiler software, together with an updated figure legend explaining the procedure (lines 784–787).
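As a toy sketch of the propagation idea (this is not the CellProfiler implementation; the Otsu-thresholded nucleus labels `seeds` and the SEPT7-derived cell mask `mask` are assumed to be given):

```python
import numpy as np
from collections import deque

def propagate_labels(seeds, mask):
    """Grow integer nucleus labels outward (4-connected) within a cell mask,
    mimicking propagation from nuclear seeds to cell boundaries."""
    labels = seeds.copy()
    queue = deque(zip(*np.nonzero(seeds)))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                    and mask[rr, cc] and labels[rr, cc] == 0):
                labels[rr, cc] = labels[r, c]
                queue.append((rr, cc))
    return labels

# toy image: two nuclear seeds separated by a SEPT7-negative boundary column
mask = np.ones((5, 8), dtype=bool)
mask[:, 4] = False
seeds = np.zeros((5, 8), dtype=int)
seeds[2, 1], seeds[2, 6] = 1, 2
cells = propagate_labels(seeds, mask)
```

Each nucleus label floods its own cell region and stops at the mask boundary, so every pixel of a cell inherits the identity of its nucleus.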

      (2) The authors use Hoechst and integrated z-projections (Figure 1 S1) as a proxy to estimate nuclear volume. Hoechst staining depends on the organization of the DNA within the nucleus and I find that the authors need to do better controls to estimate nuclear size - this would be possible with cells expressing fluorescently labeled histones, or even better with a fluorescently tagged nuclear pore/envelope marker. The current quantification approach is misleading.

      We understand Reviewer #3's concerns about using Hoechst staining as a proxy of nuclear volume, due to potential differences in DNA organisation within the nucleus.

      Following the recommendation of Reviewer #3 in point 3 below, text has been updated (lines 107–115 and 235–246).

      (3) Was cell density assessed for the measurements? If cells are confluent, bacteria could spread between cells within 3 hrs, if cells are less dense, this does not occur. When epithelial cells are infected for some hours, they have the tendency to round up a bit (and to appear thicker in z), but a bit smaller in xy. My suggestion to the authors (as they use these findings to follow up with experiments on the underlying processes) would be to tone down their statements - eg, Hoechst staining could be simply indicated as altered, but not put in a context of size (this would require substantial control experiments).

      Local cell density was not directly measured, but the experiment was set up to infect at roughly 80% confluency (cells were seeded at 10<sup>4</sup> cells/well 2 days prior to infection in a 96-well microplate, as described in the Materials and methods section) and to ensure bacterial spread between cells.

      In agreement with Reviewer #3 we tone down statements in the main text (see response to point 2 above).

      In addition, I found Figure 1 (and parts of Figure 2) disconnected from the rest of the manuscript, and it may even be an idea to take it out of the manuscript (that could also help to deal with my feedback relating to Figure 1). I would suggest starting the manuscript with the current Figure 3 and building the biological story with a stronger focus on SEPT7 (and its links with T3 secretion and actively pathogenic bacteria) from there on. As it stands, the two parts of the manuscript are not well connected.

      We carefully considered this comment, but following revisions we have not reorganised the manuscript. We believe that high-content characterisation of S. flexneri infection in Figures 1 and 2 provides insightful information about changes in host cells in response to infection. Following this, we move on to characterising intracellular bacteria (and in particular those entrapped in septin cages) in the second part of the manuscript (Figures 3-7). Similar methods were used to analyse both host and bacterial cells, and the results obtained offer complementary views on host-pathogen interactions.

      My major reservation with the experimental work of the current version of the manuscript relates to Figure 5: The analysis of the septin phenotypes in Figure 5 seems to be problematic - to me, it appears that analysis and training were done on projected image stacks. As bacteria are rod-shaped their orientation in space has an enormous impact on how the septin signal appears in a projection - this can lead to wrong interpretation of the phenotypes. The authors need to do some quantitative controls analyzing their data in 3D. To be more clear: the example "tight" (second row) shows a bacterium that appears short. It may be that it's actually longer if one looks in 3D, and the septin signal could possibly fall in the category "rings" or even "two poles".

      The deep learning training and subsequent analysis of septin-cage entrapment is done on projected Z-stacks, which presents limitations. Future work in the Mostowy lab will exploit this first study and dive deeper into 3D aspects of the data.

      To address Reviewer #3โ€™s concern, we include a sentence explaining that this analysis was performed using 2D max projections (lines 708 and 724), as well as acknowledging its limitations in the main text (lines 259-262).

      Minor points

      The scale bar in Fig 1 is very thin.

      We corrected the scale bar in Fig. 1 to make it more visible.

      Could it be that Figure 1F is swapped with Figure 1E in the description?

      Descriptions for Figure 1E and F are correct.

      Line 27: what does "actively pathogenic bacteria" mean? I propose to change the term.

      We agree with Reviewer #3 that "actively pathogenic bacteria" should be removed from the text. This update is also in agreement with Reviewer #1 (see Reviewer #1 point 2).

      Line 28: "dynamics" can be confusing as it relates to dynamic events imaged by time-lapse.

      Although we are taking a snapshot of the infection process at 3 hpi, we capture asynchronous processes in both host and bacterial cells (eg. host cells infected with different bacterial loads, bacterial cells undergoing actin polymerisation or septin cage entrapment). We agree that we are not following dynamics of full events over time. However, our high content approach enables us to capture different stages of dynamic processes. To avoid confusion, we replace "dynamics" by "diverse interactions" (line 28), and we discuss the importance of follow-up studies studying microscopy timelapses (line 274).

      Paragraph 59 following: the concept of heterogeneity was investigated in some detail for viral infection by the Pelkmans group (PMID: 19710653) using advanced image analysis tools. Advanced machine-learning-based analysis was then performed on Salmonella invasion by Voznica and colleagues (PMID: 29084895). It would be great to include these somewhat "old" works here as they really paved the way for high-content imaging, and the way analyses were performed then should be also discussed in light of how analyses can be performed now with the approaches developed by the authors.

      We agree. These landmark studies have now been included in the main text (lines 71-74).

      Line 181: I do not know what "morphological conformations" means; perhaps the authors can change the wording or clarify.

      We replaced the phrase "morphological conformations" with "morphological patterns" to improve clarity in the main text (line 185).

      The authors claim (eg in the abstract) that they are measuring the dynamic infection process. To me, it appears that they look at one time-point, so no dynamic information can be extracted. I suggest that the authors tone down their claims.

      Please note our response above (Minor points, Line 28) which also refers to this question.

    1. It's just a colonial-style assumption that if something is available, it must be theirs to take

      I feel like this is why it's important to know what is copyrighted online. People think they can take any image or piece of writing, but most of it is protected by the owner. It should be taken more seriously.

    1. Lack of professional training in dealing with students with disability is one of the obstacles to success in education (Imaniah & Fitria, 2018; Mag et al., 2017; Materechera, 2020).

      YES! I have seen this recently in the school I work in. We have general education teachers who have special needs students in their classroom, and they are not trained at all in how to include them. These teachers either don't think these children can do the things other non-disabled students can, or the opposite: they expect the students to perform in other areas just like non-disabled students. The case managers in our district are required to provide an "IEP At a Glance" to all teachers their SPED students work with. Teachers are to read through, ask questions, and sign off stating they understand the document and will follow it in their classroom. I have seen so many general education teachers quickly sign off on the At a Glance without really having looked through it. Part of that is lack of initiative; the other part is that they haven't been trained enough to fully understand the seriousness or how to implement it in the classroom.

    1. Of course, a critical approach does not make academic work inherently superior to other less critical work. Moreover, critical studies will never be the most self-affirming approach to take when it comes to researching and writing about technology and education, especially when compared to the breathless adoration of all things digital that media academics all too easily slip into.

      A critical approach is just that: an approach. Using it doesn't make the work superior, and we shouldn't slip into a view of self-affirmation, which we will not get anyway. It's not glamorous, but it is necessary work, so we need to ask questions and use these approaches.

    1. For us the general term "rhetorics" refers both to the study of meaning-making systems and to the practices that constitute those systems.

      Rhetoric isn't just speeches and writing; it's cultural practice itself. So rhetoric is both study and practice. But what are the boundaries of calling a practice "rhetorical"?

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      This manuscript described the translational responses to single and combined BCAA shortages in mouse cell lines. Using Ribo-seq and RNA-seq analysis, the authors found selective ribosome pausing at codons that encode the depleted amino acids, where the pausing at valine codons was prominent at both a single and triple starvations whereas isoleucine codons showed pausing only under a single depletion. They analyzed the mechanisms of the unexpected selective pausing and proposed that the positional codon usage bias could shape the ribosome stalling and tRNA charging patterns across different amino acids. They also examined the stress responses and the changes in the protein expression levels under BCAA starvation.

      The manuscript was well-written, and the findings are interesting, especially their model that positional codon usage bias could be a regulator of ribosome pausing and tRNA charging levels. Although different translational responses to distinct amino acid starvation have been widely documented, the positional codon usage bias is an interesting aspect. The manuscript's central message could have been made clearer. The authors may consider emphasizing this point more explicitly in the abstract. The rich multi-omics dataset in this work provides valuable resources for the translation field.

      We thank the reviewer for the thoughtful and positive evaluation of our work.

      Major comments

      1. The abstract may need to be revised since it is hard to immediately catch the authors' main point. If the authors regard this work as a resource paper, the current version is fine. But it could be better to point out the positional codon usages the authors found, which is a strong point of the current manuscript.

      Response: We thank the reviewer for highlighting the importance of positional codon usage, which indeed represents a key finding of our study. We have revised the abstract and now emphasize this aspect more clearly. However, in response to reviewer #2, we frame the observed positional effects and the idea of an elongation bottleneck as one possible contributing mechanism among others, and relate it specifically to the attenuation of isoleucine-specific stalling under triple starvation.

      1. Page 18 "Beyond these tRNA dynamics, our data also highlight the importance of the codon positional context within mRNAs, indicating that where a codon is located within the CDS can influence both the extent of ribosomal stalling and overall translation efficiency during nutrient stress." This idea is interesting. To what extent do the authors think this could be generalized? The authors may discuss whether they think their proposed model is specific to the different ribosome stalling patterns between valine and isoleucine codons or generalizable to other codon combinations. For example, the positional codon usage bias will be different among different organisms, and are there any previous reports on ribosome behaviors that align with their model?

      Response: We thank the reviewer for raising these important points. While our study primarily focuses on the differential stalling patterns of valine and isoleucine codons, we believe the underlying principle, that the position of codons within the CDS can modulate the extent of ribosome stalling, may under very specific circumstances extend beyond this amino acid pair. We expect this positional effect to be potentially relevant for combinations in which one amino acid shows considerable enrichment near the 5' end of coding sequences, coupled with starvation-sensitive tRNA isoacceptors, while the other does not. In our case, valine meets these criteria (see Fig. S11A and Fig. 6). In contrast, isoleucine and leucine codons, although also relatively frequent, show more variable positional distributions and are both decoded by isoacceptors that appear more resistant to starvation, as illustrated in Fig. 6 and reported for mammals and bacteria in Saikia et al. 2016; Darnell, Subramaniam, and O'Shea 2018; Elf et al. 2003; Dittmar et al. 2005. To explore the generalizability of this model, we have now included a transcriptome-wide analysis of codon position biases in mouse for all codons in the revised manuscript (Supplementary Figures 10 and 11). This analysis may serve as a basis to identify additional candidate codons for future studies. Furthermore, we now mention in the Discussion that amino acids with properties similar to valine regarding their positional distribution and tRNA isoacceptors, such as phenylalanine and glutamine, whose tRNA isoacceptors are predicted to be fully deacylated under their respective starvations in bacteria (Elf et al. 2003), could be promising candidates for testing this model, in combination with amino acids whose tRNAs are expected to remain partially charged under starvation or to be depleted at the start of the CDS, such as His (Supplementary Fig. 11C).
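      To make the positional analysis concrete, the per-bin distribution of selected codons along coding sequences can be sketched as follows (a minimal Python illustration on made-up toy CDSs; the actual analysis in Supplementary Figures 10 and 11 was performed transcriptome-wide on mouse annotations):

```python
def positional_codon_bias(cds_sequences, codons, n_bins=5):
    """Count occurrences of each codon of interest per relative-position
    bin along the CDS (bin 0 = 5' end, last bin = 3' end)."""
    counts = {c: [0] * n_bins for c in codons}
    for seq in cds_sequences:
        n_codons = len(seq) // 3
        for i in range(n_codons):
            codon = seq[3 * i:3 * i + 3]
            if codon in counts:
                # map the codon's index to a relative-position bin
                b = min(int(i / n_codons * n_bins), n_bins - 1)
                counts[codon][b] += 1
    return counts

# toy CDSs (hypothetical): GTG (Val) placed near the 5' end,
# ATA (Ile) appearing only further downstream
cds = ["GTGGTGAAAAAAATA", "GTGAAAAAAAAAATA"]
print(positional_codon_bias(cds, ["GTG", "ATA"], n_bins=3))
# → {'GTG': [3, 0, 0], 'ATA': [0, 0, 2]}
```

      A 5'-enriched codon shows counts concentrated in the first bin, the pattern described above for valine.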

      Even if the authors think this model can be applied to BCAA starvation, would it be possible to explain the different isoleucine codon responses between single and double starvation? The authors may discuss why the ribosome stalling at isoleucine AUU and AUC codons was slightly attenuated under double starvation. And how about the different leucine codon responses among single, double, and triple starvations, although the pausing is not as strong as isoleucine and valine codons?

      Response: Regarding the attenuated isoleucine stalling under double starvation, we believe this is primarily due to stronger inhibition of the mTORC1 pathway when leucine is co-depleted (i.e., in the double starvation condition; Fig. 2D-F). This results in a more substantial suppression of global translation, reducing overall tRNA demand and thereby mitigating stalling (Darnell, 2018). A similar effect may explain the only mild leucine codon stalling observed under single leucine starvation, which also triggers strong mTORC1 inhibition and reduced initiation. In contrast, triple starvation does not suppress mTORC1 to the same extent, so reduced initiation alone cannot explain the absence of leucine codon stalling. Instead, we propose that additional features, such as the relative sensitivity of tRNA isoacceptors to starvation and their aminoacylation dynamics, must be considered. Valine tRNAs, for example, are known to be highly sensitive and become strongly deacylated under starvation in bacteria (Elf et al. 2003), a pattern that we also find in our own data (Fig. 6). Leucine tRNAs, by contrast, appear more resistant, possibly due to better amino acid recycling or isoacceptor-specific differences in charging kinetics, though further validation would be needed. Combined with the strong stalling at 5'-enriched valine codons, this could reduce downstream ribosome traffic and limit exposure of leucine codons, thus preventing stalling. However, our new analysis of the positional relationship between valine and leucine codons within individual transcripts (now shown in Supplementary Figure 11B) did not reveal as strong a pattern as we observed for valine and isoleucine codons. We now discuss these points and their implications in the revised Discussion.

      Experimental validation using artificial reporters carrying biased sequences may also be considered.

      Response: We appreciate the reviewer's suggestion. In fact, we explored this experimentally using a dual-fluorescent reporter system (GFP-RFP) (Juszkiewicz and Hegde 2017) containing consecutive Val or Ile codons. However, the constructs yielded variable and non-reproducible results under starvation conditions. In addition, testing the role of codon position would require placing the same codons at multiple defined positions within a single transcript and performing ribosome profiling directly on the reporter. This type of targeted experimental validation is technically challenging and falls beyond the scope of the current study. We now mention this explicitly in the revised Discussion as an interesting direction for future work.

      1. Page 13 "Moreover, we noticed that DT changes extend beyond the ribosomal A-site, including the P-site, E-site, and even further positions (Supplementary Fig. 2A), consistent with other studies on single amino acid starvation 39 (Supplementary Fig. 2B-C)." Could the widespread DT changes be due to the Ribo-DT pipeline they used or difficulties in offset determination? Indeed, the authors showed that this feature was found in other datasets, but it seems that the datasets were processed and analyzed in the same way as their data. The original Ribo-DT paper (Gobet and Naef, 2022, Methods) also showed some widespread DT changes even from RNA-seq. Another analysis method like the codon subsequence abundant shift as a part of diricore analysis (Loayza-Puch et al., 2016, Nature) did not show that broad changed regions. The authors are encouraged to re-analyze the data sets using different methods.

      Response: We agree with the reviewer that the spread of DT changes beyond the ribosomal A-site is puzzling, but this has already been observed in other studies using other approaches (Darnell, Subramaniam, and O'Shea 2018). To verify that this shift is not due to our A-site assignment, enrichment analysis, or DT method, we applied the Diricore pipeline to our Ribo-Seq data. The pipeline provides either 5'-end ribosome density or "subsequence" analysis using an A-site offset for each read size based on the metagene profile at the start codon. Both analyses show the same enriched codons across the different conditions as in our analyses, and the broad shift is similar, with the maximum signal at the E, -1 position (Fig. R1).

      1. Page 13 "Intriguingly, only two of the three isoleucine codons (AUU and AUC) showed increased DTs upon Ile starvation (p < 0.01), while just one leucine codon (CUU) exhibited a modest but significant DT increase (p < 0.01) under Leu starvation (Figure 1A-B, Supplementary Figure 2A)." How can the authors explain the different strengths of ribosome pausing at Ile codons under Ile and double starvation? The AUA codon did not show any pausing under either of the starvation conditions. Throughout the manuscript, the authors mainly describe the difference between amino acids but it is desirable to discuss the codon-level difference as well.

      Response: Thank you for raising this point. The observed differences in stalling between the isoleucine codons can likely be explained by differences in tRNA isoacceptor charging and positional bias within transcripts. The AUA codon is decoded by a distinct tRNA-Ile isoacceptor (tRNA-Ile(UAU)), which, according to our tRNA charging data (Fig. 6), remains largely charged during Ile starvation. This observation aligns with previous reports suggesting that this isoacceptor is more resistant to starvation-induced deacylation in mammalian cells and bacteria (Saikia et al. 2016; Elf et al. 2003). In contrast, the AUU and AUC codons are primarily decoded by the tRNA-Ile(AAU) isoacceptor, which we find to be strongly deacylated under Ile starvation, likely contributing to the observed codon-specific ribosome pausing. Additionally, we found that AUA codons are relatively rare in general and particularly underrepresented near the 5' ends of coding sequences. Our new spatial analysis (now included in Supplementary Figure 11B) confirms that AUA codons tend to occur downstream of AUU and AUC codons within transcripts, which likely reduces stalling on these codons and diminishes their apparent DT increase under starvation. To better explain these points, we have expanded the codon-level discussion of these differences in the revised manuscript.

      1. Page 13 "We examined the effects of single amino acid starvations (-Leu, -Ile and -Val), as well as combinations, including a double starvation of leucine and isoleucine (hereafter referred to as "double") and a starvation of leucine, isoleucine, and valine ("triple"), allowing us to identify potential non-additive effects." The different double starvations, isoleucine and valine, and leucine and valine, will further support their hypothesis on the effects of the positional codon usage bias on ribosome pausing and tRNA charging patterns. Although this could be beyond the scope of the current manuscript, the authors are encouraged to provide a rationale for the chosen combination.

      Response: Our experimental design evolved stepwise: we initially focused on leucine and isoleucine depletion because, despite their structural similarity, these showed respectively short and long dwell times in our previous work in the mouse liver (Gobet et al. 2020). Valine was included at a later stage to cover all the BCAAs. At the time, we did not anticipate that valine would yield particularly striking effects in cells, and therefore we did not include systematic pairwise depletions involving valine. However, the strong and unexpected stalling observed at valine codons, especially under triple starvation, became a central aspect of the study. Thus, we agree that additional combinations, such as Leu/Val or Val/Ile, could be informative and now mention this in the Discussion as a potential direction for future studies.

      Minor comments

      Page 16 "these results imply that BCAA deprivation lowers protein output through multiple pathways: a combination of reduced initiation, direct elongation blocks (stalling), and possibly an increased proteolysis" This conclusion is totally right but may be too general. Could the authors summarize BCAA-specific features of the events including reduced initiation, stalling, and proteolysis that all contribute to protein outputs? This is not well discussed in the latter sections including Discussion.

      Response: We thank the reviewer for this helpful suggestion. We agree that the original statement was too general and have revised the relevant section to more clearly delineate the distinct responses observed under each BCAA starvation condition. Specifically, we now summarize that valine starvation is characterized by strong, positionally biased ribosome stalling; leucine starvation primarily impacts translation initiation, likely via mTORC1 repression; and isoleucine starvation shows a mixed phenotype, with features of both impaired initiation and codon-specific elongation delays. We also clarify that while protein stability or degradation may contribute to the observed changes in protein output, our current data do not allow for quantitative assessment of proteolytic effects (e.g., changes in protein half-life). Therefore, we refrain from making direct quantitative conclusions about the differential modulations of proteolysis and instead focus our discussion on the translational mechanisms supported by our data.

      Reviewer #1 (Significance):

      The manuscript was well-written, and the findings are interesting, especially their model that positional codon usage bias could be a regulator of ribosome pausing and tRNA charging levels. Although different translational responses to distinct amino acid starvation have been widely documented, the positional codon usage bias is an interesting aspect. The manuscript's central message could have been made clearer. The authors may consider emphasizing this point more explicitly in the abstract. The rich multi-omics dataset in this work provides valuable resources for the translation field.

      We thank the reviewer for the encouraging comments and share the view that positional codon-usage bias is an important result; accordingly, we now underscore this point explicitly in the revised Abstract. We also emphasise that our other observations are, to our knowledge, novel: only a handful of multi-omics studies have combined ribosome-pausing profiles with direct tRNA-aminoacylation measurements, and none has systematically examined multiple amino-acid-deprivation conditions as presented here.

      Reviewer #2 (Evidence, reproducibility and clarity):

      This study examines the consequences of starvation for the BCAAs, either singly, doubly for Leu & Ile, or for all three simultaneously in HeLa cells, on overall translation rates, decoding rates at each codon, and on ribosome density, protein expression, and distribution of ribosome stalling events across the CDS for each expressed gene. The single amino acid starvation regimes specifically reduce the cognate intracellular amino acid pool and lead to deacylation of at least a subset of the cognate tRNAs in a manner dependent on continuing protein synthesis. They also induce the ISR equally and decrease bulk protein synthesis equally in a manner that appears to occur largely at the initiation level for -Leu and -Val, judging by the decreased polysome:monosome ratio, but at both the initiation and elongation levels for -Ile, a distinction that remains unexplained. Only -Leu appears to down-regulate mTORC1 and TOP mRNA translation. There is a significant down-regulation of protein levels for 50-200 genes, which tend to be unstable in nutrient-replete cells, only a fraction of which are associated with reduced ribosome occupancies (RPFs measured by Ribo-Seq) on the corresponding mRNAs in the manner expected for reduced initiation, suggesting that delayed elongation is responsible for reduced protein levels for the remaining fraction of genes. All three single starvations lead to increased decoding times for a subset of the cognate "hungry" codons: CUU for -Leu, AUU and AUC for -Ile, and all of the Val codons, in a manner that is said to correspond largely to the particular tRNA isoacceptors that become deacylated, although this correspondence was not explained explicitly and might not be as simple as claimed.
All three single starvations also evoke skewing of RPFs towards the 5' ends of many CDSs, in a manner correlated with an enrichment within the early regions of the CDSs for one or more of the cognate codons that showed increased decoding times for -Ile (AUC codon) and -Val (GUU, GUC, and GUG), but not for -Leu, of which the latter was not accounted for. These last findings suggest that, at least for -Val and -Ile, delays in decoding N-terminal cognate codons cause elongating ribosomes to build up early in the CDS. They go on to employ a peak calling algorithm to identify stalling sites in an unbiased way within the CDS, which are greatest in number for -Val, and find that Val codons are enriched in the A-sites (slightly) and adjacent 5' nucleotides (to a greater extent) for -Val starvation; and similarly for Ile codons in -Ile conditions, but not for -Leu starvation, again for unknown reasons. It's unclear why their called stalling sites have various other non-hungry codons present in the A sites, with the cognate hungry codons being enriched further upstream, given that stalling should occur with the "hungry" cognate codon in the A site. The proteins showing down-regulation are enriched for stalling sites only in the case of the -Val starvation, in the manner expected if stalling is contributing to reduced translation of the corresponding mRNA. It's unclear why this enrichment apparently does not extend to -Ile starvation, which shows comparable skewing of RPFs towards the 5' ends, and this fact diminishes the claim that pausing generally contributes to reduced translation for genes with abundant hungry codons.
All of the same analyses were carried out for the Double -Ile/-Leu and Triple starvations and yield unexpected results, particularly for the triple starvation, wherein decoding times are increased only at Val codons, skewing of RPFs towards the 5' ends of CDSs is correlated only with an enrichment for Val codons within the early regions of the CDSs, and stall sites are enriched only for Val codons at nearby upstream sites, all consistent with the finding that only Val tRNAs become deacylated in the Triple regime. To explain why only Val tRNA charging is reduced despite the observed effective starvation for all three amino acids, they note first that stalling at Val codons is skewed towards the 5' ends of CDSs for both -Val and triple starvations more so than observed for -Ile or -Leu starvation, which they attribute to a greater frequency of Val codons vs Ile codons in the 5' ends of CDSs. As such, charged Val tRNAs are said to be consumed in translating the 5' ends of CDSs, and the resulting stalling prevents ribosomes from reaching downstream Ile and Leu codons at the same frequencies and thus prevents deacylation of the cognate Ile and Leu tRNAs. It's unclear whether this explanation is adequate to explain the complete lack of Ile or Leu tRNA deacylation observed even when amino acid recycling by the proteasome is inhibited, a treatment shown to exacerbate deacylation of cognate tRNAs in the single amino acid starvations and of Val tRNA in the triple starvation. As such, the statement in the Abstract "Notably, we could show that isoleucine starvation-specific stalling largely diminished under triple starvation, likely due to early elongation bottlenecks at valine codons" might be too strong and the word "possibly" would be preferred over "likely". It's also unclear why the proteins that are down-regulated in the triple starvation are not significantly enriched for stalling sites (Fig. 5B), given that the degree of skewing is comparable to or greater than for -Val.
This last point seems to undermine their conclusion in the Abstract "that many proteins downregulated under BCAA deprivation harbor stalling sites, suggesting that compromised elongation contributes to decreased protein output." In the case of the double -Ile/-Leu starvation, a related phenomenon occurs wherein decoding rates are decreased for only the AUU Ile codon and only the AAU Ile tRNA becomes deacylated; although in this case increased RPFs in the 5' ends are not correlated with enrichment for Ile or Leu codons and, although not presented, apparently stall sites are not associated with the Ile codon in the double starvation. In addition, stalling sites are not enriched in the proteins down-regulated by the double starvation. Moreover, because Ile codons are not enriched in the 5' ends of CDSs, it doesn't seem possible to explain the selective deacylation of the single Ile tRNA observed in the double starvation by the same "bottleneck" mechanism proposed to explain selective deacylation of only Val tRNAs during the triple starvation. This is another reason for questioning their "bottleneck" mechanism.

      We thank the reviewer for their thorough assessment, close reading, and constructive feedback, which have greatly contributed to improving the clarity and contextualization of our manuscript. We would first like to clarify that all experiments in this study were conducted in NIH3T3 mouse fibroblasts, not HeLa cells; we assume this was a misunderstanding and have verified that the correct cell line is consistently indicated throughout the manuscript. We also clarify that our data show that -Leu, double starvation, and to a lesser extent -Ile, downregulate mTORC1 signaling and TOP mRNA translation, whereas -Val and triple starvation had minimal effects on these pathways. We agree that some of our conclusions and observed phenomena were not explained in sufficient detail in the original version. To address this, we have significantly reworked the discussion, added complementary figures, and clarified key points throughout the text to better convey the underlying rationale and biological interpretation of our findings. We address each of the reviewer's points in detail in the point-by-point responses below.

      Specific comments (some of which were mentioned above):

      -The authors have treated cells with CHX in the Ribo-Seq experiments, which has been shown to cause artifacts in determining the locations of ribosome stalling in vivo owing to continued elongation in the presence of CHX (https://doi.org/10.1371/journal.pgen.1005732 ). The authors should comment on whether this artifact could be influencing some of their findings, particularly the results in Fig. 5C where the hungry codons are often present upstream of the A sites of called stalling sites in the manner expected if elongation continued slowly following stalling in the presence of CHX.

      Response: We thank the reviewer for raising this important concern. We would like to clarify that our ribosome profiling protocol did not include CHX pretreatment of live cells. CHX was added only during the brief PBS washes immediately before lysis and in the lysis buffer itself. This approach aligns with best practices aimed at minimizing post-lysis ribosome run-off, and is intended to prevent the downstream ribosome displacement artifacts described by Hussmann et al. 2015, which result from pre-incubation of live cells with CHX for several minutes before harvesting. Furthermore, recent studies have demonstrated that CHX-induced biases are species-specific. For instance, Sharma et al. 2021 found that human (and mouse) ribosomes are not susceptible to conformational restrictions by CHX, nor does CHX distort gene-level measurements of ribosome occupancy. This suggests that the use of CHX in the lysis buffer, as performed in our protocol, is unlikely to introduce significant artifacts in our ribosome profiling data. To further support this, we reanalyzed data from Darnell, Subramaniam, and O'Shea 2018, where the ribosome profiling samples were prepared without any CHX pretreatment or CHX in the wash buffer, and still observed similar upstream enrichments in their stalling profiles (see Supplementary Figure 2B-C in our manuscript). Additionally, in our previous work (Gobet et al. 2020), we compared ribosome dwell times with and without CHX in the lysis buffer and found no significant differences, reinforcing the notion that CHX use during lysis does not substantially affect the measurement of ribosome stalling. Given these considerations, we believe that CHX-related artifacts, such as downstream ribosome movement, are unlikely to explain the enrichment of hungry codons upstream of identified stalling sites in our data. We have now adjusted the Methods section to clarify this point.

      -p. 12: "These starvation-specific DT and ribosome density modulations were also evident at the individual transcript level, as exemplified by Col1a1, Col1a2, Aars, and Mki67 which showed persistent Val-codon-specific ribosome density increases but lost Ile-codon-specific increases under triple starvation (Supplementary Figure 3A-D). " This conclusion is hard to visualize for any but Val codons. It would help to annotate the relevant peaks of interest for -Ile starvation with arrows.

      Response: We agree and thank the reviewer for this observation. We have now annotated exemplary peaks in Supplementary Figure 3A-D to highlight ribosome pileups over Ile codons. However, we agree that they remain difficult to visualize in that figure. Therefore, we have added scatter plots for each of these transcripts, showing the RPM at each position in Ctrl vs. starvation, to better illustrate the milder effects upon Ile starvation (Supplementary Figure 4).

      -To better make the point that codon-specific stalling under BCAA starvation appears to be not driven by codon usage, rather than the analysis in Fig. 1H, wouldn't it be better to examine the correlation between increases in DT under the single amino acid starvation conditions and the codon frequencies across all codons?

      Response: We appreciate the suggestion. We have now added an additional analysis correlating the change in DT with codon usage frequency for each starvation condition. This is included in Supplementary Figure 5A-D and supports our interpretation that codon frequency alone does not explain the observed stalling behavior.
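      For illustration, such a rank correlation between per-codon DT changes and codon usage can be sketched in pure Python (the numbers below are hypothetical placeholders, not the values underlying Supplementary Figure 5A-D; Spearman's rho is computed as Pearson correlation on ranks):

```python
def rank(xs):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend over a run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# hypothetical per-codon values: DT change under starvation (log2 FC)
# vs. codon usage frequency (per thousand codons)
delta_dt = {"GTT": 1.8, "GTC": 1.5, "GTA": 1.2, "GTG": 1.6,
            "AAA": 0.0, "GAA": -0.1, "CTG": 0.1, "GCC": 0.05}
usage = {"GTT": 10.9, "GTC": 15.4, "GTA": 7.4, "GTG": 28.4,
         "AAA": 21.9, "GAA": 27.0, "CTG": 39.5, "GCC": 26.0}

codons = sorted(delta_dt)
rho = spearman([delta_dt[c] for c in codons], [usage[c] for c in codons])
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = -0.31"
```

      A weak rho on such toy data mirrors the interpretation that usage frequency alone does not predict which codons stall.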

      -p. 13, entire paragraph beginning with "Our RNA-seq and Ribo-seq revealed a general activation of stress response pathways across all starvations..." It is difficult to glean any important conclusions from this lengthy analysis, and the results do not appear to be connected to the overall topic of the study. If there are important conclusions here that relate to the major findings then these connections should be made or noted later in the Discussion. If not, perhaps the analysis should be largely relegated to the Supplemental material.

      Response: We thank the reviewer for this comment. The paragraph in question is intended to provide a global overview of transcriptional and translational responses across the starvation conditions. It serves both as a quality control (e.g., PCA clustering and global shifts in RPF/RNA-seq profiles), and to confirm that expected starvation-induced responses are among the strongest detectable signals separating the starved samples from the control. Indeed, these observations establish that the perturbations are effective and that hallmark nutrient stress responses are globally engaged across conditions. Importantly, very few studies to date have examined transcriptional and translational responses under single or combined branched-chain amino acid (BCAA) starvation conditions. It therefore remains unclear to what extent BCAA depletion broadly remodels gene expression and translation. Our analysis contributes to addressing this gap, revealing that while certain stress pathways are commonly induced, others show condition-specific patterns, as we observed for -Ile starvation. To maintain focus, we have kept the detailed pathway analyses and transcript-level enrichments in the Supplement and rewritten the corresponding text in a more compact manner, reducing it by more than one third.

      -p. 15: "Together, these findings highlight that BCAA starvation triggers a combination of effects on initiation and elongation, with varying dynamics by amino acid starvation." I take issue with this statement as it appears that translation is reduced primarily at the initiation step for all conditions except -Ile. As noted above, these data are never mentioned in the DISCUSSION as to why only -Ile would show a marked elongation component to the inhibition whereas -Val gives the greatest amount of ribosome stalling.

      Response: We acknowledge the reviewer's point. While the polysome profiles (Figure 3F-H) directly indicate that most conditions repress initiation, codon- and condition-specific elongation defects can still contribute to reduced protein output, even if they are not always detectable as global polysome shifts. Polysome profiles reflect the combined outcome of reduced initiation (which decreases polysome numbers) and ribosome stalling (which can, but does not always, increase ribosome density on individual transcripts, potentially counteracting the effects of reduced initiation). For valine starvation, strong stalling occurs very early in the CDS (Figure 5F). This bottleneck restricts overall ribosome movement into downstream regions. Thus, while elongation is profoundly impaired, the total number of ribosomes per transcript (which polysome signals largely reflect) may appear low due to reduced overall ribosome traffic. In contrast, isoleucine codon stalling also tends to occur further downstream on the transcript (Figure 5F), allowing ribosomes to accumulate in larger numbers on the mRNA, leading to a clearer "elongation signature" in polysome profiles (Figure 3F, H). Additionally, we observed slightly higher inter-replicate variance for isoleucine starvation (Supplementary Figure 6B), which may have reduced the number of statistically significant stalling sites extracted compared to valine. We have revised the main text and Discussion to clarify these points.

      -I cannot decipher Fig. 4D and more detail is required to indicate the identity of each column of data.

      Response: We thank the reviewer for pointing this out. Figure 4D (now Figure 4E) presents an UpSet plot, which is a scalable alternative to Venn diagrams commonly used to visualize intersections across multiple sets. Briefly, each bar in the upper plot represents the number of transcripts with increased 5′ ribosome coverage (Δpi < -0.15; p < 0.05) shared across the conditions indicated in the dot matrix below. Each column in the dot matrix highlights the specific combination of conditions contributing to a given intersection (e.g., dots under "Val" and "Triple" show the overlap between these two). To improve clarity, we have expanded the figure legend accordingly and now refer to the UpSetR methodology in the main text.
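To illustrate the counting logic for readers less familiar with UpSet plots, the exclusive intersections shown as bars can be reproduced in a few lines. The sketch below uses hypothetical transcript sets, not our actual data:

```python
from itertools import combinations

# Hypothetical transcript sets per starvation condition (illustrative only)
sets = {
    "Val":    {"t1", "t2", "t3", "t5"},
    "Triple": {"t1", "t2", "t4"},
    "Ile":    {"t2", "t6"},
}

def upset_counts(named_sets):
    """Bar heights of an UpSet plot: the number of items belonging to each
    *exclusive* combination of sets (in the combination, in no other set)."""
    names = list(named_sets)
    counts = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(named_sets[n] for n in combo))
            rest = [named_sets[n] for n in names if n not in combo]
            outside = set.union(*rest) if rest else set()
            counts[combo] = len(inside - outside)
    return counts

counts = upset_counts(sets)
# Column with dots under "Val" and "Triple": transcripts shared by these
# two conditions but absent from "Ile"
print(counts[("Val", "Triple")])  # prints 1 (only "t1")
```

Each dot-matrix column corresponds to one key of `counts`; summing all values recovers the size of the union, which is a useful sanity check.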

      -In Fig. 4E, one cannot determine what the P values actually are, which should be provided in the legend to confirm statistical significance.

      Response: Thank you for pointing that out. The p-value legend in Figure 4E (now Figure 4F) was accidentally removed during figure editing. We have added the legend back so that the statistical significance is clear.

      -It's difficult to understand how the -Leu condition and the Double starvation can produce polarized RPFs (Fig. 4A) without evidence of stalling at the cognate hungry codons (Fig. 4E), despite showing later in Fig. 5A that the numbers of stall sites are comparable in those cases to that found for -Ile.

      Response: We appreciate this comment, which points to an important property of RPF profiles under nutrient stress. As shown in Figure 4A, all starvation conditions induce a degree of 5′ ribosome footprint polarization, a pattern that can be observed under various stress conditions and perturbations (Allen et al. 2021; Hwang and Buskirk 2017; Li et al. 2023). This general 5′ bias likely reflects a combination of slowed elongation and altered ribosome dynamics and is not necessarily linked to codon-specific stalling. However, Val and Triple starvation show a much stronger and more asymmetric polarization, characterized by pronounced 5′ accumulation and 3′ depletion of ribosome density. To better illustrate this, we have updated the visualization of polarity scores and added a new bar chart summarizing the number of transcripts showing strong 5′ polarization under each condition. This quantification highlights that the effect is markedly more prevalent under Val and Triple conditions than under Leu or Double starvation. In addition, Figure 4F demonstrates that this polarity is codon-specific under Val and Triple starvation. We clarify that this analysis tests for enrichment of specific codons near the start codon among the polarized transcripts and does not directly assess stalling. The observed enrichment of Val codons in the 5′ regions of polarized transcripts supports the interpretation that early elongation delays contribute to the RPF shift. In contrast, no such enrichment is observed for Leu starvation, reinforcing that Leu-induced polarity is not driven by stalling at Leu codons. While Figure 5 shows a similar number of peak-called stalling sites in -Leu, -Ile, and Double starvation, we note that Ribo-seq signal variability under Ile starvation was higher, which may have limited statistical power for detecting stalling sites, even though clear dwell time increases were observed at specific codons.
Additionally, we have improved the metagene plots depicting total ribosome footprint density in Figure 4A. The previous version incorrectly showed sharp drops at CDS boundaries due to binning artifacts. The updated version more accurately reflects the density distribution and further highlights the stronger polarization in Val and Triple conditions. Together, these clarifications and improvements within the main text now more clearly distinguish between general polarity effects and codon-specific stalling.
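For readers less familiar with the metric, a widely used per-transcript polarity score weights footprint positions from -1 (5′ end) to +1 (3′ end), so that negative values indicate 5′ polarization. The sketch below is illustrative and may differ in detail from the exact formula used in our pipeline:

```python
def polarity_score(footprint_counts):
    """Density-weighted mean of position weights spanning -1 (5' end)
    to +1 (3' end); negative scores indicate 5'-polarized coverage."""
    L = len(footprint_counts)
    total = sum(footprint_counts)
    if total == 0 or L < 2:
        return 0.0
    return sum(d * (2 * (i + 1) - (L + 1)) / (L - 1)
               for i, d in enumerate(footprint_counts)) / total

print(polarity_score([1, 1, 1, 1, 1]))   # uniform coverage: prints 0.0
print(polarity_score([10, 4, 2, 1, 0]))  # 5'-biased coverage: negative
```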

      -Fig. 5B: the P values should be given for all five columns, and it should be explained here or in the Discussion why the authors conclude that stalling is an important determinant for reduced translation when a significant correlation seems to exist only for the -Val condition and not even for the Triple condition.

      Response: We thank the reviewer for this important observation. In response, we have revised both the text and the figures to provide a clearer and biologically more meaningful representation of the relationship between ribosome stalling and reduced protein output. Specifically, we have replaced the previous Figure 5B with a new analysis that stratifies transcripts based on the number of identified stalling sites. This updated analysis, now shown in Figure 5B, reveals that under Val and Triple starvation conditions, proteins that are downregulated tend to originate from transcripts with multiple stalling sites. Importantly, the corresponding p-values for all five conditions are now explicitly shown in the figure (as red lines). As the reviewer correctly notes, only the Val condition shows a statistically significant enrichment when considering overall overlap. Triple starvation shows a similarly high proportion of overlap (72.3%) but does not reach statistical significance, likely due to the more complex background composition under combined starvation, which increases the expected overlap and reduces statistical power. By stratifying transcripts by the number of stalling sites, we uncover that transcripts with ≥2 stalling sites are enriched among downregulated proteins specifically under Val and Triple conditions, providing a more robust indication of the link between stalling and translation repression under valine deprivation. We believe this refined approach, prompted by the reviewer's comment, offers a clearer and biologically more relevant perspective on the role of ribosome stalling. The original analysis previously shown in Figure 5B is now provided as Supplementary Figure 10C for transparency and comparison. We have clarified this in the revised text and now interpret the relationship more cautiously.

      -p. 17: "Of note, in cases where valine or isoleucine codons were present just upstream (rather than at) the stalling position, we noted a strong bias for GAG (E), GAA (E), GAU (D), GAC (D), AAG (K), CAG (Q), GUG (V) and GGA (G) (Val starvation) and AAC (N), GAC (D), CUG (L), GAG (E), GCC (A), CAG (Q), GAA (E) and AAG (K) (Ile starvation) at the stalling site (Supplementary Figure 7B)." The authors fail to explain why these codons would be present in the A sites at stalling sites rather than the hungry codons themselves, especially since it is the decoding times of the hungry codons that are increased according to Fig. 1A-E. As suggested above, is this a CHX artifact?

      Response: We agree that the observation that the listed codons are enriched at identified stalling positions (now Supplementary Figure 10C), while the depleted amino acid codon is located upstream, is a finding that needs more detailed explanation. Importantly, this phenomenon is not attributable to CHX artifacts, as our Ribo-seq protocol employs CHX solely during brief washes and lysis to prevent post-lysis ribosome run-off, rather than live-cell pre-treatment. Instead, we propose two hypotheses to explain this pattern: First, many of these enriched codons are already inherently slowly decoded, with longer DTs even under control conditions (Supplementary Figure 5H, newly added). Together with the upstream hungry codons, they might form a challenging consecutive decoding environment, which results in an attenuated ribosome slowdown downstream of the hungry codon. Second, ribosome queuing may further explain this pattern. When a ribosome encounters a critically hungry codon and stalls, subsequent ribosomes can form a queue. The codon within the A-site of the queued ribosome would be (more or less) independent of the identity of the hungry codon itself that caused the initial stall. Since the listed codons have a high frequency within the transcriptome (Supplementary Figure 5B), they therefore have an increased likelihood of appearing at this "stalling site". Importantly, both of these phenomena are not necessarily reflected by a general increase of DT on all of the listed codons and would therefore only be captured by the direct extraction of stalling sites but might be averaged out in the global dwell time analysis. We mention this phenomenon now in the Discussion.

      -Fig. 5D: P values for the significance, or lack thereof, of the different overlaps should be provided.

      Response: Thanks for pointing out this omission. We have now computed hypergeometric p-values for the comparisons shown in Figure 5D and Figure 5E, and report them directly in the main text. As described, the overlap in stalling sites between Val and triple starvation is highly significant (2522 positions, p < 2.2×10⁻¹⁶), while overlaps involving Ile-specific stalling positions are smaller but still statistically robust (e.g., 149 positions for Ile–Triple, p = 1.77×10⁻⁵²). Notably, we also calculated p-values at the transcript level and found that a large fraction of transcripts with Ile-specific stalling under single starvation also stall under triple starvation, though often at different positions (1806 transcripts, p = 1.78×10⁻⁵⁸). These values are now included in the revised results section to support the interpretation of these overlaps.
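For transparency, the hypergeometric upper-tail calculation underlying these p-values can be sketched as follows; the numbers in the example are illustrative placeholders, not the actual universe and set sizes from our analysis:

```python
from math import comb

def hypergeom_overlap_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): the probability of
    observing at least k shared items when a set of n items is drawn from
    a universe of N items that contains K items of interest."""
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)

# Illustrative: universe of 1,000 positions, 300 stalling sites in each of
# two conditions, 150 shared (expected overlap by chance: 300*300/1000 = 90)
print(hypergeom_overlap_pvalue(1000, 300, 300, 150))
```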

      -p. 17: "Nonetheless, when we examined entire transcripts rather than single positions, many transcripts that exhibited isoleucine-related stalling under Ile starvation also stalled under triple starvation, but at different sites along the CDS (Figure 5E). This finding is particularly intriguing, as it suggests that while Ile-starvation-specific stalling sites may shift under triple starvation, the overall tendency of these transcripts to stall remains." The authors never come back to account for this unexpected result.

      Response: Thank you for highlighting this point. We have incorporated this finding as part of the proposed "bottleneck" scenario. While the isoleucine-specific stalling sites identified under Ile starvation do shift or disappear under triple starvation, we have observed that the same transcripts still tend to exhibit stalling. However, this now primarily occurs at upstream valine codons. We interpret this as a consequence of early elongation stalling caused by strong pausing at Val codons. This restriction on ribosome progression effectively prevents ribosomes from reaching the original Ile stalling sites. Therefore, the stalling sites identified under triple starvation are largely explained by the Val codons, reflecting a redistribution of stalling rather than its loss. To further clarify this crucial point, we now explicitly refer to Figure 5D-E again in the subsequent paragraph, which introduces the bottleneck scenario.

      -It seems very difficult to reconcile the results in Fig. 5F with those in Fig. 4A, where similar polarities in RPFs are observed for -Ile and -Val in Fig. 4A but dramatically different distributions of stalling sites in Fig. 5F. More discussion of these discrepancies is required.

      Response: Thank you for pointing this out. The apparent discrepancy between the RPF profiles shown in Figure 4A and the stalling site distributions in Figure 5F likely reflects the fact that RPF polarization includes both general (unspecific) and codon-specific components. Figure 4A displays total ribosome footprint density, capturing both broad stress-induced effects and codon-specific contributions, whereas Figure 5F focuses specifically on peak-called stalling sites, representing localized and statistically significant pauses. Importantly, we would like to emphasize that Figure 4 shows that -Val and -Ile starvation exhibit different responses, not the same patterns. To make these differences even clearer, we have now updated the visualizations in Figure 4, including improved polarity plots and a new bar chart summarizing the number of transcripts with strong 5′ polarization. These additions highlight that the RPF profiles under -Val starvation are more pronounced and asymmetric, particularly due to 3′ depletion, while the polarity under -Ile is milder and a distinct, much smaller subset of transcripts appears to show polarity score shifts. We believe the updated figures and accompanying explanations now make these distinctions clearer.

      • p. 18: " These isoacceptor-specific patterns correlate largely with the particular subsets of leucine and isoleucine codons that stalled (Figure 1A)." This correlation needs to be addressed for each codon-anticodon pair for all of the codons showing stalling in Fig. 1A.

      Response: We thank the reviewer for this important comment. In the revised manuscript, we have expanded the relevant sections to address codon–anticodon relationships more thoroughly. We now explicitly match codons that exhibited increased dwell times under starvation to the corresponding tRNA isoacceptors whose charging was affected, and we provide a clearer discussion of the caveats involved. As noted by the reviewer, this correlation is not straightforward, as it is complicated by wobble base pairing, anticodon modifications, and the fact that multiple codons can be decoded by more than one isoacceptor, and vice versa. Moreover, in our qPCR-based tRNA charging assay, certain isoacceptors cannot be distinguished due to highly similar sequences (e.g., LeuAAG and LeuUAG, and LeuCAA and LeuCAG), which limits resolution for exact pairing. In addition, we did not assess absolute tRNA abundance, which may further influence decoding capacity. Nevertheless, where resolution is possible, the patterns align well: all tRNAVal isoacceptors became uncharged under Val and triple starvation, matching the consistent dwell time increases across all Val codons. Only tRNAIleAAU (decoding AUU and AUC) was deacylated, consistent with these codons showing increased dwell times, while AUA (decoded by still-charged tRNAIleUAU) did not. Only CUU (decoded by uncharged tRNALeuGAA) showed an increased dwell time. A mild deacylation of the other Leu isoacceptors was observed, but isoacceptor-level resolution is limited by assay constraints. However, these rather minimal tRNA and DT changes were consistent with more dominant initiation repression rather than elongation stalls. To support this analysis, we included an illustrative figure (now in Supplementary Figure 12F) summarizing the codon–anticodon matches.

      -p. 19: "For instance, in our double starvation condition, unchanged tRNA charging levels (Figure 6E) may result from a pronounced downregulation of global translation initiation, likely driven by the activation of stress responses (Figure 2), subsequently lowering the demand for charged tRNAs as it has been observed previously for Leu starvation 39." This seems at odds with the comparable down-regulation of protein synthesis for the Double starvation and -Leu and -Ile single starvations shown in Fig. 3C. Also, in the current study, Leu starvation does lower charging of certain Leu tRNAs.

      Response: We thank the reviewer for raising this important point. In the revised manuscript, we have clarified this section and now offer a more refined interpretation of the tRNA charging patterns observed under double starvation. While Figure 3C shows a comparable reduction in global protein synthesis across the -Leu, -Ile, and double starvation conditions, it needs to be considered that the OPP assay has limited sensitivity. It operates in a relatively low fluorescence intensity range and is subject to background signal, which may obscure subtle differences between conditions. Moreover, other factors such as changes in protein stability or turnover could also contribute to the observed differences. Therefore, inter-condition differences in translation repression should be interpreted with caution. However, based on our stress response analysis (Figure 2), mTORC1 inactivation appears strongest under double starvation, likely leading to more profound suppression of translation initiation. This would reduce the overall demand for charged tRNAs and could explain why no detectable tRNA deacylation was observed under double starvation, even though mild uncharging of Leu isoacceptors occurred under -Leu, which exhibited a milder stress response. This distinction is consistent with the observed mild dwell time increases for one Leu codon under -Leu, but not in the double condition. Similarly, the absence of Ile codon stalling and tRNA deacylation under double starvation may be attributed to stress-driven reductions in elongation demand, preventing the tRNA depletion and codon-specific delays observed under single Ile starvation. A more direct clarification is now included in the revised manuscript.

      Reviewer #2 (Significance):

      The results here are significant in showing that starvation for a single amino acid does not lead to deacylation of all isoacceptors for that amino acid and in revealing that starvation for one amino acid can prevent deacylation of tRNAs for other amino acids, as shown most dramatically for the selective deacylation of only Val tRNAs in the triple BCAA starvation condition. For the various reasons indicated above, however, I'm not convinced that their "bottleneck" mechanism is adequate to explain this phenomenon, especially in the case of the selective deacylation of Ile vs Leu tRNA in the Double starvation regime. It's also significant that deacylation leads to ribosome build-up near the 5′ ends of CDSs, which seems to be associated with an enrichment for the hungry codons in the case of Val and Ile starvation, but inexplicably, not for Leu or the Double starvations. This last discrepancy makes it hard to understand how the -Leu and Double starvations produce RPF buildups near the 5′ ends of CDSs. In addition, the claim in the Discussion that "our data also highlight the importance of the codon positional context within mRNAs, indicating that where a codon is located within the CDS can influence both the extent of ribosomal stalling and overall translation efficiency during nutrient stress" overstates the strength of evidence that the stalling events lead to substantial decreases in translational efficiencies for the affected mRNAs, as the stalling frequency and decreased protein output are significantly correlated only for the -Val starvation, and the data in Fig. 3D-H suggest that the reductions in protein synthesis generally occur at the level of initiation, even for -Val starvation, with a contribution from slow elongation only for -Ile, which is in itself difficult to understand considering that stalling frequencies are highest in -Val.
Thus, while many of the results are very intriguing and will be of considerable interest to the translation field, it is my opinion that a number of results have been overinterpreted and that important inconsistencies and complexities have been overlooked in concluding that a significant component of the translational inhibition arises from the increased decoding times at hungry codons during elongation and that the selective deacylation of Val tRNAs in the Triple starvation can be explained by the "bottleneck" mechanism. The complexities and limitations of the data and their interpretations should be discussed much more thoroughly in the Discussion, which currently is devoted mostly to other phenomena often of tangential importance to the current findings. A suitably revised manuscript would clearly state the limitations and caveats of the proposed mechanisms and consider other possible explanations as well.

      Again, we thank the reviewer for the valuable insights and constructive critiques. We believe that the concerns regarding potential overinterpretation and inconsistencies have now been addressed through clearer explanations and more cautious interpretation throughout the revised manuscript. We also agree that the original Discussion included aspects that, while interesting, were of secondary importance. In light of the reviewer's suggestions, we have restructured and rebalanced the Discussion to focus more directly on the key findings and their implications. Importantly, we wish to clarify that we do not propose the elongation bottleneck model as a general mechanism across all conditions. In particular, for double (Leu/Ile) starvation, we attribute the observed effects primarily to stress response–mediated translational repression, and not to codon-specific stalling or tRNA depletion. We believe that this distinction is now more clearly conveyed in the revised manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      Worpenberg and colleagues investigated the translational consequences of branched-chain amino acid (BCAA) starvation in mouse cells. Limitation of individual BCAAs has been reported to cause codon-specific and global translational repression. In this paper, the authors use RNA-seq, ribosome profiling (Ribo-seq), proteomics, and tRNA charging assays to characterize the impacts of individual and combined depletion of leucine, isoleucine, and valine on translation. They find that BCAA starvation increases codon-specific ribosome dwell times, activates global translational stress responses and reduces global protein synthesis. They infer that this effect is due to decreased translation initiation and codon-specific translational stalling. They find that the effects of simultaneous depletion are non-additive. In valine and triple (valine, leucine, and isoleucine) depletion, they show that affected transcripts have a high density of valine codons early in their coding sequences, creating an "elongation bottleneck" that obscures the impact of starvation of other amino acids. Finally, they identify isoacceptor-specific differences in tRNA charging that help explain the codon-specific effects that they observe.

      We find the major findings convincing and clear. We find that some results are incompletely explained. We suggest an additional experiment and also have some minor comments that we hope will improve clarity and rigor.

      We thank the reviewer for the thorough and constructive feedback. We appreciate the recognition of our main findings and the helpful suggestions for improving the manuscript. Below we address each point in detail.

      Major comments

      Figure 3O: In this figure and the associated text, the authors try to determine whether differences in protein degradation can explain why some proteins have higher ribosome density but lower proteomic expression. However, since this analysis relies on published protein half-lives from non-starvation conditions and on the assumption that protein synthesis has entirely stopped, we are not convinced it is informative for this experimental context. It does not distinguish between a model in which protein synthesis has been reduced by stalling and a model in which both protein synthesis and degradation rate have increased, which are both consistent with their Ribo-seq and proteomic data. To address this issue, the authors should either perform protein half-life measurements under their starvation conditions, or more clearly explain these two models in the text and acknowledge that they cannot distinguish between them.

      Response: We agree with the reviewer that our current analysis, which is based on protein half-lives obtained under non-starvation conditions, cannot definitively separate the effects of reduced translation from those of increased protein degradation. We have revised the relevant section in the manuscript to more clearly state that this analysis is correlative in nature and serves only to explore one possible explanation for the observed disconnect between ribosome density and protein levels. We now also explicitly acknowledge that our dataset does not allow us to distinguish between a model in which protein output is reduced due to stalling and one in which both translation and degradation rates are altered. However, the observed log2FC in the proteomics data are often milder than expected based on complete-medium condition half-life alone, which would be difficult to reconcile with a dominant contribution from global protein destabilization. That said, we also acknowledge that protein degradation is highly context- and protein-specific, and that proteolytic regulation might still play a role. Performing a direct protein half-life measurement under our starvation conditions would indeed be required to rigorously test this, but such an experiment is outside the scope of this study. We now highlight this as a limitation and a valuable direction for future work, and we have softened any interpretations in the main text to reflect the uncertainty regarding the contribution of protein stability changes.
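For intuition about the "milder than expected" comparison: under the complete-shutdown assumption, first-order decay predicts the proteomic log2 fold change directly from the half-life. The sketch below uses hypothetical times and is meant only to illustrate the expectation we compare against:

```python
def expected_log2fc_no_synthesis(t_hours, half_life_hours):
    """Expected proteomic log2 fold change if synthesis stops completely and
    degradation is first-order: P(t) = P0 * 2**(-t/t_half), so
    log2FC = -t / t_half."""
    return -t_hours / half_life_hours

# Hypothetical example: after 6 h of starvation, a protein with a 12 h
# half-life should drop by 0.5 log2 units if synthesis halted entirely
print(expected_log2fc_no_synthesis(6, 12))  # prints -0.5
```

Observed log2 fold changes that are substantially milder than this expectation argue against a dominant contribution from accelerated degradation combined with a full translational shutdown.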

      Minor comments

      Figure 1G: Why does intracellular valine seem to be less depleted under starvation conditions than intracellular leucine or isoleucine? Are the limits of detections different for different amino acids? The authors should acknowledge this discrepancy and comment on whether it has any implications for interpretation of their results.

      Response: We thank the reviewer for this important point. While valine appears slightly less depleted than leucine or isoleucine in Figure 1G, the fold changes and absolute reductions are strong for all three BCAAs, including valine. To further illustrate this, we have added a supplementary bar chart showing the measured intracellular concentrations in µmol/L, including mean and variance across five biological replicates (Supplementary Figure 5A). We believe that the variation may reflect technical factors, such as differences in detection sensitivity or ionization efficiency between amino acids in the targeted metabolomics assay and, therefore, that the observed difference does not have a meaningful impact on the interpretation of our results. We now directly acknowledge these differences in the main text.

      Figure 1H: These data do not appear to meet the assumptions for linear regression. We suggest either reporting a Spearman R correlation (as the data appears linear in rank but not absolute value), or remove it entirely - we think the plot without statistics is sufficient.

      Response: We thank the reviewer for the suggestion. In the revised manuscript, we removed the statistical annotation and retained only the trend line to illustrate the general pattern. We agree that this visualization alone is sufficient to support the qualitative point we aimed to convey.

      Figure 2B: The in-text description of this figure states that "most" ISR genes show a "robust induction," but only three genes are shown in the figure, two of which are upregulated. The authors should instead specify that 2 out of the 3 genes profiled were robustly induced.

      Response: We have rephrased the sentence to say "two of the three genes profiled…" for precision and consistency with the data shown.

      Figure 2D: Please include the full, uncropped blots in the supplementary materials.

      Response: We have now added the full, uncropped western blots to the supplementary material (Supplementary Figure 8).

      Figure 2E: Swap the positions of the RPS6 and 4E-BP1 plots so they line up with their respective blots to make these figures easier to interpret. Authors should consider doing a one-way ANOVA and post-hoc analysis, if we correctly understand that they are making a conclusion about the difference between multiple groups in aggregate.

      Response: We thank the reviewer for the suggestion. The alignment of the RPS6 and 4E-BP1 plots with their respective blots has been corrected. As this panel focuses on comparisons to the control condition only, we have retained the original presentation.

      Figure 4B: Panel A in this figure is very convincing, and these plots don't add additional information. The authors could consider removing them. If this panel stays in, we suggest removing the "mid index" plot, since it is never referenced in the text and doesn't seem relevant to the message of the figure.

      Response: We appreciate the feedback. While we considered removing panel B as suggested, we decided to retain it because it provides a useful summary of panel A. To improve clarity and visual interpretation, we replaced the original boxplot with a bar plot displaying mean values and SEM error bars. We believe the bar plot now nicely illustrates that Val and Triple starvation lead to stronger effects, especially in the reduction of the 3′ index. The "mid index" plot, which was not referenced in the text and did not contribute to the central message, has been removed as suggested.

      Figure 4E: Why is there a reduction in frequency of a Leu and a Val codon under Ile starvation?

      Response: Thank you for highlighting this observation. The reduction in the frequency of a specific Leu and Val codon under Ile starvation in Figure 4F (former Figure 4E) is indeed intriguing. This figure reflects codon usage in the first 20% of the CDSs among the subset of transcripts that exhibit a footprint polarization under each starvation condition. As such, the observed depletion likely arises from the specific transcript composition of the polarized subset under -Ile, which differs from that under -Val or other conditions. Importantly, this pattern is not consistently observed when analyzing the full transcripts (another Leu codon is affected), indicating that it is not a systematic depletion of these codons. One possibility is that an increased frequency of Ile codons (AUC) within the constrained region may lead to a relative underrepresentation of other codons, such as Leu and Val. Alternatively, this may reflect non-random codon co-occurrence patterns within specific transcripts. While our current data do not allow us to investigate this further, we acknowledge these as speculative explanations and now mention this point in the Discussion as a potential avenue for future study.
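For clarity on how codon frequencies restricted to the 5′ portion of a CDS can be tabulated, a minimal sketch (illustrative only, not our actual pipeline; the sequence is hypothetical):

```python
from collections import Counter

def codon_usage_5prime(cds, fraction=0.2):
    """Relative codon frequencies within the first `fraction` of a CDS."""
    n_codons = len(cds) // 3
    head = cds[: 3 * max(1, int(n_codons * fraction))]
    counts = Counter(head[i:i + 3] for i in range(0, len(head), 3))
    total = sum(counts.values())
    return {codon: k / total for codon, k in counts.items()}

# Hypothetical CDS: frequencies are computed over the first 20% of codons
usage = codon_usage_5prime("ATGGTGGTGATCCTGAAACTGGTGATC" * 4)
print(usage["GTG"])
```

Because the denominator is the codon count of the 5′ window only, an overrepresentation of one codon family (e.g., Ile codons) necessarily lowers the relative frequencies of others, which is the compositional effect we describe above.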

      Figure 5G: There appears to be one Val codon early in the Hint1 transcript without much stalling under triple or valine starvation conditions. The authors should acknowledge this and comment on why this may be.

      Response: We thank the reviewer for pointing this out. While the Hint1 transcript indeed contains a valine codon early in its CDS, no clear stalling peak was observed at that position under valine or triple starvation. Several factors may contribute to this: local sequence context can influence ribosome pausing, and not all cognate codons necessarily lead to detectable stalling even under amino acid starvation. Additionally, coverage at the 5′ end of Hint1 is relatively sparse in our dataset, and potential mappability limitations, such as regions with low complexity or repetitive elements, may further reduce resolution at specific sites. We now briefly mention this in the manuscript to clarify the possible causes.

      Figure 5B: In the text referencing this figure, the authors state that "a high number of downregulated proteins with associated ribosome stalling sites did not show an overall decreased mean RPF count...as it would be expected from translation initiation defects, linking these stalling sites directly to proteomic changes." However, RPF is affected both by stalling (increases RPF) and initiation defects (decreases RPF). A gene with both stalling and decreased initiation may appear to have no RPF change. The data does suggest a contribution from stalling, but the authors should also acknowledge that reduced initiation may also be playing a role.

      Response: We agree with the reviewer's comment; our cited statement should indeed be more nuanced. The reviewer correctly points out that RPFs are influenced both by increased ribosome density due to stalling and by decreased ribosome density due to reduced initiation. Therefore, a gene experiencing both stalling and reduced initiation might appear to have no net change in RPF, or even a slight increase if stalling is dominant. Thus, while the presence of stalling sites strongly suggests a contribution from compromised elongation to reduced protein output, we cannot definitively rule out a concurrent role for reduced initiation, even in cases where RPF counts are not globally decreased. We revised this section in the manuscript to acknowledge this interplay.

      Figure 5E: the black text on dark brown in the center of the Venn diagram is difficult to read. The diagram should either have a different color scheme, or the text in the center should be white instead of black for higher contrast.

      Response: We have adjusted the text color for better contrast and improved readability.

      Supplementary Figure 1C: The ribosome dwell time data in this study is described as "highly correlated" with another published dwell time dataset, but the P and E site data do not seem strongly correlated. The authors should remove the word "highly."

      Response: We have removed the word "highly" to allow a more cautious interpretation in the text.

      Supplementary Figure 3E: Not all of the highlighted codons in this figure are ones with prolonged dwell times. To clarify the point that dwell time change is not related to codon frequency, this figure should only highlight codons that have a significantly prolonged dwell time in at least one starvation condition.

      Response: We thank the reviewer for pointing this out. To improve clarity, we have revised the figure and now specifically highlight codons with significantly prolonged dwell times with stars.

      Supplementary Figure 5C: The gene Chop is mentioned in the main text when referencing this figure, but is absent from the heatmap.

      Response: We thank the reviewer for noting this. The gene Chop is annotated under its alternative name Ddit3 in the current version of the heatmap and is indeed present. To avoid confusion, we have now updated the label in the figure to display Chop (Ddit3) directly.

      Supplementary Figure 7A: The authors could clarify this figure by adding additional language to either the figure panel or the figure legend specifying that the RPM metric being used comes from Ribo-seq.

      Response: We have updated the legend to explicitly state that the RPM values shown are derived from Ribo-seq data.

      Supplementary Figure 7D: The metric used to describe the spatial relationship between the first valine and isoleucine codons in transcripts in this figure seems to be describing something conceptually similar to the stalling sites in Figure 5G, but uses a different metric. These figures would be easier to interpret if these spatial relationships were presented in a consistent way throughout the manuscript.

      Response: We thank the reviewer for this helpful observation. Supplementary Figure 7D (now Supplementary Figure 11B) originally used a gene-length-normalized metric to describe codon spacing, whereas Figure 5G depicted absolute nucleotide distances to stalling sites. To ensure consistency across the manuscript, we have now updated Supplementary Figure 11B to also use absolute distances. We believe this adjustment improves clarity and allows for a more direct comparison between spatial codon patterns and stalling events.
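For illustration only, the difference between the two metrics is a rescaling by CDS length; the positions and length below are hypothetical:

```python
# Hypothetical sketch: convert gene-length-normalized codon positions
# (0.0 = start codon, 1.0 = stop codon) into an absolute nucleotide distance.

def absolute_distance_nt(norm_pos_a, norm_pos_b, cds_length_nt):
    return abs(norm_pos_b - norm_pos_a) * cds_length_nt

# e.g. a first Val codon at 5% and a first Ile codon at 20% of a 1500-nt CDS
d = absolute_distance_nt(0.05, 0.20, 1500)
print(round(d))  # 225
```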

      Discussion:

      Reader understanding would be improved if the relevance of paragraphs were established in the first sentence. For instance, in the paragraphs about adaptive misacylation and posttranscriptional modifications, it is unclear until the end of the paragraph how these topics are relevant. Introducing the relevant aspects of the study (the fact that some starvation conditions have less severe effects and the observation about m6A-related mRNAs) at the beginning of these paragraphs would improve clarity.

      Response: We thank the reviewer for this helpful comment. We agree that the flow and clarity of the Discussion can be improved by making the relevance of each paragraph clearer from the outset. In the revised manuscript, we have restructured these sections to better highlight the connection between each topic and our main findings. These changes also align with suggestions from Reviewer 2, and we believe they help to focus the Discussion more tightly around the core insights of our study.

      The authors should provide more information and speculation about possible physiological relevance of their findings, particularly about the way that the effects of triple starvation are highly valine-dependent. Are there physiological conditions under which starvation of all three BCAAs is more likely than starvation of one or two of them? If so, are there any reasons why a valine-based bottleneck might be advantageous?

      Response: We appreciate the reviewer's insightful question regarding the physiological relevance of our findings, particularly the valine-dependent bottleneck observed under triple BCAA starvation. This prompts a crucial discussion on the broader biological context of our work.

      While complete starvation of all three BCAAs might be less frequent than individual deficiencies, such conditions are physiologically relevant in several contexts. In prolonged fasting, starvation, or severe cachectic states associated with chronic diseases (e.g., advanced cancer, critical illness), systemic amino acid pools, including BCAAs, can become significantly depleted due to increased catabolism and insufficient intake (Yu et al. 2021). Moreover, certain specialized diets or therapeutic strategies aim to modulate BCAA levels. For instance, in some Maple Syrup Urine Disease (MSUD) management protocols, BCAA intake is severely restricted to prevent the accumulation of toxic BCAA metabolites (Mann et al. 2021). Similarly, emerging cancer therapies sometimes explore nutrient deprivation strategies to selectively target tumor cells, which could involve broad BCAA reduction (e.g. Sheen et al. 2011; Xiao et al. 2016).

      In these contexts, a valine-based bottleneck, as we describe, could indeed represent an adaptive strategy. If valine-tRNAs are particularly susceptible to deacylation and valine codons are strategically enriched at the 5' end of transcripts, stalling at these early positions could serve as a rapid "gatekeeper" for global translation. This early-stage inhibition would conserve cellular energy and available amino acids by quickly reducing the overall demand for charged tRNAs. Such a mechanism could potentially prioritize the translation of a subset of proteins that might have different codon usage biases or are translated via alternative, less valine-dependent mechanisms. This aligns with the concept of a multi-layered translational control where global initiation repression (as reflected in mTORC1 inhibition and polysome profiles) is complemented by specific elongation checkpoints, allowing for a more nuanced and adaptive response to severe nutrient stress.
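This gatekeeper logic can be sketched with a back-of-envelope calculation; the pass-through rate and number of early stall sites below are purely hypothetical and not fitted to our data:

```python
# Hypothetical sketch: if each early valine stall site lets only a fraction
# of ribosomes through, few reach downstream Ile/Leu codons, lowering the
# demand for (and hence deacylation of) those charged tRNAs.

def downstream_flux(initiation_flux, n_early_stall_sites, pass_rate):
    """Ribosomes per unit time reaching the 3' part of the CDS, assuming
    each early stall site independently passes a fraction `pass_rate`."""
    return initiation_flux * (pass_rate ** n_early_stall_sites)

flux = downstream_flux(initiation_flux=100.0, n_early_stall_sites=3, pass_rate=0.5)
print(flux)  # 12.5
```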

      Reviewer #3 (Significance):

      Nature and significance of the advance

      The main contribution of this work is to demonstrate that depletion of multiple amino acids simultaneously impacts translation elongation in ways that are not necessarily additive. These impacts can depend on the distribution of codons in a transcript. It adds to a growing body of work showing that essential amino acid starvation can cause codon-specific ribosome stalling. The authors suggest that the position-dependent stalling they observe could be a novel regulatory mechanism to alleviate the effects of multi-amino acid starvation. However, it is not fully clear from the paper what the significance of a valine-based regulatory adaptation to BCAA starvation is, or whether simultaneous starvation of all three BCAAs is of particular physiological relevance. The paper's primary contribution concerns the similarity between valine and triple BCAA starvation, and it provides limited insight into the effects of combined depletion of two BCAAs.

      Context of existing literature

      Although ribosome profiling does not distinguish between actively elongating and stalled ribosomes, sites with higher read coverage, and thereby higher inferred dwell time, can be used to infer ribosome stalling (Ingolia 2011). Various downstream effects of essential amino acid depletion have been documented, such as leucine deficiency being sensed by mTORC1 via leucyl-tRNA synthetase (Dittmar 2005, Han 2012), and shared transcriptional responses among many amino acid depletion conditions (Tang 2015). These authors have previously measured the translational effects of nutrient stress using ribosome profiling (e.g., Gobet 2020), as have others (Darnell 2018, Kochavi et al. 2024). The present work is, to our knowledge, the first study combining BCAA depletions, an incremental and useful contribution to our understanding of translational responses to stress conditions.

      Audience

      This work is of interest to investigators studying the response of human cells in stress conditions, such as in human disease, as well as investigators studying the basic biology of eukaryotic translational control.

      Reviewer expertise: mRNA decay and translation regulation in bacteria.

      We hope the authors have found our comments thoughtful and useful. We welcome further discussion or clarification via email: Juliana Stanley (julianst@mit.edu) and Hannah LeBlanc (leblanch@mit.edu).

      We sincerely thank the reviewers for their thoughtful and constructive feedback, as well as for their careful and thorough reading of our manuscript. We also gratefully acknowledge the invitation for further discussion and would be happy to engage in future correspondence.

      References

      Allen, George E., Olesya O. Panasenko, Zoltan Villanyi, Marina Zagatti, Benjamin Weiss, Lucile Pagliazzo, Susanne Huch, et al. 2021. "Not4 and Not5 Modulate Translation Elongation by Rps7A Ubiquitination, Rli1 Moonlighting, and Condensates That Exclude eIF5A." Cell Reports 36 (9): 109633. https://doi.org/10.1016/j.celrep.2021.109633.

      Darnell, Alicia M., Arvind R. Subramaniam, and Erin K. O'Shea. 2018. "Translational Control through Differential Ribosome Pausing during Amino Acid Limitation in Mammalian Cells." Molecular Cell 71 (2): 229-243.e11. https://doi.org/10.1016/j.molcel.2018.06.041.

      Dittmar, Kimberly A., Michael A. Sørensen, Johan Elf, Måns Ehrenberg, and Tao Pan. 2005. "Selective Charging of tRNA Isoacceptors Induced by Amino-Acid Starvation." EMBO Reports 6 (2): 151–57. https://doi.org/10.1038/sj.embor.7400341.

      Elf, Johan, Daniel Nilsson, Tanel Tenson, and Mans Ehrenberg. 2003. "Selective Charging of tRNA Isoacceptors Explains Patterns of Codon Usage." Science (New York, N.Y.) 300 (5626): 1718–22. https://doi.org/10.1126/science.1083811.

      Gobet, Cédric, Benjamin Dieter Weger, Julien Marquis, Eva Martin, Nagammal Neelagandan, Frédéric Gachon, and Felix Naef. 2020. "Robust Landscapes of Ribosome Dwell Times and Aminoacyl-tRNAs in Response to Nutrient Stress in Liver." Proceedings of the National Academy of Sciences of the United States of America 117 (17): 9630–41. https://doi.org/10.1073/pnas.1918145117.

      Hussmann, Jeffrey A., Stephanie Patchett, Arlen Johnson, Sara Sawyer, and William H. Press. 2015. "Understanding Biases in Ribosome Profiling Experiments Reveals Signatures of Translation Dynamics in Yeast." Edited by Michael Snyder. PLOS Genetics 11 (12): e1005732. https://doi.org/10.1371/journal.pgen.1005732.

      Hwang, Jae-Yeon, and Allen R. Buskirk. 2017. "A Ribosome Profiling Study of mRNA Cleavage by the Endonuclease RelE." Nucleic Acids Research 45 (1): 327–36. https://doi.org/10.1093/nar/gkw944.

      Juszkiewicz, Szymon, and Ramanujan S. Hegde. 2017. "Initiation of Quality Control during Poly(A) Translation Requires Site-Specific Ribosome Ubiquitination." Molecular Cell 65 (4): 743-750.e4. https://doi.org/10.1016/j.molcel.2016.11.039.

      Li, Fajin, Jianhuo Fang, Yifan Yu, Sijia Hao, Qin Zou, Qinglin Zeng, and Xuerui Yang. 2023. "Reanalysis of Ribosome Profiling Datasets Reveals a Function of Rocaglamide A in Perturbing the Dynamics of Translation Elongation via eIF4A." Nature Communications 14 (1): 553. https://doi.org/10.1038/s41467-023-36290-w.

      Mann, Gagandeep, Stephen Mora, Glory Madu, and Olasunkanmi A. J. Adegoke. 2021. "Branched-Chain Amino Acids: Catabolism in Skeletal Muscle and Implications for Muscle and Whole-Body Metabolism." Frontiers in Physiology 12 (July): 702826. https://doi.org/10.3389/fphys.2021.702826.

      Saikia, Mridusmita, Xiaoyun Wang, Yuanhui Mao, Ji Wan, Tao Pan, and Shu-Bing Qian. 2016. "Codon Optimality Controls Differential mRNA Translation during Amino Acid Starvation." RNA (New York, N.Y.) 22 (11): 1719–27. https://doi.org/10.1261/rna.058180.116.

      Sharma, Puneet, Jie Wu, Benedikt S. Nilges, and Sebastian A. Leidel. 2021. "Humans and Other Commonly Used Model Organisms Are Resistant to Cycloheximide-Mediated Biases in Ribosome Profiling Experiments." Nature Communications 12 (1): 5094. https://doi.org/10.1038/s41467-021-25411-y.

      Sheen, Joon-Ho, Roberto Zoncu, Dohoon Kim, and David M. Sabatini. 2011. "Defective Regulation of Autophagy upon Leucine Deprivation Reveals a Targetable Liability of Human Melanoma Cells In Vitro and In Vivo." Cancer Cell 19 (5): 613–28. https://doi.org/10.1016/j.ccr.2011.03.012.

      Xiao, Fei, Chunxia Wang, Hongkun Yin, Junjie Yu, Shanghai Chen, Jing Fang, and Feifan Guo. 2016. "Leucine Deprivation Inhibits Proliferation and Induces Apoptosis of Human Breast Cancer Cells via Fatty Acid Synthase." Oncotarget 7 (39): 63679–89. https://doi.org/10.18632/oncotarget.11626.

      Yu, Deyang, Nicole E. Richardson, Cara L. Green, Alexandra B. Spicer, Michaela E. Murphy, Victoria Flores, Cholsoon Jang, et al. 2021. "The Adverse Metabolic Effects of Branched-Chain Amino Acids Are Mediated by Isoleucine and Valine." Cell Metabolism 33 (5): 905-922.e6. https://doi.org/10.1016/j.cmet.2021.03.025.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary and General Critique:

      This study examines the consequences of starvation for the BCAAs, either singly, for Leu & Ile, or for all three simultaneously in HeLa cells on overall translation rates, decoding rates at each codon, and on ribosome density, protein expression, and distribution of ribosome stalling events across the CDS for each expressed gene. The single amino acid starvation regimes specifically reduce the cognate intracellular amino acid pool and lead to deacylation of at least a subset of the cognate tRNAs in a manner dependent on continuing protein synthesis. They also induce the ISR equally and decrease bulk protein synthesis equally in a manner that appears to occur largely at the initiation level for -Leu and -Val, judging by the decreased polysome:monosome ratio, but at both the initiation and elongation levels for -Ile, a distinction that remains unexplained. Only -Leu appears to down-regulate mTORC1 and TOP mRNA translation. There is a significant down-regulation of protein levels for 50-200 genes, which tend to be unstable in nutrient-replete cells, only a fraction of which are associated with reduced ribosome occupancies (RPFs measured by Ribo-Seq) on the corresponding mRNAs in the manner expected for reduced initiation, suggesting that delayed elongation is responsible for reduced protein levels for the remaining fraction of genes. All three single starvations lead to increased decoding times for a subset of the cognate "hungry" codons: CUU for -Leu, AUU and AUC for -Ile, and all of the Val codons, in a manner that is said to correspond largely to the particular tRNA isoacceptors that become deacylated, although this correspondence was not explained explicitly and might not be as simple as claimed.
      All three single starvations also evoke skewing of RPFs towards the 5' ends of many CDSs in a manner correlated with an enrichment within the early regions of the CDSs for one or more of the cognate codons that showed increased decoding times for -Ile (AUC codon) and -Val (GUU, GUC, and GUG), but not for -Leu, which was not accounted for. These last findings suggest that, at least for -Val and -Ile, delays in decoding N-terminal cognate codons cause elongating ribosomes to build up early in the CDS. They go on to employ a peak calling algorithm to identify stalling sites in an unbiased way within the CDS, which are greatest in number for -Val, and find that Val codons are enriched in the A-sites (slightly) and adjacent 5' nucleotides (to a greater extent) for -Val starvation; and similarly for Ile codons in -Ile conditions, but not for -Leu starvation, again for unknown reasons. It's unclear why their called stalling sites have various other non-hungry codons present in the A sites with the cognate hungry codons being enriched further upstream, given that stalling should occur with the "hungry" cognate codon in the A site. The proteins showing down-regulation are enriched for stalling sites only in the case of the -Val starvation in the manner expected if stalling is contributing to reduced translation of the corresponding mRNA.
      It's unclear why this enrichment apparently does not extend to -Ile starvation, which shows comparable skewing of RPFs towards the 5' ends, and this fact diminishes the claim that pausing generally contributes to reduced translation for genes with abundant hungry codons.

      All of the same analyses were carried out for the Double -Ile/-Leu and Triple starvations and yield unexpected results, particularly for the triple starvation wherein decoding times are increased only at Val codons, skewing of RPFs towards the 5' ends of CDSs is correlated only with an enrichment for Val codons within the early regions of the CDSs, and stall sites are enriched only for Val codons at nearby upstream sites, all consistent with the finding that only Val tRNAs become deacylated in the Triple regime. To explain why only Val tRNA charging is reduced despite the observed effective starvation for all three amino acids, they note first that stalling at Val codons is skewed towards the 5' ends of CDSs for both -Val and triple starvations more so than observed for -Ile or -Leu starvation, which they attribute to a greater frequency of Val codons vs Ile codons in the 5' ends of CDSs. As such, charged Val tRNAs are said to be consumed in translating the 5' ends of CDSs and the resulting stalling prevents ribosomes from reaching downstream Ile and Leu codons at the same frequencies and thus prevents deacylation of the cognate Ile and Leu tRNAs. It's unclear whether this explanation is adequate to explain the complete lack of Ile or Leu tRNA deacylation observed even when amino acid recycling by the proteasome is inhibited, a treatment shown to exacerbate deacylation of cognate tRNAs in the single amino acid starvations and of Val tRNA in the triple starvation.
      As such, the statement in the Abstract "Notably, we could show that isoleucine starvation-specific stalling largely diminished under triple starvation, likely due to early elongation bottlenecks at valine codons" might be too strong, and the word "possibly" would be preferred over "likely". It's also unclear why the proteins that are down-regulated in the triple starvation are not significantly enriched for stalling sites (Fig. 5B) given that the degree of skewing is comparable or greater than for -Val. This last point seems to undermine their conclusion in the Abstract that "many proteins downregulated under BCAA deprivation harbor stalling sites, suggesting that compromised elongation contributes to decreased protein output."

      In the case of the double -Ile/-Leu starvation, a related phenomenon occurs wherein decoding rates are decreased for only the AUU Ile codon and only the AAU Ile tRNA becomes deacylated; although in this case increased RPFs in the 5' ends are not correlated with enrichment for Ile or Leu codons and, although not presented, apparently stall sites are not associated with the Ile codon in the double starvation. In addition, stalling sites are not enriched in the proteins down-regulated by the double starvation. Moreover, because Ile codons are not enriched in the 5' ends of CDSs, it doesn't seem possible to explain the selective deacylation of the single Ile tRNA observed in the double starvation by the same "bottleneck" mechanism proposed to explain selective deacylation of only Val tRNAs during the triple starvation. This is another reason for questioning their "bottleneck" mechanism.

      Specific comments (some of which were mentioned above):

      • The authors have treated cells with CHX in the Ribo-Seq experiments, which has been shown to cause artifacts in determining the locations of ribosome stalling in vivo owing to continued elongation in the presence of CHX (https://doi.org/10.1371/journal.pgen.1005732). The authors should comment on whether this artifact could be influencing some of their findings, particularly the results in Fig. 5C where the hungry codons are often present upstream of the A sites of called stalling sites in the manner expected if elongation continued slowly following stalling in the presence of CHX.
      • p. 12: "These starvation-specific DT and ribosome density modulations were also evident at the individual transcript level, as exemplified by Col1a1, Col1a2, Aars, and Mki67 which showed persistent Val-codon-specific ribosome density increases but lost Ile-codon-specific increases under triple starvation (Supplementary Figure 3A-D)." This conclusion is hard to visualize for any but Val codons. It would help to annotate the relevant peaks of interest for -Ile starvation with arrows.
      • To better make the point that codon-specific stalling under BCAA starvation appears not to be driven by codon usage, rather than the analysis in Fig. 1H, wouldn't it be better to examine the correlation between increases in DT under the single amino acid starvation conditions and the codon frequencies across all codons?
      • p. 13, entire paragraph beginning with "Our RNA-seq and Ribo-seq revealed a general activation of stress response pathways across all starvations..." It is difficult to glean any important conclusions from this lengthy analysis, and the results do not appear to be connected to the overall topic of the study. If there are important conclusions here that relate to the major findings then these connections should be made or noted later in the Discussion. If not, perhaps the analysis should be largely relegated to the Supplemental material.
      • p. 15: "Together, these findings highlight that BCAA starvation triggers a combination of effects on initiation and elongation, with varying dynamics by amino acid starvation." I take issue with this statement as it appears that translation is reduced primarily at the initiation step for all conditions except -Ile. As noted above, these data are never mentioned in the Discussion as to why only -Ile would show a marked elongation component to the inhibition whereas -Val gives the greatest amount of ribosome stalling.
      • I cannot decipher Fig. 4D and more detail is required to indicate the identity of each column of data.
      • In Fig. 4E, one cannot determine what the P values actually are, which should be provided in the legend to confirm statistical significance.
      • It's difficult to understand how the -Leu condition and the Double starvation can produce polarized RPFs (Fig. 4A) without evidence of stalling at the cognate hungry codons (Fig. 4E), despite showing later in Fig. 5A that the numbers of stall sites are comparable in those cases to that found for -Ile.
      • Fig. 5B: the P values should be given for all five columns, and it should be explained here or in the Discussion why the authors conclude that stalling is an important determinant for reduced translation when a significant correlation seems to exist only for the -Val condition and not even for the Triple condition.
      • p. 17: "Of note, in cases where valine or isoleucine codons were present just upstream (rather than at) the stalling position, we noted a strong bias for GAG (E), GAA (E), GAU (D), GAC (D), AAG (K), CAG (Q), GUG (V) and GGA (G) (Val starvation) and AAC (N), GAC (D), CUG (L), GAG (E), GCC (A), CAG (Q), GAA (E) and AAG (K) (Ile starvation) at the stalling site (Supplementary Figure 7B)." The authors fail to explain why these codons would be present in the A sites at stalling sites rather than the hungry codons themselves, especially since it is the decoding times of the hungry codons that are increased according to Fig. 1A-E. As suggested above, is this a CHX artifact?
      • Fig. 5D: P values for the significance, or lack thereof, of the different overlaps should be provided.
      • p. 17: "Nonetheless, when we examined entire transcripts rather than single positions, many transcripts that exhibited isoleucine-related stalling under Ile starvation also stalled under triple starvation, but at different sites along the CDS (Figure 5E). This finding is particularly intriguing, as it suggests that while Ile-starvation-specific stalling sites may shift under triple starvation, the overall tendency of these transcripts to stall remains." The authors never come back to account for this unexpected result.
      • It seems very difficult to reconcile the results in Fig. 5F with those in Fig. 4A, where similar polarities in RPFs are observed for -Ile and -Val in Fig. 4A but dramatically different distributions of stalling sites in Fig. 5F. More discussion of these discrepancies is required.
      • p. 18: " These isoacceptor-specific patterns correlate largely with the particular subsets of leucine and isoleucine codons that stalled (Figure 1A)." This correlation needs to be addressed for each codon-anticodon pair for all of the codons showing stalling in Fig. 1A.
      • p. 19: "For instance, in our double starvation condition, unchanged tRNA charging levels (Figure 6E) may result from a pronounced downregulation of global translation initiation, likely driven by the activation of stress responses (Figure 2), subsequently lowering the demand for charged tRNAs as it has been observed previously for Leu starvation 39." This seems at odds with the comparable down-regulation of protein synthesis for the Double starvation and -Leu and -Ile single starvations shown in Fig. 3C. Also, in the current study, Leu starvation does lower charging of certain Leu tRNAs.

      Significance

      The results here are significant in showing that starvation for a single amino acid does not lead to deacylation of all isoacceptors for that amino acid and in revealing that starvation for one amino acid can prevent deacylation of tRNAs for other amino acids, as shown most dramatically for the selective deacylation of only Val tRNAs in the triple BCAA starvation condition. For the various reasons indicated above, however, I'm not convinced that their "bottleneck" mechanism is adequate to explain this phenomenon, especially in the case of the selective deacylation of Ile vs Leu tRNA in the Double starvation regime. It's also significant that deacylation leads to ribosome build-up near the 5' ends of CDSs, which seems to be associated with an enrichment for the hungry codons in the case of Val and Ile starvation, but inexplicably, not for Leu or the Double starvations. This last discrepancy makes it hard to understand how the -Leu and Double starvations produce RPF buildups near the 5' ends of CDSs. In addition, the claim in the Discussion that "our data also highlight the importance of the codon positional context within mRNAs, indicating that where a codon is located within the CDS can influence both the extent of ribosomal stalling and overall translation efficiency during nutrient stress" overstates the strength of evidence that the stalling events lead to substantial decreases in translational efficiencies for the affected mRNAs, as the stalling frequency and decreased protein output are significantly correlated only for the -Val starvation, and the data in Fig. 3 D-H suggest that the reductions in protein synthesis generally occur at the level of initiation, even for -Val starvation, with a contribution from slow elongation only for -Ile, which is in itself difficult to understand considering that stalling frequencies are highest in -Val.
      Thus, while many of the results are very intriguing and will be of considerable interest to the translation field, it is my opinion that a number of results have been overinterpreted and that important inconsistencies and complexities have been overlooked in concluding that a significant component of the translational inhibition arises from the increased decoding times at hungry codons during elongation and that the selective deacylation of Val tRNAs in the Triple starvation can be explained by the "bottleneck" mechanism. The complexities and limitations of the data and their interpretations should be discussed much more thoroughly in the Discussion, which currently is devoted mostly to other phenomena often of tangential importance to the current findings. A suitably revised manuscript would clearly state the limitations and caveats of the proposed mechanisms and consider other possible explanations as well.

    1. Here's the thing I've learned about bodiesโ€”you can't look at one piece of it without seeing all the others, can't manipulate a part without having to negotiate every other aspect of that body too. You can try, but you can't do it. It just won't happen. It's not how bodies work.

      And cultures are no different: you can't look at one piece in isolation from all the rest, because when you do you are very likely misreading the data!

    1. You decide to join a gym and consult with a personal trainer who uses specialized vocabulary to describe different types of exercise: aerobic, anaerobic, reps, plyometrics, and isometrics. You discover other gym members who share that same goal of becoming healthier, more flexible, and stronger. You become versed in a new language of fitness.

      It's important to be prepared for different life situations. You can't go to a job interview using colloquial language, which we normally use to talk to our friends and family, just as you can't go to a fast food restaurant using overly formal language and complicated words when all you want is to order a hamburger. We must always be aware of what words to use, depending on the place.

    2. Each of these academic fields had their own goals, their own genres, their own writing conventions, their own formats for citing sources, and their own expectations for writing style. I thought each of the teachers I encountered in my undergraduate career just had their own personal preferences that all felt pretty random to me.

      It's helpful to know that every prompt and style of writing expected from me has an underlying purpose. If something seems more challenging, it's to help push me further into the discourse community of my field.

      right now you are looking at pictures back from August of 2021. Fast forward one year to August of 2022 and the entire field is gone. Replaced by a giant construction site. It's clear that Meta, the company who bought the land, has big plans. And they mean business. Just four months later – now in December of 2022 – we can see that the construction of a datacenter is well underway. But then something strange happens. Another five months later, in April of 2023, our curious field in Temple looks like this: all the previous construction gone, razed to the ground. Meta just deleted their entire datacenter halfway through its construction. An estimated 70 million dollars just gone.

      Facebook razes a data center site months after beginning construction

    1. one morning I had the most immense panic attack I've ever had and I just like saw red and just ran I legged it out of the retreat which is un it's unthinkable. you know, in a four-year retreat, you're not supposed to leave. But I jumped over the wall and tried to escape.

      for - adjacency - synchronicity - intense retreat experience - Mingyur Rinpoche - I'm listening to Mingyur Rinpoche and there's some synchronicity that in the live talk, he is talking about the same thing as the monk in this interview - They both went into a multiyear retreat and suffered huge panic attacks - to - Youtube - Mingyur Rinpoche - Anytime Anywhere meditation - South Africa - https://hyp.is/coluBIvcEfCRpD_roJ5NsQ/www.youtube.com/watch?v=P_GmQMZqtGU

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2025-02922

      Corresponding author(s): Christian Specht


      1. General Statements [optional]


      We thank the reviewers for their thorough and constructive evaluation of our work. We have revised the manuscript carefully and addressed all the criticisms raised, in particular the issues mentioned by several of the reviewers (see point-by-point response below). We have also added a number of explanations in the text for the sake of clarity, while trying to keep the manuscript as concise as possible.

      • *

      In our view, the novelty of our research is two-fold. From a neurobiological point of view, we provide conclusive evidence for the existence of glycine receptors (GlyRs) at inhibitory synapses in various brain regions including the hippocampus, dentate gyrus and sub-regions of the striatum. This solves several open questions and has fundamental implications for our understanding of the organisation and function of inhibitory synapses in the telencephalon. Secondly, our study makes use of the unique sensitivity of single molecule localisation microscopy (SMLM) to identify low protein copy numbers. This is a new way to think about SMLM as it goes beyond a mere structural characterisation and towards a quantitative assessment of synaptic protein assemblies.

      2. Point-by-point description of the revisions


      __Reviewer #1 (Evidence, reproducibility and clarity (Required)):__

      In this manuscript, the authors investigate the nanoscopic distribution of glycine receptor subunits in the hippocampus, dorsal striatum, and ventral striatum of the mouse brain using single-molecule localization microscopy (SMLM). They demonstrate that only a small number of glycine receptors are localized at hippocampal inhibitory synapses. Using dual-color SMLM, they further show that clusters of glycine receptors are predominantly localized within gephyrin-positive synapses. A comparison between the dorsal and ventral striatum reveals that the ventral striatum contains approximately eight times more glycine receptors, and this finding is consistent with electrophysiological data on postsynaptic inhibitory currents. Finally, using cultured hippocampal neurons, they examine the differential synaptic localization of glycine receptor subunits (α1, α2, and β). This study is significant as it provides insights into the nanoscopic localization patterns of glycine receptors in brain regions where this protein is expressed at low levels. Additionally, the study demonstrates the different localization patterns of GlyR in distinct striatal regions and its physiological relevance using SMLM and electrophysiological experiments. However, several concerns should be addressed.

      The following are specific comments:

      1. Colocalization analysis in Figure 1A. The colocalization between Sylite and mEos-GlyRβ appears to be quite low. It is essential to assess whether the observed colocalization is not due to random overlap. The authors should consider quantifying colocalization using statistical methods, such as a pixel shift analysis, to determine whether colocalization frequencies remain similar after artificially displacing one of the channels.

      *Following the suggestion of reviewer 1, we re-analysed CA3 images of Glrbeos/eos hippocampal slices by applying a pixel-shift type of control, in which the Sylite channel (in far red) was horizontally flipped relative to the mEos4b-GlyRβ channel (in green, see Methods). As expected, the number of mEos4b-GlyRβ detections per gephyrin cluster was markedly reduced compared to the original analysis (revised __Fig. 1B__), confirming that the synaptic mEos4b detections exceed chance levels (see page 5).*
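The logic of this flip-type chance control can be sketched in a few lines. All coordinates, densities, and the counting radius below are invented purely for illustration; they have nothing to do with the actual imaging parameters or analysis pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy field of view in arbitrary units: 50 gephyrin puncta (Sylite channel),
# plus detections of which half sit on puncta and half are background.
gephyrin = rng.uniform(0, 100, size=(50, 2))
synaptic = gephyrin[rng.integers(0, 50, size=200)] + rng.normal(0, 0.2, (200, 2))
background = rng.uniform(0, 100, size=(200, 2))
detections = np.vstack([synaptic, background])

def detections_per_cluster(dets, clusters, radius=0.5):
    """Mean number of detections within `radius` of a cluster centre."""
    d = np.linalg.norm(dets[:, None, :] - clusters[None, :, :], axis=2)
    return (d < radius).sum() / len(clusters)

observed = detections_per_cluster(detections, gephyrin)

# Flip control: mirror one channel horizontally. True spatial correspondence
# is destroyed while detection density is preserved, so what remains is the
# chance-level overlap.
flipped = detections * [-1, 1] + [100, 0]
control = detections_per_cluster(flipped, gephyrin)

print(observed, control)  # observed clearly exceeds the flipped control
```

The key property of the control is that it keeps both channels' densities intact, so any residual per-cluster count estimates the random-overlap baseline against which the real signal is judged.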

      Inconsistency between Figure 3A and 3B. While Figure 3B indicates an ~8-fold difference in the number of mEos4b-GlyRβ detections per synapse between the dorsal and ventral striatum, Figure 3A does not appear to show a pronounced difference in the localization of mEos4b-GlyRβ on Sylite puncta between these two regions. If the images presented in Figure 3A are not representative, the authors should consider replacing them with more representative examples or providing expanded images with multiple representative examples. Alternatively, if this inconsistency can be explained by differences in spot density within clusters, the authors should explain that.

      *The pointillist images in Fig. 3A are essentially binary (red-black). Therefore, the density of detections at synapses cannot be easily judged by eye. For clarity, the original images in Fig. 3A have been replaced with two other examples that better reflect the different detection numbers in the dorsal and ventral striatum.*


      Quantification in Figure 5. It is recommended that the authors provide quantitative data on cluster formation and colocalization with Sylite puncta in Figure 5 to support their qualitative observations.

      *This is an important point that was also raised by the other reviewers. We have performed additional experiments to increase the data volume for analysis. For quantification, we used two approaches. First, we counted the percentage of infected cells in which synaptic localisation of the recombinant receptor subunit was observed (Fig. 5C). We found that mEos4b-GlyRα1 consistently localises at synapses, indicating that all cells express endogenous GlyRβ. When neurons were infected with mEos4b-GlyRβ, fewer cells had synaptic clusters, meaning that GlyR α subunits are indeed the limiting factor for synaptic targeting. In cultures infected with mEos4b-GlyRα2, only very few neurons displayed synaptic localisation (as judged by epifluorescence imaging). We think this shows that GlyRα2 is less capable of forming heteromeric complexes than GlyRα1, in line with our previous interpretation (see pp. 9-10, 13).*


      Secondly, we quantified the total intensity of each subunit at gephyrin-positive domains, both in infected neurons as well as in non-infected control cultures (Fig. 5D). We observed that mEos4b-GlyRα1 intensity at gephyrin puncta was higher than that of the other subunits, again pointing to efficient synaptic targeting of GlyRα1. Gephyrin cluster intensities (Sylite labelling) were not significantly different in GlyRβ and GlyRα2 expressing neurons compared to the uninfected control, indicating that the lentiviral expression of recombinant subunits does not fundamentally alter the size of mixed inhibitory synapses in hippocampal neurons. Interestingly, gephyrin levels were slightly higher in hippocampal neurons expressing mEos4b-GlyRα1. In our view, this comes from an enhanced expression and synaptic targeting of mEos4b-GlyRα1 heteromers with endogenous GlyRβ, pointing to a structural role of GlyRα1/β in hippocampal synapses (pp. 10, 13).


      The new data and analyses have been described and illustrated in the relevant sections of the manuscript.

      Potential for pseudoreplication. It's not clear whether they're performing stats tests across biological replicates, images, or even synapses. They often quote mean +/- SEM with n = 1000s, and so does that mean they're doing tests on those 1000s? Need to clarify.

      All experiments were repeated at least twice to ensure reproducibility (N independent experiments). Statistical tests were performed on pooled data across the biological replicates; n denotes the number of data points used for testing (e.g., number of synaptic clusters, detections, cells, as specified in each case). We have systematically given these numbers in the revised manuscript (n, N, and other experimental parameters such as the number of animals used, coverslips, images or cells). Data are generally given as mean +/- SEM or as mean +/- SD as indicated.


      Does mEos affect expression levels or function of the protein? Can't see any experiments done to confirm this. Could suggest WB on homogenate, or mass spec?

      The Glrbeos/eos knock-in mouse line has been characterised previously and does not display any ultrastructural or functional deficits at inhibitory synapses (Maynard et al. 2021 eLife). GlyRβ expression and glycine-evoked responses were not significantly different to those of the wild-type. The synaptic localisation of mEos4b-GlyRβ in KI animals demonstrates correct assembly of heteromeric GlyRs and synaptic targeting. Accordingly, the animals do not display any obvious phenotype. We have clarified this in the manuscript (p. 4). In the case of cultured neurons, long-term expression of fluorescent receptor subunits with lentivirus has proven ideal to achieve efficient synaptic targeting. The low and continuous supply of recombinant receptors ensures assembly with endogenous subunits to form heteropentameric receptor complexes (e.g. [Patrizio et al. 2017 Sci Rep]). In the present study, lentivirus infection did not induce any obvious differences in the number or size of inhibitory synapses compared to control neurons, as judged by Sylite labelling of synaptic gephyrin puncta (new __Fig. 5D__).

      Quantification of protein numbers is challenging with SMLM. Issues include i) some of the FPs not being correctly folded/mature, and ii) the dependence of the localisation rate on the instrument, excitation/illumination intensities, and also the thresholds used in analysis. Can the authors compare with another protein that has known expression levels, e.g. PSD-95? This is quite an ask, but if they could show the copy number of something known to compare with, it would be useful.

      We agree that absolute quantification with SMLM is challenging, since the number of detections depends on fluorophore maturation, photophysics, imaging conditions, and analysis thresholds (discussed in Patrizio & Specht 2016, Neurophotonics). For this reason, only very few datasets provide reliable copy numbers, even for well-studied proteins such as PSD-95. One notable exception is the study by Maynard et al. (eLife 2021) that quantified endogenous GlyRβ-containing receptors in spinal cord synapses using SMLM combined with correlative electron microscopy. The strength of this work was the use of a KI mouse strain, which ensures that mEos4b-GlyRβ expression follows intrinsic regional and temporal profiles. The authors reported a stereotypic density of ~2,000 GlyRs/µm² at synapses, corresponding to ~120 receptors per synapse in the dorsal horn and ~240 in the ventral horn, taking into account various parameters including receptor stoichiometry and the functionality of the fluorophore. These values are very close to our own calculations of GlyR numbers at spinal cord synapses that were obtained slightly differently in terms of sample preparation, microscope setup, imaging conditions, and data analysis, lending support to our experimental approach. Nevertheless, the obtained GlyR copy numbers at hippocampal synapses clearly have to be taken as estimates rather than precise figures, because the number of detections from a single mEos4b fluorophore can vary substantially, meaning that the fluorophores are not represented equally in pointillist images. This can affect the copy number calculation for a specific synapse, in particular when the numbers are low (e.g. in hippocampus), however, it should not alter the average number of detections (Fig. 1B) or the (median) molecule numbers of the entire population of synapses (Fig. 1C). We have discussed the limitations of our approach (p. 11).
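For orientation, the arithmetic behind such copy-number estimates can be laid out explicitly. The areal density follows Maynard et al. 2021; the synapse area and the detection-to-molecule conversion factors below are invented placeholders, since the real values depend on fluorophore maturation and blinking statistics:

```python
# Receptor number from an areal density (density value after Maynard et al. 2021):
density_per_um2 = 2000        # GlyRs per µm² at spinal cord synapses
synapse_area_um2 = 0.06       # assumed synaptic area, illustrative only
receptors = density_per_um2 * synapse_area_um2
print(receptors)              # ~120 receptors per synapse

# Converting raw SMLM detections into molecule counts requires dividing out
# the mean detections per fluorophore (blinking) and the fraction of
# fluorophores that mature into a detectable state. Both values below are
# placeholders, not measured quantities.
raw_detections = 720
mean_detections_per_fluorophore = 10.0
maturation_efficiency = 0.6
molecules = raw_detections / mean_detections_per_fluorophore / maturation_efficiency
print(molecules)              # ~120 molecules
```

The point of spelling this out is that every conversion factor carries its own uncertainty, which is why the resulting copy numbers are best read as order-of-magnitude estimates.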

      Rationale for doing nanobody dSTORM not clear at all. They don't explain the reason for doing the dSTORM experiments. Why not just rely on PALM for coincidence measurements, rather than tagging mEos with a nanobody, and then doing dSTORM with that? Can they explain? Is it to get extra localisations, i.e. multiple per nanobody? If so, localising the same FP multiple times wouldn't improve resolution. Also, no controls for the nanobody dSTORM experiments; what about a non-specific nanobody, or use on WT sections?

      *As discussed above (point 6), the detection of fluorophores with SMLM is influenced by many parameters, not least the noise produced by emitting molecules other than the fluorophore used for labelling. Our study is exceptional in that it attempts to identify extremely low molecule numbers (down to 1). To verify that the detections obtained with PALM correspond to mEos4b, we conducted robust control experiments (including the pixel-shift control suggested by the reviewer, see point 1, revised __Fig. 1B__). The rationale for the nanobody-based dSTORM experiments was twofold: (1) to have an independent readout of the presence of low-copy GlyRs at inhibitory synapses and (2) to analyse the nanoscale organisation of GlyRs relative to the synaptic gephyrin scaffold using dual-colour dSTORM with spectral demixing (see p. 6). The organic fluorophores used in dSTORM (AF647, CF680) ensure high photon counts, essential for reliable co-localisation and distance analysis. PALM and dSTORM cannot be combined in dual-colour mode, as they require different buffers and imaging conditions.*

      The specificity of the anti-Eos nanobody was demonstrated by immunohistochemistry in spinal cord cultures expressing mEos4b-GlyRβ and wildtype control tissue (Fig. S3). In response to the reviewer's remarks, we also performed a negative control experiment in Glrbeos/eos slices (dSTORM), in which the nanobody was omitted (new __Fig. S4F,G__). Under these conditions, spectral demixing produced a single peak corresponding to CF680 (gephyrin) without any AF647 contribution (Fig. S4F). The background of "false" AF647 detections at synapses was significantly lower than in the slices labelled with the nanobody. We conclude that the fluorescence signal observed in our dual-colour dSTORM experiments arises from the specific detection of mEos4b-GlyRβ by the nanobody, rather than from background, cross-reactivity or wrong attribution of colour during spectral demixing. We have added these data and explanations in the results (p. 7) and in the figure legend of Fig. S4F,G.

      What resolutions/precisions were obtained in SMLM experiments? Should perform Fourier Ring Correlation (FRC) on SR images to state resolutions obtained (particularly useful for when they're presenting distance histograms, as this will be dependent on resolution). Likewise for precision, what was mean precision? Can they show histograms of localisation precision.

      This is an interesting question in the context of our experiments with low-copy GlyRs, since the spatial resolution of SMLM is also limited by the density of molecules, i.e. the sampling of the structure in question (Nyquist-Shannon criterion). Accordingly, the priority of the PALM experiments was to improve the sensitivity of SMLM for the identification of mEos4b-GlyRβ subunits, rather than to maximise the spatial resolution. The mean localisation precision in PALM was 33 +/- 12 nm, as calculated from the fitting parameters of each detection (Zeiss, ZEN software), which ultimately result from their signal-to-noise ratio. This is a relatively low precision for SMLM, which can be explained by the low brightness of mEos4b compared to organic fluorophores together with the elevated fluorescence background in tissue slices.


      In the case of dSTORM, the aim was to study the relative distribution of GlyRs within the synaptic scaffold, for which a higher localisation precision was required (p. 6). Therefore, detections with a precision ≥ 25 nm were filtered out during analysis with the NEO software (Abbelight). The retained detections had a mean localisation precision of 12 +/- 5 nm for CF680 (Sylite) and 11 +/- 4 nm for AF647 (nanobody). These values are given in the revised manuscript (pp. 18, 22).
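The filtering step itself is simple thresholding on the per-detection precision reported by the fitting software. A minimal sketch, using an arbitrary right-skewed toy distribution in place of the real per-detection precisions:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical per-detection localisation precisions in nm; a gamma
# distribution is used here only to get a plausible right-skewed shape,
# not to model the actual instrument.
precision_nm = rng.gamma(shape=4.0, scale=3.0, size=10000)

# Discard poorly localised detections (precision >= 25 nm) before
# any downstream distance or clustering analysis.
kept = precision_nm[precision_nm < 25.0]
print(len(kept), round(kept.mean(), 1), round(kept.std(), 1))
```

With real data, the mean and SD of `kept` correspond to the retained-precision figures quoted above.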

      Why were DBSCAN parameters selected? How can they rule out multiple localisations per fluor? If low copy numbers (

      Multiple detections of the same fluorophore are intrinsic to dSTORM imaging and have not been eliminated from the analysis. Small clusters of detections likely represent individual molecules (e.g. single receptors in the extrasynaptic regions, Fig. 2A). DBSCAN is a robust clustering method that is quite insensitive to minor changes in the choice of parameters. For dSTORM of synaptic gephyrin clusters (CF680), a relatively small length (80 nm radius) together with a high number of detections (≥ 50 neighbours) were chosen to reconstruct the postsynaptic domain with high spatial resolution (see point 8). In the case of the GlyR (nanobody-AF647), the clustering was done mostly for practical reasons, as it provided the coordinates of the centre of mass of the detections. The low stringency of this clustering (200 nm radius, ≥ 5 neighbours) effectively filters single detections that can result from background noise or incorrect demixing. An additional reference explaining the use of DBSCAN including the choice of parameters is given on p. 22 (see also R2 point 4).
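To illustrate how the stringent parameter regime behaves, here is a toy DBSCAN run on synthetic detections. This is a self-contained re-implementation for illustration only, not the NEO software's algorithm, and the simulated coordinates are arbitrary:

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 meaning noise."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue                      # already assigned, or not a core point
        labels[i] = cluster               # start a new cluster at core point i
        queue = list(neighbours[i])
        while queue:                      # density-reachable expansion
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:
                    queue.extend(neighbours[j])
        cluster += 1
    return labels

rng = np.random.default_rng(1)
# A dense scaffold-like blob plus sparse background detections (nm).
blob = rng.normal([500.0, 500.0], 40.0, size=(100, 2))
noise = rng.uniform(0.0, 2000.0, size=(20, 2))
points = np.vstack([blob, noise])

# Stringent parameters (cf. 80 nm radius, >= 50 neighbours): only the
# dense blob survives, while isolated background detections are rejected.
labels = dbscan(points, eps=80.0, min_pts=50)
print((labels >= 0).sum(), (labels == -1).sum())
```

Relaxing the parameters (larger radius, fewer required neighbours) shifts the method from reconstructing a dense domain towards merely filtering isolated background points, which mirrors the two regimes described above.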

      For microscopy experiment methods, state power densities, not % or "nominal power".

      *Done. We now report the irradiance (laser power density) instead of nominal power (pp. 18, 21).*

      In general, not much data presented. Any SI file with extra images etc.?

      *The original submission included four supplementary figures with additional data and representative images that should have been available to the reviewer (Figs. S1-S4). The SI file has been updated during revision (new Fig. S4E-G).*

      Clarification of the discussion on GlyR expression and synaptic localization: The discussion on GlyR expression, complex formation, and synaptic localization is sometimes unclear, and needs terminological distinctions between "expression level", "complex formation" and "synaptic localization". For example, the authors state: "What then is the reason for the low protein expression of GlyRβ? One possibility is that the assembly of mature heteropentameric GlyR complexes depends critically on the expression of endogenous GlyR α subunits." Does this mean that GlyRβ proteins that fail to form complexes with GlyRα subunits are unstable and subject to rapid degradation? If so, the authors should clarify this point. The statement "This raises the interesting possibility that synaptic GlyRs may depend specifically on the concomitant expression of both α1 and β transcripts." suggests a dependency on α1 and β transcripts. However, is the authors' focus on synaptic localization or overall protein expression levels? If this means synaptic localization, it would be beneficial to state this explicitly to avoid confusion. To improve clarity, the authors should carefully distinguish between these different aspects of GlyR biology throughout the discussion. Additionally, a schematic diagram illustrating these processes would be highly beneficial for readers.

      We thank the reviewer for pointing this out. We are dealing with several processes: protein expression, which determines subunit availability and the assembly of pentameric GlyR complexes; surface expression; membrane diffusion; and the accumulation of GlyRβ-containing receptor complexes at inhibitory synapses. We have edited the manuscript, particularly the discussion, and tried to be as clear as possible in our wording.


      We chose not to add a schematic illustration for the time being, because any graphical representation is necessarily a simplification. Instead, we preferred to summarise the main numbers in tabular form (Table 1). We are of course open to any other suggestions.

      Interpretation of GlyR localization in the context of nanodomains. The distribution of GlyR molecules on inhibitory synapses appears to be non-homogeneous, instead forming nanoclusters or nanodomains, similar to many other synaptic proteins. It is important to interpret GlyR localization in the context of nanodomain organization.

      The dSTORM images in Fig. 2 are pointillist representations that show individual detections rather than molecules. Small clusters of detections are likely to originate from a single AF647 fluorophore (in the case of nanobody labelling) and therefore represent single GlyRβ subunits. Since GlyR copy numbers are so low at hippocampal synapses (≤ 5), the notion of nanodomain is not directly applicable. Our analysis therefore focused on the integration of GlyRs within the postsynaptic scaffold, rather than attempting to define nanodomain structures (see also response to point 8 of R1). A clarification has been added in the revised manuscript (p. 6).

      __Reviewer #1 (Significance (Required)):__

      The paper presents biological and technical advances. The biological insights revolve mostly around the documentation of glycine receptors in particular synapses in the forebrain, where they are typically expressed at very low levels. The authors provide compelling data indicating that the expression is of physiological significance. The authors have done a nice job of combining genetically-tagged mice with advanced microscopy methods to tackle the question of distributions of synaptic proteins. Overall these advances are more incremental than groundbreaking.

      We thank the reviewer for acknowledging both the technical and biological advances of our study. While we recognize that our work builds upon established models, we consider that it also addresses important unresolved questions, namely that GlyRs are present and specifically anchored at inhibitory synapses in telencephalic regions, such as the hippocampus and striatum. From a methodological point of view, our study demonstrates that SMLM can be applied not only for structural analysis of highly abundant proteins, but also to reliably detect proteins present at very low copy numbers. This ability to identify and quantify sparse molecule populations adds a new dimension to SMLM applications, which we believe increases the overall impact of our study beyond the field of synaptic neuroscience.

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):__

      In their manuscript "Single molecule counting detects low-copy glycine receptors in hippocampal and striatal synapses" Camuso and colleagues apply single molecule localization microscopy (SMLM) methods to visualize low copy numbers of GlyRs at inhibitory synapses in the hippocampal formation and the striatum. SMLM analysis revealed higher copy numbers in striatum compared to hippocampal inhibitory synapses. They further provide evidence that these low copy numbers are tightly linked to post-synaptic scaffolding protein gephyrin at inhibitory synapses. Their approach profits from the high sensitivity and resolution of SMLM and challenges the controversial view on the presence of GlyRs in these formations although there are reports (electrophysiology) on the presence of GlyRs in these particular brain regions. These new datasets in the current manuscript may certainly assist in understanding the complexity of fundamental building blocks of inhibitory synapses.

      However I have some minor points that the authors may address for clarification:

      1) In Figure 1 the authors apply PALM imaging of mEos4b-GlyRβ (knock-in) and here the corresponding Sylite label seems to be recorded in widefield; it is not clearly stated in the figure legend if it is widefield or super-resolved. In Fig 1A, is the scale bar 5 µm? Some Sylite spots appear to be sized around 1 µm, especially the brighter spots, but maybe this is due to the lower resolution of widefield imaging? Regarding the statistical comparison: what method was chosen to test for normality? I think this point is missing in the methods section.

      *This is correct; the apparent size of the Sylite spots does not reflect the real size of the synaptic gephyrin domain due to the limited resolution of widefield imaging, including the detection of out-of-focus light. We have clarified in the legend of Fig. 1A that Sylite labelling was imaged with classic epifluorescence microscopy. The scale bar in Fig. 1A corresponds to 5 µm. Since the data were not normally distributed, nonparametric tests (Kruskal-Wallis one-way ANOVA with Dunn's multiple comparison test or Mann-Whitney U-test for pairwise comparisons) were used (p. 23).*

      Moreover I would appreciate a clarification and/or citation that the knockin model results in no structural and physiological changes at inhibitory synapses, I believe this model has been applied in previous studies and corresponding clarification can be provided.

      The Glrbeos/eos mouse model has been described previously and does not exhibit any structural or physiological phenotypes (Maynard et al. 2021 eLife). The issue was also raised by reviewer R1 (point 5) and has been clarified in the revised manuscript (p. 4).

      2) In the next set of experiments the authors switch to demixing dSTORM experiments - an explanation why this is performed is missing in the text - I guess better resolution to perform more detailed distance measurements? For these experiments: which region of the hippocampus did the authors select, I cannot find this information in legend or main text.

      Yes, the dSTORM experiments enable dual-colour structural analysis at high spatial resolution (see response to R1 point 7). An explanation has been added (p. 6).

      3) Regarding parameters of demixing experiments: the number of frames (10,000) seems quite low and the exposure time higher than expected for Alexa 647. Can the authors explain the reason for choosing these particular parameters (low expression profile of the target, so better separation? fewer fluorophores on the label and shorter collection time?) or is there a reference that can be cited? The laser power is given in the methods in percentage of maximal output power, but for better comparison and reproducibility I recommend to provide the values of a power meter (kW/cm²) as lasers may change their maximum output power during their lifetime.

      Acquisition parameters (laser power, exposure time) for dSTORM were chosen to obtain a good localisation precision (~12 nm; see R1 point 8). The number of frames is adequate to obtain well sampled gephyrin scaffolds in the CF680 channel. In the case of the GlyR (nanobody-AF647), the concept of spatial resolution does not really apply due to the low number of targets (see R1, point 13). Power density (irradiance) values have now been given (pp. 18, 21).

      4) For analysis of subsynaptic distribution: how did the authors decide to choose the parameters in the NEO software for DBSCAN clustering? Was a series of parameters tested to find optimal conditions, and did the analysis start with an initial test of whether the data is indeed clustered (Ripley's K)? Or is there a reference in the literature that can be provided?

      DBSCAN parameters were optimised manually, by testing different values. Identification of dense and well-delimited gephyrin clusters (CF680) was achieved with a small radius and a high number of detections (80 nm, ≥ 50 neighbours), whereas filtering of low-density background in the AF647 channel (GlyRs) required less stringent parameters (200 nm, ≥ 5) due to the low number of target molecules. Similar parameters were used in a previous publication (Khayenko et al. 2022, Angewandte Chemie). The reference has been provided on p. 22 (see also R1 point 9).

      5) A conclusion/discussion of the results presented in Figure 5 is missing in the text/discussion.

      *This part of the manuscript has been completely overhauled. It includes new experimental data, quantification of the data (new Fig. 5), as well as the discussion and interpretation of our findings (see also R1, point 3). In agreement with our earlier interpretation, the data confirm that the low availability of GlyRα1 subunits limits the expression and synaptic targeting of GlyRα1/β heteropentamers. The observation that GlyRα1 overexpression with lentivirus increases the size of the postsynaptic gephyrin domain further points to a structural role, whereby GlyRs can enhance the stability (and size) of inhibitory synapses in hippocampal neurons, even at low copy numbers (pp. 13-14).*

      6) in line 552 "suspension" is misleading, better use "solution"

      Done.

      __Reviewer #2 (Significance (Required)):__

      Significance: The manuscript provides new insights into the presence of low-copy GlyRs by visualizing them via SMLM. This is the first report that visualizes GlyRs optically in the brain, applying the knock-in model of mEos4b-tagged GlyRβ, and quantifies their copy number, comparing the distribution and amount of GlyRs in the hippocampus and striatum. The imaging data correspond well to the electrophysiological measurements in the manuscript.

      Field of expertise: Super-Resolution Imaging and corresponding analysis

      __Reviewer #4 (Evidence, reproducibility and clarity (Required)):__

      In this study, Camuso et al. make use of a knock-in mouse model expressing endogenously mEos4b-tagged GlyRβ to detect endogenous glycine receptors using single-molecule localization microscopy. The main conclusion from this study is that in the hippocampus GlyRβ molecules are barely detected, while inhibitory synapses in the ventral striatum seem to express functionally relevant GlyR numbers.

      I have a few points that I hope help to improve the strength of this study.

      • In the hippocampus, this study finds that the numbers of detections are very low. The authors perform adequate controls to indicate that these localizations are above noise level. Nevertheless, it remains questionable that these reflect proper GlyRs. The suggestion that in hippocampal synapses the low numbers of GlyRβ molecules "are important in assembly or maintenance of inhibitory synaptic structures in the brain" is in itself interesting, but is not at all supported. It is also difficult to envision how such low numbers could support the structure of a synapse. A functional experiment showing that knockdown of GlyRs affects inhibitory synapse structure in hippocampal neurons would be a minimal test of this.

      *It is not clear what the reviewer means by "it remains questionable that these reflect proper GlyRs". The PALM experiments include a series of stringent controls (see R1, point 1) demonstrating the existence of low-copy GlyRs at inhibitory synapses in the hippocampus (Fig. 1) and in the striatum (Fig. 3), and are backed up by dSTORM experiments (Fig. 2). We have no reason to doubt that these receptors are fully functional, as demonstrated for the ventral striatum (Fig. 4). However, due to their low number, a role in inhibitory synaptic transmission is clearly limited, at least in the hippocampus and dorsal striatum.*


We therefore propose a structural role, where the GlyRs could be required to stabilise the postsynaptic gephyrin domain in hippocampal neurons. This is based on the idea that the GlyR-gephyrin affinity is much higher than that of the GABAAR-gephyrin interaction (reviewed in Kasaragod & Schindelin 2018 Front Mol Neurosci). Accordingly, there is a close relationship between GlyRs and gephyrin numbers, sub-synaptic distribution, and dynamics in spinal cord synapses that are mostly glycinergic (Specht et al. 2013 Neuron; Maynard et al. 2021 eLife; Chapdelaine et al. 2021 Biophys J). It is reasonable to assume that low-copy GlyRs could play a similar structural role at hippocampal synapses. A knockdown experiment targeting these few receptors is technically very challenging and beyond the scope of this study. However, in response to the reviewer's question we have conducted new experiments in cultured hippocampal neurons (new __Fig. 5__). They demonstrate that overexpression of GlyRa1/b heteropentamers increases the size of the postsynaptic domain in these neurons, supporting our interpretation of a structural role of low-copy GlyRs (p. 14).

• The endogenous tagging strategy is a very strong aspect of this study and provides confidence in the labeling of GlyRβ molecules. One caveat, however, is that this labeling strategy does not discriminate whether GlyRβ molecules are on the cell membrane or in internal compartments. Can the authors provide an estimate of the ratio of surface to internal GlyRβ molecules?

      Gephyrin is known to form a two-dimensional scaffold below the synaptic membrane to which inhibitory GlyRs and GABAARs attach (reviewed in Alvarez 2017 Brain Res). The majority of the synaptic receptors are therefore thought to be located in the synaptic membrane, which is supported by the close relationship between the sub-synaptic distribution of GlyRs and gephyrin in spinal cord neurons (e.g. Maynard et al. 2021 eLife). To demonstrate the surface expression of GlyRs at hippocampal synapses we labelled cultured hippocampal neurons expressing mEos4b-GlyRa1 with anti-Eos nanobody in non-permeabilised neurons (see Figure below for the reviewer only). The close correspondence between the nanobody (AF647) and the mEos4b signal confirms that the majority of the GlyRs are indeed located in the synaptic membrane.


*Figure (for the reviewer only).* Left: Lentivirus expression of mEos4b-GlyRa1 in fixed and non-permeabilised hippocampal neurons (mEos4b signal). Right: Surface labelling of the recombinant subunit with anti-Eos nanobody (AF647).

• 'We also estimated the absolute number of GlyRs per synapse in the hippocampus. The number of mEos4b detections was converted into copy numbers by dividing the detections at synapses by the average number of detections of individual mEos4b-GlyRβ containing receptor complexes'. In essence this is a correct method to estimate copy numbers, and the authors discuss some of the pitfalls associated with this approach (i.e., maturation of the fluorophore and the detection limit). Nevertheless, the authors did not subtract the number of background localizations determined in the two negative control groups. This is critical, particularly at these low-number estimations.

      We fully agree that background subtraction can be useful with low detection numbers. In the revised manuscript, copy numbers are now reported as background-corrected values. Specifically, the mean number of detections measured in wildtype slices was used to calculate an equivalent receptor number, which was then subtracted from the copy number estimates across hippocampus, spinal cord and striatum. This procedure is described in the methods (p. 20) and results (p. 5, 8), and mentioned in the figure legends of Fig. 1C, 3C. The background corrected values are given in the text and Table 1.
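Since this correction is easy to get wrong at such low detection numbers, the arithmetic can be sketched as follows (a hypothetical illustration; the function name and all numbers are placeholders, not values or code from the study):

```python
# Hypothetical sketch of background-corrected copy-number estimation.
# All numbers are illustrative placeholders, not data from the study.

def copy_number(detections_per_synapse, detections_per_receptor, background_detections):
    """Convert raw mEos4b detection counts at a synapse into receptor copy
    numbers, subtracting the equivalent receptor number derived from the
    background detections measured in wildtype control slices."""
    raw = detections_per_synapse / detections_per_receptor
    background = background_detections / detections_per_receptor
    return max(raw - background, 0.0)  # copy numbers cannot be negative

# 120 detections per synapse, 24 detections per single receptor complex,
# 12 background detections in wildtype controls -> (120 - 12) / 24 = 4.5
print(copy_number(120, 24, 12))  # 4.5
```

Clamping at zero reflects that, at very low copy numbers, the background estimate can exceed the raw count for an individual synapse.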

Furthermore, the authors state that "The advantage of this estimation is that it is independent of the stoichiometry of heteropentameric GlyRs". However, if the stoichiometry is unknown, the number of counted GlyRβ subunits cannot simply be reported as the number of GlyRs. This should be discussed in more detail, and more carefully reported throughout the manuscript.

*The reviewer is right to point this out. There is still some debate about the stoichiometry of heteropentameric GlyRs. Configurations with 2a:3b, 3a:2b and 4a:1b subunits have been advanced (e.g. Grudzinska et al. 2005 Neuron; Durisic et al. 2012 J Neurosci; Patrizio et al. 2017 Sci Rep; Zhu & Gouaux 2021 Nature). We have therefore chosen a quantification that is independent of the underlying stoichiometry. Since our quantification is based on very sparse clusters of mEos4b detections that likely originate from a single receptor complex (irrespective of its stoichiometry), the reported values actually reflect the number of GlyRs (and not GlyRb subunits). We have clarified this in the results (p. 5) and throughout the manuscript (Table 1).*

• The dual-color imaging provides insights into the subsynaptic distribution of GlyRβ molecules in hippocampal synapses. Why are similar studies not performed on synapses in the ventral striatum, where functionally relevant numbers of GlyRβ molecules are found? Here, insights into the subsynaptic receptor distribution would be of much more interest, as they can be tied to function.

      This is an interesting suggestion. However, the primary aim of our study was to identify the existence of GlyRs in hippocampal regions. At low copy numbers, the concept of sub-synaptic domains (SSDs, e.g. Yang et al. 2021 EMBO Rep) becomes irrelevant (see R1 point 13). It should be pointed out that the dSTORM pointillist images (Fig. 2A) represent individual GlyR detections rather than clusters of molecules. In the striatum, our specific purpose was to solve an open question about the presence of GlyRs in different subregions (putamen, nucleus accumbens).

• It is unclear how the experiments in Figure 5 add to this study. These results are valid, but do not seem to directly test the hypothesis that "the expression of α subunits may be limiting factor controlling the number of synaptic GlyRs". These experiments simply test if overexpressed α subunits can be detected. If the α subunits are limiting, measuring the effect of α subunit overexpression on GlyRβ surface expression would be a more direct test.

      Both R1 and R2 have also commented on the data in Fig. 5 and their interpretation. We have substantially revised this section as described before (see R1 point 3) including additional experiments and quantification of the data (new Fig. 5). The findings lend support to our earlier hypothesis that GlyR alpha subunits (in particular GlyRa1) are the limiting factor for the expression of heteropentameric GlyRa/b in hippocampal neurons (pp. 13-14). Since the GlyRa1 subunit itself does not bind to gephyrin (Patrizio et al. 2017 Sci Rep), the synaptic localisation of the recombinant mEos4b-GlyRa1 subunits is proof that they have formed heteropentamers with endogenous GlyRb subunits and driven their membrane trafficking, which the GlyRb subunits are incapable of doing on their own.

__Reviewer #4 (Significance (Required)):__

These results are based on carefully performed single-molecule localization experiments, and are well presented and described. The knock-in mouse with endogenously tagged GlyRβ molecules is a very strong aspect of this study and provides confidence in the labeling; the combination with single-molecule localization microscopy is very strong, as it provides high sensitivity and spatial resolution.

The conceptual innovation, however, seems relatively modest: these results confirm previous studies but do not seem to add novel insights. This study is entirely descriptive and does not bring new mechanistic insights.

      This study could be of interest to a specialized audience interested in glycine receptor biology, inhibitory synapse biology and super-resolution microscopy.

My expertise is in super-resolution microscopy, synaptic transmission and plasticity.

      As we have stated before, the novelty of our study lies in the use of SMLM for the identification of very small numbers of molecules, which requires careful control experiments. This is something that has not been done before and that can be of interest to a wider readership, as it opens up SMLM for ultrasensitive detection of rare molecular events. Using this approach, we solve two open scientific questions: (1) the demonstration that low-copy GlyRs are present at inhibitory synapses in the hippocampus, (2) the sub-region specific expression and functional role of GlyRs in the ventral versus dorsal striatum.


The following review was provided later under the name "Reviewer #4". To avoid confusion with the last reviewer from above, we will refer to this review as R4-2.


__Reviewer #4-2 (Evidence, reproducibility and clarity (Required)):__


      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

The authors investigate the presence of synaptic glycine receptors in the telencephalon, whose presence and function are poorly understood.

Using a transgenically labeled glycine receptor beta subunit (Glrb-mEos4b) mouse model together with super-resolution microscopy (SMLM, dSTORM), they demonstrate the presence of a low but detectable amount of synaptically localized GLRB in the hippocampus. While they do not perform a functional analysis of these receptors, they do demonstrate that these subunits are integrated into the inhibitory postsynaptic density (iPSD) as labeled by the scaffold protein gephyrin. These findings demonstrate that a low level of synaptically localized glycine receptor subunits exists in the hippocampal formation, although whether or not they have a functional relevance remains unknown.

      They then proceed to quantify synaptic glycine receptors in the striatum, demonstrating that the ventral striatum has a significantly higher amount of GLRB co-localized with gephyrin than the dorsal striatum or the hippocampus. They then recorded pharmacologically isolated glycinergic miniature inhibitory postsynaptic currents (mIPSCs) from striatal neurons. In line with their structural observations, these recordings confirmed the presence of synaptic glycinergic signaling in the ventral striatum, and an almost complete absence in the dorsal striatum. Together, these findings demonstrate that synaptic glycine receptors in the ventral striatum are present and functional, while an important contribution to dorsal striatal activity is less likely.

      Lastly, the authors use existing mRNA and protein datasets to show that the expression level of GLRA1 across the brain positively correlates with the presence of synaptic GLRB.

The authors use lentiviral expression of mEos4b-tagged glycine receptor alpha1, alpha2, and beta subunits (GLRA1, GLRA2, GLRB) in cultured hippocampal neurons to investigate the ability of these subunits to cause the synaptic localization of glycine receptors. They suggest that the alpha1 subunit has a higher propensity to localize at the inhibitory postsynapse (labeled via gephyrin) than the alpha2 or beta subunits, and may therefore contribute to the distribution of functional synaptic glycine receptors across the brain.

      Major comments:

      • Are the key conclusions convincing?

      The authors are generally precise in the formulation of their conclusions.

      • They demonstrate a very low, but detectable, amount of a synaptically localized glycine receptor subunit in a transgenic (GlrB-mEos4b) mouse model. They demonstrate that the GLRB-mEos4b fusion protein is integrated into the iPSD as determined by gephyrin labelling. The authors do not perform functional tests of these receptors and do not state any such conclusions.
      • The authors show that GLRB-mEos4b is clearly detectable in the striatum and integrated into gephyrin clusters at a significantly higher rate in the ventral striatum compared to the dorsal striatum, which is in line with previous studies.
      • Adding to their quantification of GLRB-mEos4b in the striatum, the authors demonstrate the presence of glycinergic miniature IPSCs in the ventral striatum, and an almost complete absence of mIPSCs in the dorsal striatum. These currents support the observation that GLRB-mEos4b is more synaptically integrated in the ventral striatum compared to the dorsal striatum.
• The authors show that lentiviral expression of GLRA1-mEos4b leads to a visually higher number of GLR clusters in cultured hippocampal neurons, and a co-localization of some clusters with gephyrin. The authors claim that this supports the idea that GLRA1 may be an important driver of synaptic glycine receptor localization. However, no quantification or statistical analysis of the number of puncta or their colocalization with gephyrin is provided for any of the expressed subunits. Such a claim should be supported by quantification and statistics.

A thorough analysis and quantification of the data in Fig. 5 has been carried out as requested by all the other reviewers (e.g. R1, point 3). The new data and results have been described in the revised manuscript (pp. 9-10, 13-14).

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

One unaddressed caveat is the fact that a GLRB-mEos4b fusion protein may behave differently in terms of localization and synaptic integration than wild-type GLRB. While unlikely, it is possible that mEos4b interacts either with itself or synaptic proteins in a way that changes the fused GLRB subunit's localization. Such an effect would be unlikely to affect synaptic function in a measurable way, but might be detected at a structural level by highly sensitive methods such as SMLM and STORM in regions with very low molecule numbers (such as the hippocampus). Since reliable antibodies against GLRB in brain tissue sections are not available, this would be difficult to test. Considering that no functional measures of the hippocampal detections exist, we would suggest that this possible caveat be mentioned for this particular experiment.

*This question has also been raised before (R1, point 5). According to an earlier study, the mEos4b-GlyRb knock-in does not cause any obvious phenotypes, with the possible exception of a minor loss of glycine potency (Maynard et al. 2021 eLife). The fact that the synaptic levels in the spinal cord in heterozygous animals are precisely half of those of homozygous animals argues against differences in receptor expression, heteropentameric assembly, forward trafficking to the plasma membrane and integration into the synaptic membrane, as confirmed using quantitative super-resolution CLEM (Maynard et al. 2021 eLife). Accordingly, we did not observe any behavioural deficits in these animals, making it a powerful experimental model. We have added this information in the revised manuscript (p. 4).*

In addition, without any quantification or statistical analysis, the authors' claims regarding the necessity of GLRA1 expression for the synaptic localization of glycine receptors in cultured hippocampal neurons should probably be described as preliminary (Fig. 5).

      As mentioned before, we have substantially revised this part (R1, point 3). The quantification and analysis in the new Fig. 5 support our earlier interpretation.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

The authors show that there is colocalization of gephyrin with the mEos4b-GlyRβ subunit using dual-colour SMLM. This is a powerful approach that allows a claim to be made about the synaptic location of the glycine receptors. The images presented in Figure 1, together with the distance analysis in Figure 2, display the co-localization of the fluorophores. The co-localization images in all the selected regions, hippocampus and striatum, also show detections outside of the gephyrin clusters, which the authors refer to as extrasynaptic. These small punctate clusters seem to have the same size as the ones detected and assigned as part of the synapse. It would be informative if the authors analysed the distribution, density and size of these non-synaptic clusters, presented the data in the manuscript, and compared them against the synaptic ones. Validating this extrasynaptic signal by staining for a dendritic marker, such as MAP-2, or maybe a somatic marker, and assessing the co-localization with the non-synaptic clusters would also add even more credibility to them being extrasynaptic.

The existence of extrasynaptic GlyRs is well attested in spinal cord neurons (e.g. Specht et al. 2013 Neuron; this study, Fig. S2). The fact that these appear as small clusters of detections in SMLM recordings results from a single fluorophore being detected several times in consecutive image frames and from blinking. Therefore, small clusters of detections likely represent single GlyRs (that can be counted), and not assemblies of several receptor complexes. Due to their diffusion in the neuronal membrane, they are seen as diffuse signals throughout the somatodendritic compartment in epifluorescence images (e.g. Fig. 5A). SMLM recordings of the same cells resolve this diffuse signal into discrete nanoclusters representing individual receptors (Fig. 5B). It is not clear what information co-localisation experiments with specific markers could provide, especially in hippocampal neurons, in which the copy numbers (and density) of GlyRs are next to zero.

      In addition we would encourage the authors to quantify the clustering and co-localization of virally expressed GLRA1, GLRA2, and GLRB with gephyrin in order to support the associated claims (Fig. 5). Preferably, the density of GLR and gephyrin clusters (at least on the somatic surface, the proximal dendrites, or both) as well as their co-localization probability should be quantified if a causal claim about subunit-specific requirements for synaptic localization is to be made.

Quantification of the data has been carried out (new Fig. 5C,D). The results have been described before (R1, point 3) and support our earlier interpretation of the data (pp. 13-14).

Lastly, even though it may be outside the scope of such a study, analysing other parts of the hippocampal area could provide additional important information. If one looks at the Allen Institute's ISH of the beta subunit, the strongest signal comes from the stratum oriens in CA1, for example, suggesting that interneurons residing there would be more likely to have a higher expression of glycine receptors. This could also be assessed by looking more carefully at the single-cell transcriptomics, to see which cell types in the hippocampus show the highest mRNA levels. If the authors think that this is too much additional work, then perhaps a mention of this in the discussion would be good.

We have added the requested information from the ISH database of the Allen Institute in the discussion as suggested by the reviewer (p. 12). However, in combination with the transcriptomic data (Fig. S1), our findings strongly suggest that the expression of synaptic GlyRs depends on the availability of alpha subunits rather than on the presence of the GlyRb transcript. This is obvious when one compares the mRNA levels in the hippocampus with those in the basal ganglia (striatum) and medulla. While the transcript concentrations of GlyRb are elevated in all three regions and essentially the same, our data show that the GlyRb copy numbers at synapses differ over more than 2 orders of magnitude (Fig. 1B, Table 1).

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

Since the labeling and some imaging have already been performed, the requested experiment would be a matter of deploying a method of quantification. In principle, it should not require any additional wet-lab experiments, although it may require additional imaging of existing samples.

      • Are the data and the methods presented in such a way that they can be reproduced?

      Yes, for the most part.

      • Are the experiments adequately replicated and statistical analysis adequate?

      Yes

      Minor comments:

      • Specific experimental issues that are easily addressable.

      N/A

      • Are prior studies referenced appropriately?

      Yes

      • Are the text and figures clear and accurate?

Yes, although quantification in Figure 5 is currently not present.

      A quantification has been added (see R1, point 3).

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      This paper presents a method that could be used to localize receptors and perhaps other proteins that are in low abundance or for which a detailed quantification is necessary. I would therefore suggest that Figure S4 is included into Figure 2 as the first panel, showcasing the demixing, followed by the results.

      We agree in principle with this suggestion. However, the revised Fig. S4 is more complex and we think that it would distract from the data shown in Fig. 2. Given that Fig. S4 is mostly methodological and not essential to understand the text, we have kept it in the supplement for the time being. We leave the final decision on this point to the editor.

__Reviewer #4-2 (Significance (Required)):__

      [This review was supplied later]

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

Using a novel and high-resolution method, the authors have provided strong evidence for the presence of glycine receptors in the murine hippocampus and in the dorsal striatum. The number of receptors calculated is small compared to the numbers found in the ventral striatum. This is the first study to quantify receptor numbers in these regions. In addition, it lays a roadmap for future studies addressing similar questions.

      • Place the work in the context of the existing literature (provide references, where appropriate).

This is done well by the authors in their curation of the literature. As stated above, the authors have filled a gap in our knowledge of the presence of glycine receptors in different brain regions, a subject of importance in understanding the role they play in brain activity and function.

      • State what audience might be interested in and influenced by the reported findings.

      Neuroscientists working at the synaptic level, on inhibitory neurotransmission and on fundamental mechanisms of expression of genes at low levels and their relationship to the presence of the protein would be interested. Furthermore, researchers in neuroscience and cell biology may benefit from and be inspired by the approach used in this manuscript, to potentially apply it to address their own aims.

*We thank the reviewer for the positive assessment of the technical and biological implications of our work, as well as the interest of our findings to a wide readership of neuroscientists and cell biologists.*

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Synaptic transmission, inhibitory cells and GABAergic synapses functionally and structurally, cortex and cortical circuits. No strong expertise in super-resolution imaging methods.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, the authors investigate the nanoscopic distribution of glycine receptor subunits in the hippocampus, dorsal striatum, and ventral striatum of the mouse brain using single-molecule localization microscopy (SMLM). They demonstrate that only a small number of glycine receptors are localized at hippocampal inhibitory synapses. Using dual-color SMLM, they further show that clusters of glycine receptors are predominantly localized within gephyrin-positive synapses. A comparison between the dorsal and ventral striatum reveals that the ventral striatum contains approximately eight times more glycine receptors and this finding is consistent with electrophysiological data on postsynaptic inhibitory currents. Finally, using cultured hippocampal neurons, they examine the differential synaptic localization of glycine receptor subunits (α1, α2, and β). This study is significant as it provides insights into the nanoscopic localization patterns of glycine receptors in brain regions where this protein is expressed at low levels. Additionally, the study demonstrates the different localization patterns of GlyR in distinct striatal regions and its physiological relevance using SMLM and electrophysiological experiments. However, several concerns should be addressed.

      The following are specific comments:

      1. Colocalization analysis in Figure 1A. The colocalization between Sylite and mEos-GlyRβ appears to be quite low. It is essential to assess whether the observed colocalization is not due to random overlap. The authors should consider quantifying colocalization using statistical methods, such as a pixel shift analysis, to determine whether colocalization frequencies remain similar after artificially displacing one of the channels.
2. Inconsistency between Figure 3A and 3B. While Figure 3B indicates an ~8-fold difference in the number of mEos4b-GlyRβ detections per synapse between the dorsal and ventral striatum, Figure 3A does not appear to show a pronounced difference in the localization of mEos4b-GlyRβ on Sylite puncta between these two regions. If the images presented in Figure 3A are not representative, the authors should consider replacing them with more representative examples or providing expanded images with multiple representative examples. Alternatively, if this inconsistency can be explained by differences in spot density within clusters, the authors should explain that.
      3. Quantification in Figure 5. It is recommended that the authors provide quantitative data on cluster formation and colocalization with Sylite puncta in Figure 5 to support their qualitative observations.
4. Potential for pseudoreplication. It's not clear whether they're performing stats tests across biological replicates, images, or even synapses. They often quote mean +/- SEM with n = 1000s, and so does that mean they're doing tests on those 1000s? Need to clarify.
5. Does mEos affect expression levels or function of the protein? Can't see any experiments done to confirm this. Could suggest WB on homogenate, or mass spec?
6. Quantification of protein numbers is challenging with SMLM. Issues include i) some FPs not being correctly folded/mature, and ii) dependence of the localisation rate on the instrument, excitation/illumination intensities, and also the thresholds used in analysis. Can the authors compare with another protein that has known expression levels, e.g. PSD95? This is quite an ask, but if they could show the copy number of something known to compare with, it would be useful.
7. Rationale for doing nanobody dSTORM not clear at all. They don't explain the reason for doing the dSTORM experiments. Why not just rely on PALM for coincidence measurements, rather than tagging mEos with a nanobody and then doing dSTORM with that? Can they explain? Is it to get extra localisations, i.e. multiple per nanobody? If so, localising the same FP multiple times wouldn't improve resolution. Also, there are no controls for the nanobody dSTORM experiments: what about a non-specific nanobody, or use on WT sections?
8. What resolutions/precisions were obtained in the SMLM experiments? The authors should perform Fourier Ring Correlation (FRC) on the SR images to state the resolutions obtained (particularly useful when presenting distance histograms, as these will depend on resolution). Likewise for precision: what was the mean precision? Can they show histograms of localisation precision?
      9. Why were DBSCAN parameters selected? How can they rule out multiple localisations per fluor? If low copy numbers (<10), then why bother with DBSCAN? Could just measure distance to each one.
      10. For microscopy experiment methods, state power densities, not % or "nominal power".
      11. In general, not much data presented. Any SI file with extra images etc.?
      12. Clarification of the discussion on GlyR expression and synaptic localization: The discussion on GlyR expression, complex formation, and synaptic localization is sometimes unclear, and needs terminological distinctions between "expression level", "complex formation" and "synaptic localization". For example, the authors state:"What then is the reason for the low protein expression of GlyRβ? One possibility is that the assembly of mature heteropentameric GlyR complexes depends critically on the expression of endogenous GlyR α subunits." Does this mean that GlyRβ proteins that fail to form complexes with GlyRα subunits are unstable and subject to rapid degradation? If so, the authors should clarify this point. The statement "This raises the interesting possibility that synaptic GlyRs may depend specifically on the concomitant expression of both α1 and β transcripts." suggests a dependency on α1 and β transcripts. However, is the authors' focus on synaptic localization or overall protein expression levels? If this means synaptic localization, it would be beneficial to state this explicitly to avoid confusion. To improve clarity, the authors should carefully distinguish between these different aspects of GlyR biology throughout the discussion. Additionally, a schematic diagram illustrating these processes would be highly beneficial for readers.
      13. Interpretation of GlyR localization in the context of nanodomains. The distribution of GlyR molecules on inhibitory synapses appears to be non-homogeneous, instead forming nanoclusters or nanodomains, similar to many other synaptic proteins. It is important to interpret GlyR localization in the context of nanodomain organization.
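The pixel-shift control suggested in point 1 can be implemented as a simple randomization test. The sketch below is a hypothetical NumPy illustration (function names, binary masks and shift counts are assumptions, not the authors' analysis): it compares the observed overlap of two thresholded channels against a null distribution obtained by circularly shifting one channel.

```python
import numpy as np

def coloc_fraction(a, b):
    """Fraction of above-threshold pixels in channel a that are also
    above threshold in channel b (a crude colocalization measure)."""
    return np.logical_and(a, b).sum() / max(a.sum(), 1)

def pixel_shift_test(mask_a, mask_b, n_shifts=100, rng=None):
    """Randomization test: compare the observed colocalization fraction
    against a null distribution from random circular shifts of mask_b."""
    rng = np.random.default_rng(rng)
    observed = coloc_fraction(mask_a, mask_b)
    null = []
    for _ in range(n_shifts):
        dy = rng.integers(1, mask_b.shape[0] - 1)
        dx = rng.integers(1, mask_b.shape[1] - 1)
        shifted = np.roll(np.roll(mask_b, dy, axis=0), dx, axis=1)
        null.append(coloc_fraction(mask_a, shifted))
    # One-sided p-value: how often random shifts match or exceed the
    # observed overlap (+1 correction for a valid permutation p-value).
    p = (np.sum(np.array(null) >= observed) + 1) / (n_shifts + 1)
    return observed, p
```

If the observed fraction sits well above the shifted null distribution (small p), the colocalization is unlikely to be due to random overlap at the measured puncta densities.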

      Significance

The paper presents biological and technical advances. The biological insights revolve mostly around the documentation of glycine receptors at particular synapses in the forebrain, where they are typically expressed at very low levels. The authors provide compelling data indicating that the expression is of physiological significance. The authors have done a nice job of combining genetically tagged mice with advanced microscopy methods to tackle the question of the distributions of synaptic proteins. Overall, these advances are more incremental than groundbreaking.

"It's like any other therapy, but using music, and just by nature of it being music versus some sort of science doesn't make it any less important."

This is ethos because she's defending her major with logic and her own credibility. That music therapy is just as legit as any other therapy, even if it's not "science."

    1. Author response:

      The following is the authorsโ€™ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on a biologically well-motivated and parametrised 2D acoustic and sensory simulation setup to investigate the various key parameters of interest.

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerging from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result, showing that the direction-of-arrival of a sound is itself the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis for unraveling how agents may navigate in sensorially noisy environments with many irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs investigating many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript accordingly to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 58-64).

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference') - this is confusing as it is not clear whether the authors refer to interference in the physics/acoustics sense, or broadly speaking as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      • Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      • Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      • Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 84-85, 119-120). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion, e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      • Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      • Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the "Call Level" parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections; see lines 346-349, 372-375.
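      The phase-dependent call-rate schedule described above can be read as a piecewise function of target distance. A minimal sketch follows; the 100/35/5 msec values are from the response text, but the 2 m and 0.5 m phase-transition distances are illustrative placeholders, not the values from Table 2:

```python
def inter_pulse_interval(target_dist_m):
    """Phase-dependent inter-pulse interval (IPI), in seconds.

    100 ms (search), 35 ms (end of approach) and 5 ms (buzz) follow
    the response text; the 2 m / 0.5 m transition distances are
    illustrative placeholders, not the paper's Table 2 values.
    """
    if target_dist_m > 2.0:                       # search phase
        return 0.100                              # 100 msec, ~10 calls/s
    elif target_dist_m > 0.5:                     # approach: 100 -> 35 msec
        frac = (target_dist_m - 0.5) / (2.0 - 0.5)
        return 0.035 + frac * (0.100 - 0.035)
    else:                                         # terminal buzz
        return 0.005                              # 5 msec, 200 calls/s
```

      Under this sketch the call rate rises smoothly from ~10 calls/s in search to ~28 calls/s at the end of the approach, then jumps to 200 calls/s in the buzz, matching the emergent behavior described above.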

      Reviewer #2 (Public review):

      We are grateful for the reviewer's insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract away features that would complicate the model without being expected to give additional insight, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me.

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model the head is aligned with the body; therefore the direction of the echolocation beam is the same as the direction of flight.

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 543-545.
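      For readers unfamiliar with the piston model referenced above, the beam gain at off-axis angle θ is |2·J1(ka·sinθ)/(ka·sinθ)| with k = 2πf/c, so the beam narrows as emission frequency rises. A minimal sketch, in which the 6.5 mm effective aperture radius is an illustrative assumption rather than a parameter from the paper:

```python
import math

def j1(x, n=2000):
    """Bessel function J1 via its integral representation,
    J1(x) = (1/pi) * integral_0^pi cos(t - x*sin(t)) dt (midpoint rule)."""
    s = 0.0
    for i in range(n):
        t = (i + 0.5) * math.pi / n
        s += math.cos(t - x * math.sin(t))
    return s / n

def piston_directivity(theta_rad, freq_hz, a_m=0.0065, c=343.0):
    """Piston-model beam gain, normalized to 1.0 on-axis.
    a_m is an illustrative effective aperture radius (assumption)."""
    x = (2.0 * math.pi * freq_hz / c) * a_m * math.sin(theta_rad)
    if abs(x) < 1e-9:
        return 1.0
    return abs(2.0 * j1(x) / x)
```

      For example, at 30° off-axis the gain at 80 kHz is well below the gain at 40 kHz, reproducing the narrower approach-phase beam described above.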

      If so, what is the difference between phi_target and phi_tx in the model equations?

      φ<sub>target</sub> represents the angle between the bat and the reflected object (target).

      φ<sub>Tx</sub> is the angle [rad] between the masking bat and the target, from the transmitter's perspective.

      φ<sub>TxRx</sub> refers to the angle between the transmitting conspecific and the receiving focal bat, from the transmitter's point of view.

      φ<sub>RxTx</sub> represents the angle between the receiving bat and the transmitting bat, from the receiver's point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 525-530). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      What is a bat's response to colliding with a conspecific (rather than a wall)?

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldshtein et al., 2025). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during the collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between bats is above the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance rather than modeling post-collision dynamics. See lines 479-484.
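      The decision rule described here (keep slowing down and turning while a conspecific is within 0.4 m) can be sketched as a single update step; the deceleration, turn increment, and minimum speed below are hypothetical illustration values, not the model's:

```python
import math

AVOID_DIST = 0.4  # m, conspecific-avoidance threshold from the response

def avoidance_step(speed, heading, dist_to_nearest,
                   decel=0.5, turn=math.radians(20), v_min=0.5):
    """One time step of the simplified conspecific-avoidance rule:
    while the nearest bat is closer than AVOID_DIST, keep reducing
    speed and rotating the heading; otherwise fly on unchanged.
    decel, turn, and v_min are illustrative, not the paper's values."""
    if dist_to_nearest < AVOID_DIST:
        return max(v_min, speed - decel), heading + turn
    return speed, heading
```

      Applied repeatedly, this reproduces the behavior described: the cost of a close encounter is the avoidance effort itself, not a modeled physical impact.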

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both?

      The number of repetitions for each scenario is detailed in Table 1, but we included it in a more prominent location in the text for clarity. Specifically, we now state (Lines 110-111):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials.

      We clarified this in the revised text (Lines 627-628 in Statistical Analysis).

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer's thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that, in general, our model often simplifies some of the bats' abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This line of reasoning will be repeated in several of the responses below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m²), Pipistrellus kuhlii and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation?

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 499-508).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect?

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats operate in the approach phase most of the time, which is consistent with natural cave emergence, where they navigate through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma), we also have empirical recordings of individuals flying under similar conditions (Goldshtein et al., 2025). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities. See lines 500-508.

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant's method using a filterbank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are of ~1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely result in an overall reduction in jamming rather than exacerbating it, aligning with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in localization errors of 2-4 cm (i.e., 200-400 microseconds) (Boonman, Parsons and Jones, 2003).

      We have now explicitly highlighted this in the revised version (see lines 548-581).
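      A back-of-the-envelope check of the ~1 kHz figure, assuming a 40 kHz call component and a 4 m/s closing speed between bats (both illustrative numbers, not taken from the paper):

```python
C = 343.0  # speed of sound in air, m/s

def two_way_doppler_shift(f_hz, v_closing_ms):
    """Approximate two-way Doppler shift of an echo for a closing
    speed much smaller than c: delta_f ~= 2 * v * f / c."""
    return 2.0 * v_closing_ms * f_hz / C
```

      With these assumed values, `two_way_doppler_shift(40000, 4.0)` is roughly 0.93 kHz, consistent with the ~1 kHz maximum quoted above.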

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation.ย 

      We fully agree that the natural variation in call design between phases contributes significantly to interference reduction (see the discussion in our previous paper, Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming.

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3 dB.

      The reviewer is correct. Indeed, integration over multiple calls improves the signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
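      The "3 dB per doubling" statement corresponds to averaging independent observations, where noise power falls as 1/n. A one-line check of that textbook relation (this is not the paper's full integration algorithm):

```python
import math

def integration_gain_db(n_calls):
    """SNR gain from averaging n independent echo observations:
    noise power drops by 1/n, so SNR improves by 10*log10(n) dB,
    i.e. about 3 dB per doubling of observations."""
    return 10.0 * math.log10(n_calls)
```

      So a 5-10 call integration window yields roughly a 7-10 dB gain over a single call, under the independence assumption.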

      They also claim - although it is almost an afterthought - that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10m. This distance is large compared to the 0.4m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem?ย 

      As described in the Methods section, the bat's collision avoidance response does not rely solely on the integration process. Instead, the model incorporates real-time echoes from the last calls, which are used independently of the integration process for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal, and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat's surroundings.

      See lines 600-616 in the revised version.
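      The two-stage process described above (transform each detection into allocentric x-y coordinates, then aggregate across calls with outlier removal) might be sketched as follows; the clustering radius and neighbour threshold are hypothetical illustration values, not the model's parameters:

```python
import math

def to_allocentric(bat_x, bat_y, bat_heading, echo_range, echo_bearing):
    """Convert an egocentric detection (range, bearing relative to the
    bat's heading) into world x-y coordinates."""
    ang = bat_heading + echo_bearing
    return (bat_x + echo_range * math.cos(ang),
            bat_y + echo_range * math.sin(ang))

def integrate_detections(detections, radius=0.3, min_neighbors=2):
    """Keep only allocentric points supported by nearby detections
    from other calls; isolated points (likely false echoes) are
    discarded. radius and min_neighbors are illustrative values."""
    kept = []
    for i, (xi, yi) in enumerate(detections):
        n = sum(1 for j, (xj, yj) in enumerate(detections)
                if j != i and math.hypot(xi - xj, yi - yj) <= radius)
        if n >= min_neighbors:
            kept.append((xi, yi))
    return kept
```

      Because a bat's own wall echoes recur at consistent world positions across calls while spurious conspecific echoes scatter, this kind of aggregation preferentially retains the true structure.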

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is no explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically it should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach.

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      • Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlap in calls and echoes and making jamming more severe. As described in the text: the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m (Fujioka et al., 2021), as observed in Myotis grisescens (Sabol and Hudson, 1995) and Tadarida brasiliensis (Theriault et al., no date; Betke et al., 2008; Gillam et al., 2010).

      • Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.

      To address the reviewer's concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods lines 450-455).
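      For intuition on the nearest-neighbour figure quoted above: for an idealized 2D Poisson point process of density λ, the mean nearest-neighbour distance is 1/(2√λ), so 0.27 m corresponds to a local density of roughly 3.4 bats/m². This is an order-of-magnitude check only; the simulated bats are not Poisson-distributed:

```python
import math

def mean_nn_distance_2d(density_per_m2):
    """Mean nearest-neighbour distance of a 2D Poisson point process
    with intensity lambda: E[r] = 1 / (2 * sqrt(lambda))."""
    return 0.5 / math.sqrt(density_per_m2)
```

      For example, `mean_nn_distance_2d(3.43)` is about 0.27 m, while the same spacing spread through a 3D volume would correspond to a much lower per-bat crowding, illustrating why the 2D layout is the harder case.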

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem.ย 

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler, Bioscience and 2001, no date; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chiu, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study (see Results, Figure 1: The impact of confusion on performance, and lines 399-404 in the Discussion).

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines 411-420 in the manuscript for further discussion.ย 

      We actually used logarithmically frequency modulated (FM) chirps, generated using the MATLAB built-in function chirp(t, f0, t1, f1, 'logarithmic'). This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted this in the revised version to ensure clarity (see Lines 509-512 in Methods).
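      For readers without MATLAB, an equivalent logarithmic sweep follows directly from its definition: instantaneous frequency f(t) = f0·(f1/f0)^(t/t1). The 80 kHz to 40 kHz downsweep over 3 ms used below is illustrative, not the species' measured call parameters:

```python
import math

def inst_freq(t, f0, t1, f1):
    """Instantaneous frequency of a logarithmic sweep:
    f(t) = f0 * (f1/f0)**(t/t1)."""
    return f0 * (f1 / f0) ** (t / t1)

def log_chirp(t, f0, t1, f1):
    """One sample of a logarithmic FM sweep, analogous to MATLAB's
    chirp(t, f0, t1, f1, 'logarithmic'). The phase is the integral
    of inst_freq: phi(t) = 2*pi*f0*t1/ln(f1/f0) * ((f1/f0)**(t/t1) - 1)."""
    k = f1 / f0
    phase = 2.0 * math.pi * f0 * t1 / math.log(k) * (k ** (t / t1) - 1.0)
    return math.cos(phase)
```

      Note that for a downsweep (f1 < f0) the logarithm is negative, but the phase formula remains valid, since differentiating it recovers `inst_freq` exactly.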

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to support stable and realistic flight trajectories while maintaining a reasonable collision rate. These values reflect a trade-off between maneuverability and behavioral coherence under crowding. To address this point, we added a sensitivity analysis to the revised manuscript. Specifically, we tested the effect of varying the conspecific avoidance distance from 0.2 to 1.6 meters at bat densities of 2 to 40 bats/3m². The only statistically significant impact was at the highest density (40 bats/3m²), where exit probability increased slightly from 82% to 88% (p = 0.024, t = 2.25, DF = 958). No significant changes were observed in exit time, collision rate, or jamming probability across other densities or conditions (GLM, see revised Methods). These results suggest that the selected avoidance distances are robust and not a major driver of model performance, see lines 469-47.

      The 15-second exit limit was determined as described in the text (Lines 489-491): "A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed." In other words, it allowed each bat to circle the 'cave' twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer (measuring the time taken for a certain percentage of bats to exit) is also valid. However, in our model some outlier bats fail to exit and continue flying for many minutes; such simulations would lead to excessive simulation times, making it difficult to generate repetitions without teaching us much. These outliers usually resulted from a bat slightly missing the opening (see video S1). Our chosen approach ensures practical runtime constraints while still capturing relevant performance metrics.

      What is the empirical justification for the 1-10 calls used for integration?

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions?

      Does it include masking, no masking, or which species?ย 

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chiu, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss and Surlykke, 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window; we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation impact navigation performance. Indeed, the results showed that performance levels off for integration windows of 5-10 calls (Figure 3A).

Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking. We have revised the text to clarify these details; see lines 489-491.

      Reviewer #1 (Recommendations for the authors):

      (1) Data Availability:

      As it stands now, this reviewer cannot vouch for the uploaded code as it wasn't accessible according to F.A.I.R principles. The link to the code/data points to a private company's file-hosting account that requires logging in or account creation to see its contents, and thus cannot be accessed.

      This reviewer urges the authors to consider uploading the code onto an academic data repository from the many on offer (e.g. Dryad, Zenodo, OSF). Some repositories offer an option to share a private link (e.g. Zenodo) to the folder that can then be shared only with reviewers so it is not completely public.

      This is a computational paper, and the credibility of the results is based on the code used to generate them.

The code is now available on GitHub, as requested:

      https://github.com/omermazar/Colony-Exit-Bat-Simulation

      (2) Abstract:

      Line 22: 'To explore whether..' - replace 'whether' with 'how'?

      The sentence was rephrased as suggested by the reviewer.

      (2) Main text:

      Line 43: '...which may share...' - correct to '...which share...', as elegantly framed in the authors' previous work - jamming avoidance is unavoidable because all FM bats of a species still share >90% of spectral bandwidth despite a few kHz shift here and there.

      The sentence was rephrased as suggested by the reviewer.

      Line 49: The authors may wish to additionally cite the work of Fawcett et al. 2015 (J. Comp. Phys A & Biology Open)

      Thank you for the suggestion. We have included a citation to the work of Fawcett et al. (2015) in the revised manuscript.

      Line 61: This statement does not match the recent state of the literature. While the previous models may have assumed that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from the potential inability to track all neighbours, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Jhawar et al. 2020 Nature Physics.

      We have added citations to the important studies suggested by the reviewer, as detailed in the Public Review above.

      Line 89: '..took all interference signals into account...' - what is meant by 'interference signals' - are the authors referring to reflections, unclear.

      We have revised the sentence and detailed the acoustic signals involved in the process: self-generated echoes, calls from conspecifics, and echoes from cave walls and other bats evoked by those calls, see lines 99-106.

      Figure 1A: The colour scheme with overlapping points makes the figure very hard to understand what is happening. The legend has colours from subfigures B-D, adding to the confusion.

      What does the yellow colour represent? This is not clear. Also, in general, the color schemes in the simulation trajectories and the legend are not the same, creating some amount of confusion for the reader. It would be good to make the colour schemes consistent and visually separable (e.g. consp. call direct is very similar to consp. echo from consp. call), and perhaps also if possible add a higher resolution simulation visualisation. Maybe it is best to separate out the colour legends for each sub-figure.

      The updated figure now includes clearer, more visually separable colors, and consistent color coding across all sub-panels. The yellow trajectory representing the focal batโ€™s flight path is now explicitly labeled, and we adjusted the color mapping of acoustic signals (e.g., conspecific calls vs. echoes) to improve distinction. We also revised the figure caption accordingly and ensured that the legend is aligned with the updated visuals. These modifications aim to enhance interpretability and reduce ambiguity for the reader.

      Figure C3: What is 'FB Channel', this is not explained in the legend.

'FB Channel' stands for 'Filter Bank Channel'. This clarification has been added to the caption of Figure 1.

      Figure 3: Visually noticing that the colour legend is placed only on sub-figure A is tricky and readers may be left searching for the colour legend. Maybe lay out the legend horizontally on top of the entire figure, so it stands out?

      We have adjusted the placement of the color legend in Figure 3 to improve visibility and consistency.

      Line 141: '..the probability of exiting..' - how is this probability calculated - not clear.

We have clarified in the revised text that the probability of exiting the cave within 15 seconds is defined as the number of bats that exited the cave within that time divided by the total number of bats in each scenario; see lines 159-160.

      Line 142: What are the sample sizes here - i.e. how many simulation replicates were performed?

We have clarified the number of repetitions in each scenario in the revised text, as detailed in the Public Review above.

      Line 151: 'The jamming probability,...number of jammed echoes divided by the total number of reflected echoes' - it seems like these are referring to 'own' echoes or first-order reflections, it is important to clarify this.

The reviewer is right. We have clarified this in the revised text; see lines 173-175.

      Line 153: '..with a maximum difference of ...' - how is this difference calculated? What two quantities are being compared - not clear.

      We have revised the text to clarify that the 14.3% value reflects the maximum difference in jamming probability between the RM and PK models, which occurred at a density of 10 bats. The values at each density are shown in Figure 2D, see lines 175-177.

      Line 221: '..temporal aggregation helps..' - I'm assuming the authors meant temporal integration? However, I would caution against using the exact term 'temporal integration' as it is used in the field of audition to mean something different. Perhaps something like 'sensory integration' , or 'multi-call integration'

      To avoid ambiguity and better reflect the process modeled in our work, we have replaced the term "temporal aggregation" with "multi-call integration" throughout the revised manuscript. This term more accurately conveys the idea of combining information from multiple echolocation calls without conflicting with existing terminology.

      (4) Discussion

      Lines 302: 'Our model suggests...increasing the call-rate..' - not clear where this is explicitly tested or referred to in this manuscript. Can't see what was done to measure/quantify the effect of this variable in the Methods or anywhere else.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 346-349.

      Line 319: 'spatial interference' - unclear what this means. This reviewer would strongly caution against creating new terms unless there is an absolute need for it. What is meant by 'interference' in this paper is hard to assess given that the word seems to be used as a synonym for jamming and also for actual physical wave-based interference.

      We have rephrased this paragraph as detailed in the Public Review above, see line 119-120, 366-367.

      Line 323: '..no benefit beyond a certain level...' - also not clear where this is explicitly tested. It seems like there was a set of simulations run for a variety of parameters but this is not written anywhere explicitly. What type of parameter search was done, was it all possible parameter combinations - or only a subset? This is not clear.

      We have rephrased this paragraph as detailed in the Public Review above, see lines 372-375.

      Line 324: '..ca. 110 dB-SPL.' - what reference distance?

      All call levels were simulated and reported in dB-SPL, referenced at 0.1 meters from the emitting bat. We have clarified it in the revised text in the relevant contexts and specifically in line 529.

      (5) Methods

      Line 389 : '...over a 2 x 1.5 m2 area..' It took a while to understand this statement and put it in context. Since there is no previous description of the entire L-arena, the reviewer took it to mean the simulations happened over the space of a 2 x 1.5 m2 area. Include a top-down description of the simulation's spatial setup and rephrase this sentence.

      To address the confusion, we revised the text to clarify that the full simulation environment represents a corridor-shaped cave measuring 14.5 ร— 2.5 meters, with a right-angle turn located 5.5 meters before the exit, as shown in Figure 1A. The 2 ร— 1.5 m area refers specifically to the small zone at the far end of the cave where bats begin their flight. The revised description now includes a clearer spatial overview to prevent ambiguity, see lines 456-460.

      Line 398: Replace 'High proximity' with 'Close proximity'

      Replaced.

      Line 427: 'uniform target strength of -23 dB' - at what distance is this target strength defined? Given the reference distance can vary by echolocation convention (0.1 or 1 m), one can't assess if this is a reasonable value or not.

      The reference distance for the reported target strength is 1 meter, in line with standard acoustic conventions. We have revised the text to clarify this explicitly (line 531).

      Also, independent of the reference distance, particularly with reference to bats, the target strength is geometry-dependent, based on whether the wings are open or not. Using the entire wingspan of a bat to parametrise the target strength is an overestimate of the available reflective area. The effective reflective area is likely to be somewhere closer to the surface area of the body and a fraction of the wingspan together. This is important to note and/or mention explicitly since the value is not experimentally parametrised.

      For comparison, experimentally based measurements used in Goetze et al. 2016 are -40 dB (presumably at 1 m since the source level is also defined at 1 m?), and Beleyur & Goerlitz 2019 show a range between -43 to -34 dB at 1 m.

We agree with the reviewer that target strength in bats is strongly influenced by their geometry, particularly wing posture during flight. In our model, we simplified this aspect by using a constant target strength, as the detailed temporal variation in body and wing geometry is pseudo-random and not explicitly modeled. We acknowledge that this is a simplification, and we have now stated this limitation clearly in the revised manuscript. We chose a fixed value of -23 dB at 1 meter to reflect a plausible mid-range estimate, informed by anatomical data and consistent with values reported for similarly sized species (Beleyur and Goerlitz, 2019). To support this, we directly measured the target strength of a 3D-printed RM bat model, obtaining -32 dB.

      Moreover, a sensitivity analysis across a wide range (โ€“49 to โ€“23 dB) confirmed that performance metrics remain largely stable, indicating that our conclusions are not sensitive to this parameter, and suggesting that our results hold for different-sized bats. See lines 384-390, 533-538, and Supplementary Figures 3 and 4 in the revised article.ย 
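For intuition on why the results are stable over this range: the received echo level in such models is governed by a two-way sonar equation, in which target strength enters as a single additive term. The sketch below is an illustrative Python version (the simulation itself is implemented in MATLAB, and the atmospheric attenuation coefficient used here is an assumed placeholder, not the value from our Methods):

```python
import math

def received_echo_level_db(source_level_db_at_1m, distance_m,
                           target_strength_db=-23.0, atm_atten_db_per_m=0.1):
    """Two-way sonar equation sketch: spherical spreading plus atmospheric
    absorption on each leg, plus the target strength referenced at 1 m."""
    one_way_tl_db = 20 * math.log10(distance_m) + atm_atten_db_per_m * distance_m
    return source_level_db_at_1m - 2 * one_way_tl_db + target_strength_db
```

Because target strength only shifts all echo levels by a constant, varying it mainly rescales detection range, which explains the weak sensitivity reported in Supplementary Figures 3 and 4.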

      Line 434: 'To model the bat's cochlea...'. Bats have two cochleas. This model only describes one, while the agents are also endowed with the ability to detect sound direction - which requires two ears/cochleas.... There is missing information about the steps in between that needs to be provided.

      We appreciate the reviewerโ€™s observation. Indeed, our model is monaural, and simulates detection using a single cochlear-like filter bank receiver. We have clarified this in the revised text to avoid confusion. This paragraph specifically describes the detection stage of the auditory processing pipeline. The localization process, which builds on detection and includes directional estimation, is described in the following paragraph (see line 583 onward), as discussed in the next comment and response.

      Line 457: 'After detection, the bat estimates the range and Direction of Arrival...' This paragraph describes the overall idea, but not the implementation. What were the inputs and outputs for the range and DOA calculation performed by the agent? Or was this information 'fed' in by the simulation framework? If there was no explicit DOA step that the agent performed, but it was assumed that agents can detect DOA, then this needs to be stated.

In the current simulation, the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. Instead, based on experimental studies (Simmons et al., 1983; Popper and Fay, 1995), we assumed that bats can estimate the direction of an echo with an angular error that depends on the signal-to-noise ratio (SNR). Accordingly, the inputs to the DOA estimation were the peak level of the desired echo, the noise level, and the level of acoustic interference. The output was an estimated direction of arrival that included a random angular error, drawn from a normal distribution whose standard deviation varied with the SNR. We have revised the relevant paragraph (lines 583-592) to clarify this implementation.
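For concreteness, the mechanism can be sketched as follows (illustrative Python; the simulation is implemented in MATLAB, and the specific SNR-to-error mapping and parameter values here are placeholders, not the fitted values from our Methods):

```python
import math
import random

def estimate_doa(true_angle_deg, echo_peak_db, noise_db, interference_db,
                 base_std_deg=10.0, min_std_deg=1.0):
    """Estimate echo direction with an SNR-dependent angular error.

    Inputs mirror those described in the text: the peak level of the desired
    echo, the noise level, and the level of acoustic interference (dB-SPL).
    """
    # Combine noise and interference power (dB -> linear -> dB)
    masker_db = 10 * math.log10(10 ** (noise_db / 10) + 10 ** (interference_db / 10))
    snr_db = echo_peak_db - masker_db
    # Higher SNR -> smaller angular error; this mapping is illustrative only
    std_deg = max(min_std_deg, base_std_deg / max(snr_db, 1.0))
    return true_angle_deg + random.gauss(0.0, std_deg)
```

The key property is that a weak echo against strong interference yields a broad error distribution, while a strong echo yields a near-veridical direction estimate.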

      Line 464: 'To evaluate the impact of the assumption...' - the 'self' and 'non-self' echoes can be distinguished perhaps using pragmatic time-delay cues, but also using spectro-temporal differences in individual calls/echoes. Do the agents have individual call structures, or do all the agents have the same call 'shape'? The echolocation parameters for the two modelled species are given, but whether there is call parameter variation implemented in the agents is not mentioned.

In our relatively simple model, all individuals emit the same type of chirp call, with parameters adapted only based on the distance to the nearest detected object. However, individual variation is introduced by assigning each bat a terminal frequency drawn from a normal distribution with a standard deviation of 1 kHz, as described in the revised version (lines 519-520). This small variation is not used explicitly as a spectro-temporal cue for echo discrimination.

      In our model, all spectro-temporal variationsโ€”whether due to call structure or variations resulting from overlapping echoes from nearby reflectorsโ€”are processed through the filter bank, which compares the received echoes to the transmitted call during the detection stage. As such, the detection process itself can act as a discriminative filter, to some extent, based on similarity to the emitted call.

      We acknowledge that real bats likely rely on a variety of spectro-temporal features for distinguishing self from non-self-echoesโ€”such as call duration, received level, multi-harmonic structure, or amplitude modulation. In our simulation, we focus on comparing two limiting conditions: full recognition of self-generated echoes versus full confusion. Implementing a more nuanced self-recognition mechanism based on temporal or spectral cues would be a valuable extension for future work.

      (6) References

      Reference 22: Formatting error - and extra '4' in the reference.

      The error has been fixed.

      (7) Thoughts/comments

Even without 'recognition' of walls & conspecifics, bats may be able to avoid obstacles - this is a neat result. Also, using their framework the authors show that successful 'blind' object-agnostic obstacle avoidance can occur only when supported by some sort of memory. In some sense, this is a nice intermediate step showing the role of memory in bat navigation. We know that bats have good long-term and long-spatial scale memory, and here the authors show that short-term spatial memory is important in situations where immediate sensory information is unreliable or unavailable.

      We appreciate the reviewerโ€™s thoughtful summary. Indeed, one of the main takeaways of our study is that successful obstacle avoidance can occur even without explicit recognition of walls or conspecificsโ€”provided that a clustered multi-call integration is in place. Our model shows that when immediate sensory information is unreliable, integrating detections over time becomes essential for effective navigation. This supports the broader view that memory, even on short timescales, plays an important role in bat behavior.

      (8) Reporting GLM results

      The p-value, t-statistic, and degrees of freedom are reported consistently across multiple GLM results. However, the most important part which is the effect size is not consistently reported - and this needs to be included in all results, and even in the table. The effect size provides an indicator of the parameter's magnitude, and thus scientific context.

      We agree that the effect size provides essential scientific context. In fact, we already include the effect size explicitly in Table 1, as shown in the โ€œEffect Sizeโ€ column for each tested parameter. These values describe the magnitude of each parameterโ€™s effect on exit probability, jamming probability, and collision rate. In the main text, effect sizes are presented as concrete changes in performance metrics (e.g., โ€œexit probability increased from 20% to 87%,โ€ or โ€œwith a decrease of 3.5%ยฑ8% to 5.5%ยฑ5% (mean ยฑ s.e.)โ€), which we believe improves interpretability and scientific relevance. ย 

      To further clarify this in the main text, we have reviewed the reported results and ensured that effect sizes are mentioned more consistently wherever GLM outcomes are discussed. Additionally, we have added a brief note in the table caption to emphasize that effect sizes are provided for all tested parameters.

      The 'tStat' appears multiple times and seems to be the output of the MATLAB GLM function. This acronym is specific to the MATLAB implementation and needs to be replaced with a conventionally used acronym such as 't', or the full form 't-statistic' too. This step is to keep the results independent of the programming language used.

      We have replaced all instances of tStat with the more conventional term โ€˜tโ€™ throughout the manuscript to maintain consistency with standard reporting practices.

      Reviewer #2 (Recommendations for the authors):

      In addition to my public review, I had a few minor points that the authors may want to consider when revising their paper.

      (1)ย Figures 2, 3, and 4 may benefit from using different marker styles, in addition to different colors, to show the different cases.

      Thank you for the suggestion. In Figures 2โ€“4, the markers represent means with standard error bars. To maintain clarity and consistency across all conditions, we have chosen to keep a standardized marker style โ€“ and we clarify this in the legend. We found that varying only the colors is sufficient for distinguishing between conditions without introducing visual clutter.

      (2)ย The text "PK" in the inset for Figure 2A is very difficult to read. I would suggest using grey as with "RM" in the other inset.

      We have updated the insert in Figure 2A to improve legibility.

      (3)ย Are the error bars in Figure 3 very small? I wasn't able to see them. If that is the case, the authors may want to mention this in the caption.

      You are correctโ€”the error bars are present in all plots but appear very small due to the large number of simulation repetitions and low variability. We have revised the caption to explicitly mention this.

      (4)ย The species name of PK is spelled inconsistently (kuhli, khulli, and kuhlii).

      We have corrected the species name throughout the manuscript.

      (5)ย Table 1 is a great condensation of all the results, but the time to exit is missing. It may be helpful if summary statistics on that were here as well.

      We have added time-to-exit to the effect size column in Table 1, alongside the other performance metrics, to provide a more complete summary of the simulation results.

      (6)ย I may have missed it, but why are there two values for the exit probability when nominal flight speed is varied?

      The exit probability was not monotonic with flight speed, but rather showed a parabolic trend with a clear optimum. Therefore, we reported two values representing the effect before and after the peak. We have clarified this in the revised table and updated the caption accordingly.

      (7) Table 2 has an extra header after the page break on page 18.

      The extra header in Table 2 after the page break has been removed in the revised manuscript.

      (8)ย The G functions have 2 arguments in their definitions and Equation 1, but only one argument in Equations 2 and 3. I wasn't able to see why.

      Thank you for pointing this out. You are correctโ€”this was a typographical error. We have corrected the argument notation in Equations 2 and 3 and explicitly included the frequency dependence of the gain (G) functions in both equations.

      (9)ย D_txrx was not defined but it was used in Equation 2.

The variable D_txrx is defined in the equation notation section as the distance [m] between the transmitting conspecific and the receiving focal bat, from the transmitter's perspective. We have now ensured that this definition is clearly linked to Equation 2 in the revised text. Moreover, we have added a supplementary figure that illustrates the geometric configuration defined by the equations to further support clarity, as described in the Public Review above.

      (10) It was hard for me to understand what was meant by phi_rx and phi_tx. These were described as angles between the rx or tx bats and the target, but I couldn't tell what the point defining the angle was. Perhaps a diagram would help, or more precise definitions.

We have revised the caption to provide clearer and more precise definitions. Additionally, we have included a geometric diagram as a supplementary figure, as noted in the Public Review above, to visually clarify the spatial relationships and angle definitions used in the equations; see lines 498-499.

      (11) Was the hearing threshold the same for both species?

      Yes. We have clarified it in the revised version.

      (12) Collision avoidance is described as turning to the "opposite direction" in the supplemental figure explaining the model. Is this 90 degrees or 180 degrees? If 90 degrees, how do these turns decide between right and left?

      In our model, the bat does not perform a fixed 90ยฐ or 180ยฐ turn. Instead, the avoidance behavior is implemented by setting the maximum angular velocity in the direction opposite to the detected echo. For example, if the obstacle or conspecific is detected on the batโ€™s right side, the bat begins turning left, and vice versa.

      This turning direction is re-evaluated at each decision step, which occurs after every echolocation pulse. The bat continues turning in the same direction if the obstacle remains in front, otherwise it resumes regular pathfinding. We have clarified this behavior in the updated figure caption and model description, see lines 478-493.
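The rule described above can be sketched in a few lines (illustrative Python; the simulation is in MATLAB, and the parameter names and values here are ours, for illustration only):

```python
def avoidance_turn_sign(echo_azimuth_deg):
    """Obstacle detected on the right (positive azimuth) -> turn left
    (negative sign), and vice versa; re-evaluated after every pulse."""
    return -1.0 if echo_azimuth_deg >= 0.0 else 1.0

def avoidance_heading_update(heading_deg, echo_azimuth_deg,
                             max_turn_rate_deg_s, dt_s):
    # Turn at the maximum angular velocity, away from the detected echo
    return heading_deg + avoidance_turn_sign(echo_azimuth_deg) * max_turn_rate_deg_s * dt_s
```

No fixed 90 or 180 degree turn is imposed; the realized turn angle emerges from the maximum angular velocity and the time until the next decision step.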

      Reviewer #3 (Recommendations for the authors):

      (1)ย Lines 27-31: These sentences mischaracterize the results. This claim appears to equate "the model works" with "this is what bats actually do." Also, the model does not indicate that bats' echolocation strategies are robust enough to mitigate the effects of jamming - this is self-evident from the fact that bats navigate successfully via echolocation in dense groups.

      Thank you for the comment. Our aim was not to claim that the model confirms actual bat behavior, but rather to demonstrate that simple and biologically plausible strategiesโ€”such as signal redundancy and basic pathfindingโ€”are sufficient to explain how bats might cope with acoustic interference in dense settings. We have revised the wording to better reflect this goal and to avoid overinterpreting the model's implications.

      See abstract in the revised version. ย 

      (2)ย Line 37: This number underestimates the number of bats that form some of the largest aggregations of individuals worldwide - the free-tailed bats can form aggregations exceeding several million bats.

      We have revised the text to reflect that some bat species, such as free-tailed bats, are known to form colonies of several million individuals, which exceed the typical range. The updated sentence accounts for these extreme cases, see lines 36-37.

      (3)ย The flight densities explained in the introduction and chosen references are not representative of the literature - without providing additional justification for the chosen species, it can be interpreted that the selection of the species for the simulation is somewhat arbitrary. If the goal is to model dense emergence flight, why not use a species that has been studied in terms of acoustic and flight behavior during dense emergence flights---such as Tadarida brasiliensis?

Our goal was to develop a general model applicable to a broad class of FM-echolocating bat species. The two species we selected, Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM), span a wide range of signal characteristics, from wideband (PK) to narrowband (RM), providing a representative contrast in call structure.

      Although we did not include Tadarida brasiliensis (TB) specifically, its echolocation calls are acoustically similar to RM in terminal frequency and fall between PK and RM in bandwidth. Therefore, we believe our findings are likely to generalize to TB and other FM-bats.

      Moreover, as noted in a previous response, the average inter-bat distance in our highest-density simulations (0.27โ€ฏm) is still smaller than those reported for Tadarida brasiliensis during dense emergencesโ€”further supporting the relevance of our model to such scenarios.

      To support broader applicability, we also provide a supplementary graphical user interface (GUI) that allows users to modify key echolocation parameters and explore their impact on behaviorโ€”making the framework adaptable to additional species, including TB.

      (4)ย Line 78: It is not clear how (or even if) the simulated bats estimate the direction of obstacles. The explanation given in lines 457-463 is quite confusing. What is the acoustic/neurological mechanism that enables this direction estimation? If there is some mechanism (such as binaural processing), how does this extrapolate to 3D?

This comment echoes a similar concern raised by a previous reviewer. As explained earlier, in the current simulation the Direction of Arrival (DOA) was not modeled via an explicit binaural processing mechanism. The complete description is provided in our response to Reviewer #1 (Line 457). This implementation is now clarified in the revised text, and a detailed description of the localization process is also provided in the Methods section (lines 583-592).

      (5)ย The authors propose they are modeling the dynamic echolocation of bats in the simulation (line 79), but it appears (whether this is due to a lack of information in the manuscript or true lack in the simulation) that the authors only modeled a flight response. How did the authors account for bats dynamically changing their echolocation? This is unclear and from what I can tell may just mean that the bats can switch between foraging phase call types depending on the distance to a detected obstacle. Can the authors elaborate more on this?

      The echolocation behavior of the batsโ€”including dynamic call adjustmentsโ€” was implemented in the simulation and is described in detail in the Methods section (lines 498-520 and Table 2). To avoid redundancy, the Results chapter originally referred to this section, but we have now added a brief explanation in the Results to clarify that the batsโ€™ call parameters (IPI, duration, and frequency range) adapt based on the distance to detected objects, following empirically documented echolocation phases ("search," "approach," "buzz"). These dynamics are consistent with established bat behavior during navigation in cluttered environments such as caves.

      (6)ย Figure 1 C3: "Detection threshold": what is this and how was it derived?

      The caption also mentions yellow arrows, but they are absent from the figure. C4: Each threshold excursion is marked with an asterisk, but there are many more excursions than asterisks. Why are only some marked? Unclear.

C3: The detection threshold is determined dynamically. It is set to the greater of either 7 dB above the noise level (0 dB-SPL) (Kick, 1982; Saillant et al., 1993; Sanderson et al., 2003; Boonman et al., 2013) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB. This clarification has been added to the Methods section. The yellow arrow has been added to the figure.
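This two-part rule is simple enough to state as code (illustrative Python; the simulation is in MATLAB, and the function and parameter names are ours):

```python
def detection_threshold_db(max_received_level_db, noise_level_db=0.0,
                           snr_margin_db=7.0, dynamic_range_db=70.0):
    """Dynamic detection threshold: the greater of (noise level + 7 dB)
    and (maximal received level - 70 dB), i.e. a 70 dB dynamic range."""
    return max(noise_level_db + snr_margin_db,
               max_received_level_db - dynamic_range_db)
```

With weak signals the threshold is noise-limited (noise + 7 dB); with strong signals it is limited by the 70 dB dynamic range below the loudest received signal.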

C4: Thank you for this important observation. Only peaks marked with asterisks represent successful detections - those that were identified in both the interference-free and full detection conditions, as explained in the Methods. Other visible peaks result from masking signals or overlapping echoes from nearby reflectors, but they do not meet the detection criteria. To keep the figure caption concise, we have elaborated on this process more clearly in the revised Methods section and added this information to the legend.

      (7)ย Figure 2: A line indicating RM, No Masking is absent

      Thank you for pointing this out. The missing line for RM, No Masking has now been added in the revised version of Figure 2.

      (8)ย Line 121: "reflected off conspecifics". Does this mean echoes due to conspecifics?

The phrase "reflected off conspecifics" refers to echoes originating from the bat's own call and reflected off the bodies of nearby conspecifics. We have clarified the wording in the revised text to avoid confusion.

      (9)ย Line 125: Why are low-frequency channels stimulated by higher frequencies? This needs further clarification.

The cochlear filter bank in our model is implemented using gammatone filters, each modeled as an 8th-order Butterworth filter. Due to the non-ideal filter response and relatively broad bandwidths, especially in the lower-frequency channels, strong energy from the beginning of the downward FM chirp (at higher frequencies) can still produce residual activation in lower-frequency channels. While these stimulations are usually below the detection threshold, they may still be visible as early sub-threshold responses. Given that this is a technical property of the filter implementation and does not influence the detection outcomes, we have chosen not to elaborate on it in the figure caption or Methods.

      (10)ย Lines 146-150: This is an interesting finding. Is there a theoretical justification for it?

      This outcome arises directly from the simulation results. As noted in the Discussion (lines 359-365), although Pipistrellus kuhlii (PK) shows a modest advantage in jamming resistance due to its broader bandwidth, the redundancy in sensory information across callsโ€”enabled by frequent echolocationโ€”appears to compensate for these signal differences. As a result, the small variations in echo quality between species do not translate into significant differences in performance. We speculate that if the difference in jamming probability had been larger, performance disparities would likely have emerged.

      (11) Line 151: The authors define a jammed echo as an echo entirely missed due to masking. Is this appropriate? Doesn't echo mis-assignment also constitute jamming?

      We agree that echo mis-assignment can also degrade performance; however, in our model, we distinguish between two outcomes: (1) complete masking (echo not detected), and (2) detection with a localization error. As explained in the Methods (lines 500-507), we run the detection analysis twice: once with only desired echoes ("interference-free detection") and once including masking signals ("full detection"). If a previously detected echo is no longer detected, it is classified as a jammed echo. If the echo is still detected but the delay shifts by more than 100 µs compared to the interference-free condition, it is also considered jammed. If the delay shift is smaller, it is treated as a detection with localization error rather than full jamming. We have clarified this distinction in the revised Methods section.
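      The per-echo decision rule above can be restated compactly (a sketch of the classification logic; the function and argument names are ours, not the simulation's):

```python
def classify_echo(detected_free, detected_full, delay_free_s, delay_full_s,
                  shift_threshold_s=100e-6):
    """Classify one echo by comparing the interference-free run with the
    full run that includes masking signals.

    Returns 'jammed', 'localization_error', 'clean', or 'not_applicable'.
    """
    if not detected_free:
        return 'not_applicable'        # echo was never detectable anyway
    if not detected_full:
        return 'jammed'                # complete masking
    shift = abs(delay_full_s - delay_free_s)
    if shift > shift_threshold_s:
        return 'jammed'                # delay shifted by more than 100 us
    if shift > 0:
        return 'localization_error'    # detected, with a small delay error
    return 'clean'

# Example: an echo detected without interference but lost with maskers
# present is counted as jammed.
# classify_echo(True, False, 5e-3, None)  -> 'jammed'
```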

      (12) Figure 2-E: Detection probability statistics are of limited usefulness without accompanying false alarm rate (FAR) statistics. Do the authors have FAR numbers?

      We understand FAR to refer to instances where masking signals or other acoustic phenomena are mistakenly interpreted as real echoes from physical objects. As explained in the manuscript, we implemented two model versions: one without confusion, and one with full confusion.

      Figure 2E reports detection performance under the non-confusion model, in which only echoes from actual physical reflectors are used, and no false detections occur; hence, the false alarm rate is effectively zero in this condition. In the full-confusion model, all detected echoes, including those originating from masking signals or conspecific calls, are treated as valid detections, which may include false alarms. However, we did not explicitly quantify the false alarm rate as a separate metric in this simulation.

      We agree that tracking FAR could be informative and will consider incorporating it into future versions of the model.

      (13) Line 161: RM bats suffered from a significantly higher probability of the "desired conspecific's echoes" being jammed. What does "desired conspecific's echoes" mean? This is unclear.

      The term "desired conspecific's echoes" refers to echoes originating from the bat's own call, reflected off nearby conspecifics, which are treated as relevant reflectors for collision avoidance. We have revised the wording in the text for clarity.

      (14) Line 188: Why didn't the size of the integration window affect jamming probability? I couldn't find this explained in the discussion.

      The jamming probability in our analysis is computed at the individual-echo level, prior to any temporal integration. Since the integration window is applied after the detection step, it does not influence whether a specific echo is masked (i.e., jammed) or not. Therefore, as expected, we did not observe a significant effect of integration window size on jamming probability.

      (15) Lines 217-218: Why do the authors think this would be?

      Thank you for the thoughtful question. We agree that, in theory, increasing call intensity should raise the levels of both desired echoes and masking signals proportionally. However, in our model, the environmental noise floor and detection threshold remain constant, meaning that higher call intensities increase the signal-to-noise ratio (SNR) more effectively for weaker echoes, especially those at longer distances or with low reflectivity. This could lead to a higher likelihood of those echoes crossing the detection threshold, resulting in a small but measurable reduction in jamming probability.

      Additionally, the non-linear behavior of the filter-bank receiver, such as thresholding at multiple stages, can introduce asymmetries in how increased signal levels affect the detection of target versus masking signals.

      That said, the effect size was small, and the improvement in jamming probability did not translate into any significant gain in behavioral performance (e.g., exit probability or collision rate), as shown in Figure 3C.

      (16) Line 233: I'm not sure I understand how a slightly improved aggregation model that clustered detected reflectors over one-second periods is different. Doesn't this just lead to on average more calls integrated into memory?

      While increasing the memory duration does lead to more detections being available, the enhanced aggregation model (which we now refer to as multi-call clustering) differs fundamentally from the simpler one. As detailed in the Methods, it includes additional processing steps: clustering spatially close detections, removing outliers, and estimating wall directions based on the spatial structure of clustered echoes. In contrast, the simpler model treats each detection as an isolated point without estimating obstacle orientation. These additional steps allow for more robust environmental interpretation and significantly improve performance under high-confusion conditions. We have clarified this in the revised text (lines 606-616) and added a Supplementary Figure 2B.
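      The multi-call clustering idea can be sketched as follows: greedy single-linkage clustering of pooled detections, outlier removal by cluster size, and a principal-axis estimate of wall direction. The distance threshold and minimum cluster size below are illustrative, not the paper's values.

```python
import math

def cluster_detections(points, link_dist=0.3, min_size=3):
    """Greedy single-linkage clustering of 2D detections pooled over
    several calls; tiny clusters are dropped as outliers."""
    clusters = []
    for p in points:
        joined = None
        for c in clusters:
            if any(math.dist(p, q) <= link_dist for q in c):
                if joined is None:
                    c.append(p)
                    joined = c
                else:                      # p bridges two clusters: merge
                    joined.extend(c)
                    c.clear()
        if joined is None:
            clusters.append([p])
    return [c for c in clusters if len(c) >= min_size]

def wall_direction(cluster):
    """Principal axis of a cluster (radians), from its 2x2 covariance."""
    n = len(cluster)
    mx = sum(x for x, _ in cluster) / n
    my = sum(y for _, y in cluster) / n
    cxx = sum((x - mx) ** 2 for x, _ in cluster) / n
    cyy = sum((y - my) ** 2 for _, y in cluster) / n
    cxy = sum((x - mx) * (y - my) for x, y in cluster) / n
    return 0.5 * math.atan2(2 * cxy, cxx - cyy)
```

      For detections strung along a wall, the isolated spurious point is discarded and the principal axis recovers the wall's orientation, which the simpler per-detection model never estimates.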

      (17) Table 1: What about conspecific target strength?

      We have now added the conspecific target strength as a tested parameter in Table 1, along with its tested range, default value, and measured effect sizes. A detailed sensitivity analysis is also presented in Supplementary Figure 4, demonstrating that variations in conspecific target strength had relatively minor effects on performance metrics.

      (18) Figure 3-A: The x-axis is the number of calls in the integration window. But the leftmost sample on each curve is at 0 calls. Shouldn't this be 1?

      "0 calls" refers to the case where only the most recent call is used for pathfinding, without integrating any information from prior calls. The x-axis reflects the number of previous calls stored in memory, so a value of 0 still includes the current call. We have clarified this terminology in the figure caption.

      (19) Lines 282-283: This statement needs to be clarified that it is with the constraints of using a 2D simulation with at most 33 bats/m^2. It also should be clarified that it is assumed the bat can reliably distinguish between its own echoes and conspecific echoes, which is a very important caveat.

      We have revised the text to clarify that the results are based on a 2D simulation with a maximum tested density of 33 bats/m². We also now explicitly state that the model assumes bats can distinguish between their own echoes and those generated by conspecifics, an assumption we recognize as a simplification. These clarifications help place the results within the scope and constraints of the simulation. Moreover, as described in the text (and noted in a previous response), the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are about 0.5 m.

      (20) Line 294: What is this sentence referring to?

      The sentence refers to the finding that, even under high bat densities, a substantial portion of the echoes, particularly those reflected from nearby obstacles (e.g., 1 m away), were jammed due to masking. Nevertheless, the bats in the simulation were still able to navigate successfully using partial sensory input. We have clarified the sentence in the revised text to make this point more explicit; see lines 333-336.

      (21) Line 302: Was jamming less likely when IPI was higher or lower? I could not find this demonstrated anywhere in the manuscript.

      We agree that the original text was not sufficiently clear on this point. While we did not explicitly test fixed IPI values as a parameter, the model does simulate the natural behavior of decreasing IPI as bats approach obstacles. This behavior is supported by empirical observations and is incorporated into the echolocation dynamics of the simulation. We have clarified this point in the revised text (see lines 346-351) and explained that while a lower IPI introduces more acoustic overlap, it also increases redundancy and improves detection through temporal integration.

      (22) Lines 313-314: This is an interesting assumption, but it is not evident that it is substantiated by the references.

      The claim is based on well-established principles in signal processing and bioacoustics. Wideband signals, such as those emitted by PK bats, distribute their energy over a broader frequency range, which makes them inherently more resistant to narrowband interference and masking. This concept is commonly applied in both biological and artificial sonar systems and is supported by empirical studies in bats and theory in acoustic sensing.

      For example, Beleyur & Goerlitz (2019) demonstrate that broader bandwidth calls improve detection in cluttered and jamming-prone environments. Similarly, Ulanovsky et al. (2004) and Schnitzler & Kalko (2001) discuss how FM bats' wideband calls enhance temporal and spatial resolution, helping to reduce the impact of overlapping signals from conspecifics. These findings align with communication theory, in which spread-spectrum techniques improve robustness in noisy environments.

      We agree with the reviewer that this is an important point, and we have updated the manuscript to clarify this rationale and cite the relevant literature accordingly (lines 631-363).

      (23) Lines 318-319: What is the justification for "probably"? Isn't this just a supposition?

      We agree with the reviewer's point and have rephrased the sentence accordingly.

      (24) Line 320: How does this 63% performance match the sentence in line 295?

      The sentence in line 295 refers to the overall ability of the bats to navigate successfully despite high jamming levels, highlighting the robustness of the strategy under challenging conditions. The figure in line 320 (63%) quantifies this performance under the most extreme simulated scenario (100 bats / 3 m²), where both spatial and acoustic interference are maximal. We have rephrased the text in the revised version (lines 324-327).

      (25) Lines 341-345: It seems like this is more likely to be the main takeaway of the paper.

      As noted in the Public Review above, there is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from those of conspecifics (e.g., Schnitzler & Kalko, 2001; Kazial et al., 2001, 2008; Burnett & Masters, 2002; Chiu et al., 2009; Yovel et al., 2009; Beetz & Hechavarría, 2022). Therefore, we consider our assumption of self-recognition to be well-supported, at least under typical conditions. That said, we agree that the impact of echo confusion on performance is significant and highlights a critical challenge in dense environments.

      To our knowledge, this is the first computational model to explicitly simulate both self-recognition and full echo confusion under high-density conditions. We believe that the combination of modeled constraints and the demonstrated robustness of simple sensorimotor strategies, even under worst-case assumptions, is what makes this contribution both novel and meaningful.

      (26) Lines 349-350: What is the aggregation model? What is meant by "integration"?

      We have revised the text to clarify that the "aggregation model" refers to a multi-call clustering process that includes clustering of detections, removal of outliers, and estimation of wall orientation, as described in detail in the revised Methods and Results sections.

      (27) Line 354: Again, why isn't this the assumption we're working under?

      As addressed in our response to Comment 25, our primary model assumes that bats can recognize their own echoes, an assumption supported by substantial empirical evidence. The alternative "full confusion" model was included to explore the behavioral consequences of failing to distinguish self from conspecific echoes. Real bats may experience some degree of echo misidentification, but our full-confusion assumption represents a worst-case scenario.

      (28) Line 382: "Under the assumption that..." I agree that bats probably can, but if we assume they can differentiate them all, where's the jamming problem?

      The assumption that bats can theoretically distinguish between different signal sources applies after successful detection. However, the jamming problem arises during the detection and localization stages, where acoustic interference can prevent echoes from crossing the detection threshold or distort their timing.

      (29) Lines 386-387: The paper referenced focused on JAR in the context of foraging. What changes were made to the simulation to switch to obstacle avoidance?

      While the simulation framework in Mazar & Yovel (2020) was developed to study jamming avoidance during foraging, the core components, such as the acoustic calculations, receiver model, and echolocation behavior, remain applicable. For the current study, we adapted the simulation extensively to address colony-exit behavior. These modifications include modeling cave walls as acoustic reflectors, implementing a pathfinding algorithm, integrating obstacle-avoidance maneuvers, and adapting the integration window and integration processes. These updates are detailed throughout the Methods section.

      (30) Lines 400-402: Something doesn't add up with the statement: each decision relies on an integration window that records estimated locations of detected reflectors from the last five echolocation calls, with the parameter being tested between 1 and 10 calls. Can the authors reword this to make it less confusing?

      We have reworded the sentence to clarify that the default integration window includes five calls, while we systematically tested the effect of using 1 to 10 calls; see lines 486-487.

      (31) Line 393: "30 deg/sec": why was this value chosen?

      The turning rate of 30 deg/sec was manually selected to approximate the curvature of natural foraging flight paths observed in Rhinopoma microphyllum using on-board tags. Moreover, in Mazar & Yovel (2020), we showed that the flight dynamics of simulated bats in a closed room closely matched those of Pipistrellus kuhlii flying in a room of similar dimensions. However, in the current simulation, bats rarely follow a random-walk trajectory, due to the structured environment and frequent obstacle detection. As a result, this parameter has no meaningful impact on the simulation outcomes.

      (32) Line 412: "Harmony" --- do you mean harmonic? And what is the empirical evidence that RM bats use the 2nd harmonic compared to the 1st?

      Perhaps showing a spectrogram of a real RM signal would be helpful.

      The typo has been corrected. For reference, see Goldshtein et al. (2025).

      (33) Table 2: Something is incorrect with the table. The first row on the next page is the wrong species name. Also, where are the citations for these parameter values?

      The table header has been corrected in the revised version. The parameter values for flight and echolocation behavior were derived from existing literature and empirical data: Pipistrellus kuhlii parameters were based on Kalko (1995), and Rhinopoma microphyllum parameters were extracted from our own recordings using on-board tags, as described in Goldshtein et al. (2025). We have added the appropriate citations to Table 2.

      (34) Line 442: How was the threshold level chosen?

      The detection threshold at each level is set to the greater of either 7 dB above the noise level (0 dB SPL) or the maximal received level minus 70 dB, effectively applying a dynamic range of 70 dB.
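      In code form, the rule is simply (a sketch; the function name is ours):

```python
def detection_threshold_db(noise_level_db, max_received_db,
                           margin_db=7.0, dynamic_range_db=70.0):
    """Greater of (noise + 7 dB) and (loudest received level - 70 dB):
    a fixed margin over noise, capped by a 70 dB dynamic range."""
    return max(noise_level_db + margin_db,
               max_received_db - dynamic_range_db)

# Quiet scene: noise-limited threshold.
# detection_threshold_db(0, 50)  -> 7.0
# Loud scene: dynamic-range-limited threshold.
# detection_threshold_db(0, 90)  -> 20.0
```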

      (35) Line 445: 100 µs: This is about 3 cm. The resolution of PK is about 1 cm. For RM it's about 10 cm. So, this window is generous for PK, but too strict for RM.

      To keep the model simple and avoid introducing species-specific detection thresholds, we selected a biologically plausible compromise that could reasonably apply to both species. This simplification ensures consistency across simulations while remaining within the known behavioral range.

      (36) Line 448: What is the spectrum of the Gaussian noise, and did it change between PK and RM?

      We used the same white Gaussian noise with a flat spectrum across the relevant frequency range (10-80 kHz) for both species. We have clarified this in the revised text in lines 570-572.

      (37) Line 451: 4 milliseconds is 1.3 m. Is this appropriate?

      The 4 millisecond window was selected based on established auditory masking thresholds described in Mazar & Yovel (2020), and supported by Popper and Fay (1995, ch. 2.4.5), Blauert (1997, ch. 3.1), and Mohl and Surlykke (1989). These values provide conservative lower bounds on bats' ability to cope with masking (Beleyur and Goerlitz, 2019). For simplicity, we used constant thresholds within each window; see lines 574-576.
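      A sketch of how such a masking window can be applied: the 4 ms forward/backward durations follow the text, while the function shape and the level comparison (a simple margin) are our illustrative simplifications.

```python
def is_masked(echo_t, echo_db, masker_t, masker_db,
              forward_s=4e-3, backward_s=4e-3, margin_db=0.0):
    """An echo arriving at echo_t (s) is masked if a sufficiently loud
    masker falls inside the masking window around it: up to forward_s
    before the echo (forward masking) or backward_s after it (backward
    masking). Levels are in dB; margin_db is an illustrative knob."""
    in_window = (echo_t - forward_s) <= masker_t <= (echo_t + backward_s)
    loud_enough = masker_db >= echo_db - margin_db
    return in_window and loud_enough

# A loud masker 2 ms before a faint echo masks it; the same masker
# 8 ms earlier falls outside the window and does not.
```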

      (38) Line 452: Citation for the forward and backward masking durations?

      See the response to the previous comment.

      (39) Lines 460-461: This is unclear. How does the bat get directional information? The authors claim to be able to measure direction-of-arrival for each detection, but it is not clear how this is done.

      As noted in our response to Reviewer 1 (comment on Line 457), directional information is not computed via an explicit binaural model. Instead, we assume the bat estimates the direction of arrival with an angular error that depends on the SNR, based on established studies (e.g., Simmons et al., 1983; Popper & Fay, 1995). We have clarified this in the revised text in lines 583-592.
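      The idea can be sketched as follows. The inverse-square-root scaling with linear SNR and the 1.5 degree reference error are our illustrative assumptions for this sketch, not the exact error model used in the simulation:

```python
import math
import random

def doa_estimate(true_angle_deg, snr_db, sigma_ref_deg=1.5, rng=random):
    """Direction-of-arrival estimate with SNR-dependent Gaussian error.

    sigma_ref_deg is the error std at 0 dB SNR (illustrative value);
    the error shrinks as 1/sqrt(linear SNR). Returns (estimate, sigma).
    """
    snr_linear = 10 ** (snr_db / 10)
    sigma = sigma_ref_deg / math.sqrt(max(snr_linear, 1e-9))
    return true_angle_deg + rng.gauss(0.0, sigma), sigma
```

      Under this toy scaling, a strong echo (30 dB SNR) is localized far more precisely than one barely above threshold.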

      (40) Line 467: It seems like the authors are modeling pulse-echo ambiguity, at least in this one alternative model, which is good! However the alternative model doesn't get much attention in the paper. Is there a reason for this?

      We would like to clarify that we did not model pulse-echo ambiguity. In our confusion model, all echoes received within the IPI are attributed to the bat's most recent call. This includes echoes that may in fact originate from conspecific calls, but the model does not assign self-echoes to earlier pulses or span multiple IPIs. Therefore, while the model captures echo confusion, it does not include true pulse-echo ambiguity. We have clarified this point in the revised text in lines 551-553.
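      The bookkeeping can be sketched in a few lines (names are ours; speed of sound taken as 343 m/s):

```python
def attribute_to_last_call(call_times_s, arrival_time_s, c=343.0):
    """Full-confusion bookkeeping: every arrival is attributed to the
    most recent own call preceding it and converted into an apparent
    range. Returns (assumed_call_time, apparent_range_m)."""
    preceding = [t for t in call_times_s if t <= arrival_time_s]
    if not preceding:
        return None, None                  # arrival before any own call
    last_call = max(preceding)
    delay = arrival_time_s - last_call
    return last_call, c * delay / 2.0      # two-way travel assumption

# Example: a signal arriving 5 ms after the last own call is treated as
# an echo from roughly 0.86 m away, whatever its true origin.
```

      Note that no arrival is ever matched to an earlier pulse, which is why the model captures echo confusion but not pulse-echo ambiguity.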

      (41) Line 41: "continuous" is more appropriate than "constant".

      Thank you, we have rephrased the text accordingly.

      (42) Line 69: "band width" should be one word.

      Thank you, we have corrected it to "bandwidth".

      (43) Line 79: "bats" should be in the possessive.

      Thank you, the text has been rephrased.

      (44) Line 128: "convoluted" don't you mean "convolved"?

      We have replaced "convoluted" with the correct term "convolved" in the revised text.

      (45) Please check your references, as there are some incomplete citations and typos.

      Thank you, we have reviewed and corrected all references for completeness and consistency.

      References

      Beetz, M.J. and Hechavarría, J.C. (2022) 'Neural Processing of Naturalistic Echolocation Signals in Bats', Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/fncir.2022.899370.

      Beleyur, T. and Goerlitz, H.R. (2019) 'Modeling active sensing reveals echo detection even in large groups of bats', Proceedings of the National Academy of Sciences of the United States of America, 116(52), pp. 26662-26668. Available at: https://doi.org/10.1073/pnas.1821722116.

      Betke, M. et al. (2008) 'Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated', Journal of Mammalogy, 89(1), pp. 18-24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Blauert, J. (1997) Spatial Hearing: The Psychophysics of Human Sound Localization (rev. ed.). Cambridge, MA: MIT Press.

      Boerma, D.B. et al. (2019) 'Wings as inertial appendages: How bats recover from aerial stumbles', Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/jeb.204255.

      Boonman, A. et al. (2013) 'It's not black or white - on the range of vision and echolocation in echolocating bats', Frontiers in Physiology, 4, article 248, pp. 1-12. Available at: https://doi.org/10.3389/fphys.2013.00248.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) 'The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses', The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) 'Identifying Bats Using Computerized Analysis and Artificial Neural Networks', North American Symposium on Bat Research, 9.

      Chiu, C., Xian, W. and Moss, C.F. (2009) 'Adaptive echolocation behavior in bats for the analysis of auditory scenes', Journal of Experimental Biology, 212(9), pp. 1392-1404. Available at: https://doi.org/10.1242/jeb.027045.

      Fujioka, E. et al. (2021) 'Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence', Journal of Robotics and Mechatronics, 33(3), pp. 556-563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gillam, E.H. et al. (2010) 'Echolocation behavior of Brazilian free-tailed bats during dense emergence flights', Journal of Mammalogy, 91(4), pp. 967-975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldshtein, A. et al. (2025) 'Onboard recordings reveal how bats maneuver under severe acoustic interference', Proceedings of the National Academy of Sciences, 122(14), p. e2407810122. Available at: https://doi.org/10.1073/pnas.2407810122.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) 'The echolocation of flying insects by bats', Animal Behaviour, 8(3-4).

      Hagino, T. et al. (2007) 'Adaptive SONAR sounds by echolocating bats', International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647-651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) 'Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field', The Journal of the Acoustical Society of America, 124(2), pp. EL51-EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) 'Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats', Current Biology. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) 'Intensity and directionality of bat echolocation signals', Frontiers in Physiology, 4, article 89, pp. 1-9. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) 'Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit', Proceedings of the National Academy of Sciences, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Kalko, E.K.V. (1995) 'Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)', Animal Behaviour, 50(4), pp. 861-880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) 'Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus fuscus (Chiroptera: Vespertilionidae)', Journal of Mammalogy, 82(2), pp. 339-351. Available at: https://doi.org/10.1644/1545-1542(2001)082<0339:IAGVIE>2.0.CO;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) 'Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls', Ethology, 114(5), pp. 469-478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kick, S.A. (1982) 'Target-detection by the echolocating bat, Eptesicus fuscus', Journal of Comparative Physiology A, 145(4), pp. 431-435. Available at: https://doi.org/10.1007/BF00612808.

      Kothari, N.B. et al. (2014) 'Timing matters: Sonar call groups facilitate target localization in bats', Frontiers in Physiology, 5, article 168. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Mohl, B. and Surlykke, A. (1989) 'Detection of sonar signals in the presence of pulses of masking noise by the echolocating bat, Eptesicus fuscus', pp. 119-124.

      Moss, C.F. and Surlykke, A. (2010) 'Probing the natural scene by echolocation in bats', Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Neretti, N. et al. (2003) 'Time-frequency model for echo-delay resolution in wideband biosonar', The Journal of the Acoustical Society of America, 113(4), pp. 2137-2145. Available at: https://doi.org/10.1121/1.1554693.

      Popper, A.N. and Fay, R.R. (1995) Hearing by Bats. Springer-Verlag.

      Roy, S. et al. (2019) 'Extracting interactions between flying bat pairs using model-free methods', Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) 'Technique using thermal infrared-imaging for estimating populations of gray bats', Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) 'A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver', The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) 'Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion', Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229-29238. Available at: https://doi.org/10.1073/pnas.2011719117.

      Sanderson, M.I. et al. (2003) 'Evaluation of an auditory model for echo delay accuracy in wideband biosonar', The Journal of the Acoustical Society of America, 114(3), pp. 1648-1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) 'Echolocation by Insect-Eating Bats', BioScience, 51(7), pp. 557-569. Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) 'The echolocation and hunting behavior of the bat, Pipistrellus kuhli', Journal of Comparative Physiology A, 161(2), pp. 267-274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. et al. (1983) 'Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus'.

      Simmons, J.A. and Kick, S.A. (1983) 'Interception of Flying Insects by Bats', Neuroethology and Behavioral Physiology, pp. 267-279. Available at: https://doi.org/10.1007/978-3-642-69271-0_20.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) 'Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus', Journal of Experimental Biology, 212(7), pp. 1011-1020. Available at: https://doi.org/10.1242/jeb.024620.

      Theriault, D.H. et al. (no date) 'Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight' [Preprint]. Available at: https://cs-web.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) 'What the bat's voice tells the bat's brain', Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491-8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) 'Modeling bat prey capture in echolocating bats: The feasibility of reactive pursuit', Journal of Theoretical Biology, 456, pp. 305-314.

      Yovel, Y. et al. (2009) 'The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls', PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) 'Bat Navigation', The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333-345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.

    1. The Global Positioning System consists of three parts: Earth-orbiting satellites, control and monitoring stations across the Earth, and GPS receivers owned by individuals. Multiple constellations of 24 or more satellites orbit the Earth roughly every 12 hours while broadcasting their position and time. Ground-based receivers (hand-held GPS devices in watches, phones, cars, airplanes, etc.) listen to the signals from four or more satellites, comparing the time transmissions of each with their own clocks. Given that the signal travels at a known speed, the receiver can calculate the distance between each satellite and itself. Combining the position of the satellite at the time of transmission with the distance, the receiver is able to determine its own location. After the original American GPS, other countries have developed their own versions. Europe's GPS is called Galileo, Russia's is called Glonass, and China's is called Beidou. Modern receivers can use satellites from all these systems simultaneously.

      This is interesting because it shows how GPS isn't just one system anymore; it's a network of different countries' satellites working together.
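      The position step described in the annotation can be sketched in two dimensions: given three known satellite positions and three measured ranges, subtracting the first range equation from the others yields a small linear system. This is a toy model; real receivers solve the 3D version with a fourth unknown, the receiver clock bias.

```python
def trilaterate_2d(sats, ranges):
    """Solve for a 2D receiver position from three satellite positions
    and three measured ranges, by linearizing the circle equations
    (subtracting the first from the others) and applying Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = sats
    r1, r2, r3 = ranges
    # Rows of the linear system A [x, y]^T = b
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = x2**2 - x1**2 + y2**2 - y1**2 - r2**2 + r1**2
    b2 = x3**2 - x1**2 + y3**2 - y1**2 - r3**2 + r1**2
    det = a11 * a22 - a12 * a21            # nonzero if sats not collinear
    return ((b1 * a22 - b2 * a12) / det,
            (a11 * b2 - a21 * b1) / det)
```

      With satellites at (0, 0), (100, 0), and (0, 100) and ranges measured from a receiver at (30, 40), the function recovers (30, 40).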

    1. And while modern audiences might prefer that style, that may only be because they align more closely with modern approaches to the craft. Just like those early audiences, it's all we know. But less naturalistic performances can be just as "good" - emotionally resonant and consistent with the thematic intent of the story -

      Naturalism allows the audience to connect more with the film, and actors showing this through their emotions makes the film more relatable and realistic.

    1. Reviewer #3 (Public review):

      Summary:

      The present manuscript investigates and proposes different mechanisms for the effects of two therapeutic approaches - cognitive distancing technique and use of antidepressants - on subjective ratings of happiness, confidence, and task engagement, and on the influence of such subjective experiences on choice behavior. Both approaches were found to link to changes in affective state dynamics in a choice task, specifically reduced drift (cognitive distancing) and increased baseline (antidepressant use). Results also suggest that cognitive distancing may reduce the weighing of recent expected values in the happiness model, while antidepressant use may reduce forgetting of choices and outcomes.

      Strengths:

      This is a timely topic and a significant contribution to ongoing efforts to improve our mechanistic understanding of psychopathology and devise effective novel interventions. The relevance of the manuscript's central question is clear, and the links to previous literature and the broader field of computational psychiatry are well established. The modelling approaches are thoughtful and rigorously tested, with appropriate model checks and persuasive evidence that modelling complements the theoretical argument and empirical findings.

      Weaknesses:

      Some vagueness and lack of clarity in theoretical mechanisms and interpretation of results leave outstanding questions regarding (a) the specific links drawn between affective biases, therapies aimed at mitigating them, and mental health function, and (b) the structure and assumptions of the modelling, and how they support the manuscript's central claims. Broadly, I do not fully understand the distinction between how choice behavior vs. affect are impacted separately or together by cognitive distancing. Clarification on this point is needed, possibly through a more explicit proposal of a mechanism (or several alternative mechanisms?) in the introduction and more explicit interpretation of the modelling results in the context of the cyclical choice-affect mechanism.

      (1) Theoretical framework and proposed mechanisms

      The link between affective biases and negative thinking patterns is a bit unclear. The authors seem to make a causal claim that "affective biases are precipitated and maintained by negative thinking patterns", but it is unclear what precisely these negative patterns are; earlier in the same paragraph, they state that affective biases "cause low mood" and possibly shift choices toward those that maintain low mood. So the directionality of the mechanism here is unclear - possibly explaining a bit more of the cyclic nature of this mechanism, and maybe clarifying what "negative thinking patterns" refer to will be helpful.

      More generally, this link between affect and choices, especially given the modelling results later on, should be clarified further. What is the mechanism by which these two impact each other? How do the models of choice and affect ratings in the RL task test this mechanism? I'm not quite sure the paper answers these questions clearly right now.

      The authors also seem to implicitly make the claim that symptoms of mental ill-health are at least in part related to choice behavior. I find this a persuasive claim generally; however, it is understated and undersupported in the introduction, to the point where a reader may need to rely on significant prior knowledge to understand why mitigating the impact of affective biases on choice behavior would make sense as the target of therapeutic interventions. This is a core tenet of the paper, and it would be beneficial to clarify this earlier on.

      It would be helpful to interpret a bit more clearly the findings from 3.4. on decreased drift in all three subjective assessments in the cognitive distancing group. What is the proposed mechanism for this? The discussion mentions that "attenuated declines [...] over time, [add] to our previously reported findings that this psychotherapeutic technique alters aspects of reward learning" - but this is vague and I do not understand, if an explanation for how this happens is offered, what that explanation is. Given the strong correlation of the drift with fatigue, is the explanation that cognitive distancing mitigates affect drift under fatigue? Or is this merely reporting the result without an interpretation around potential mechanisms?

      (Relatedly, aside from possibly explaining the drift parameter, do the fatigue ratings link with choice behavior in any way? Is it possible that the cognitive distancing was helping participants improve choices under fatigue?)

      (2) Task Structure and Modelling

      It is unclear what counted as a "rewarding" vs. "unrewarding" trial in the model. From my understanding of the task description, participants obtained positive or no reward (no losses), and verbal feedback, Correct/Incorrect. But given the probabilistic nature of the task, it follows that even some correct choices likely had unrewarding results. Was the verbal feedback still "Correct" in those cases, but with no points shown? I did not see any discussion on whether it is the #points earned or the verbal feedback that is considered a reward in the model. I am assuming the former, but based on previous literature, likely both play a role; so it would be interesting - and possibly necessary to strengthen the paper's argument - to see a model that assigns value to positive/negative feedback and earned points separately.

      From a theory perspective, it's interesting that the authors chose to assume separate learning rates for rewarding and non-rewarding trials. Why not, for example, separate reward sensitivity parameters? E.g., rather than a scaling parameter on the PE, a parameter modifying the r term inside the PE equation to, perhaps, assign different values to positive and zero points? (While I think overall the math works out similarly at the fitting time, this type of model should be less flexible on scaling the expected value and more flexible on scaling the actual #points / the subjective experience of the obtained verbal feedback, which seems more in line with the theoretical argument made in the introduction). The introduction explicitly states that negative biases "may cause low mood by making outcomes appear less rewarding" - which in modelling equations seems more likely to translate to different reward-perception biases, and not different learning rates. Alternatively, one might incorporate a perseveration parameter (e.g., similar to Collins et al. 2014) that would also accomplish a negative bias. Either of these two mechanisms seems perhaps worth testing out in a model - especially in a model that defines more clearly what rewarding vs. unrewarding may mean to the participant.
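      To make the contrast concrete, here is a minimal sketch of the two parameterizations being compared (illustrative Python with hypothetical names; the manuscript's actual equations may differ):

```python
# Sketch of two ways a "negative bias" could enter a Q-learning update.
# Function and parameter names are illustrative, not the authors' notation.

def update_separate_lr(q, r, alpha_pos, alpha_neg):
    """Separate learning rates: the prediction error is scaled differently
    depending on whether the trial was rewarding (r > 0) or not."""
    pe = r - q
    alpha = alpha_pos if r > 0 else alpha_neg
    return q + alpha * pe

def update_reward_sensitivity(q, r, alpha, rho_pos, r_zero):
    """Reward-sensitivity alternative: rescale rewarding outcomes by rho_pos
    and assign zero-point outcomes a (possibly negative) subjective value
    r_zero, so the bias sits inside the PE rather than on its scaling."""
    r_subjective = rho_pos * r if r > 0 else r_zero
    pe = r_subjective - q
    return q + alpha * pe
```

      The two forms can mimic each other at fitting time for some parameter settings, but they differ in what they imply: the first biases how fast values change, while the second biases what an outcome is subjectively worth, which seems closer to "making outcomes appear less rewarding".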

      If I understand correctly, the affect ratings models assume that the Q-value and the PE independently impact rating (so they have different weights, w2 and w3), but there is no parameter allowing for different impact for perceived rewarding and unrewarding outcomes? (I may be misreading equations 4-5, but if not, Q-value and PE impact the model via static rather than dynamic parameters.) Given the joint RL-affect fit, this seems to carry the assumption that any perceptual processing differences leading to different subjective perceptions of reward associated with each outcome only impact choice behavior, but not affect? (whereas affect is more broadly impacted, if I'm understanding this correctly, just by the magnitude of the values and PEs?) This is an interesting assumption, and the authors seem to have tested it a bit more in the Supplementary material, as shown in Figure S4. I'm wondering why this was excluded from the main text - it seems like the more flexible model found some potentially interesting differences which may be worth including, especially as they might shed additional insight into the influence of cognitive distancing on the cyclical choice-affect mechanisms proposed.
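      For reference, the static-weight structure in question can be sketched as a Rutledge-style momentary affect model (a hedged reconstruction based only on the review's reading of equations 4-5; the symbols w0, w2, w3, and gamma are assumptions, not the authors' exact notation):

```python
# Sketch of an affect-rating model with static weights on discounted sums
# of recent expected values (Q) and prediction errors (PE). Hypothetical
# reconstruction; the manuscript's equations 4-5 may differ in detail.

def affect_rating(evs, pes, w0, w2, w3, gamma):
    """Rating after trial t: baseline w0 plus exponentially discounted
    sums of expected values (weight w2) and prediction errors (weight w3).
    Because w2 and w3 are constants, rewarding and unrewarding outcomes
    can differ in impact only through the magnitudes of EV and PE."""
    t = len(evs)
    ev_term = sum(gamma ** (t - 1 - j) * evs[j] for j in range(t))
    pe_term = sum(gamma ** (t - 1 - j) * pes[j] for j in range(t))
    return w0 + w2 * ev_term + w3 * pe_term
```

      Under this structure, any outcome-valence asymmetry has to enter through the choice model's values; splitting w3 into separate weights for positive and negative prediction errors (as the more flexible supplementary model may do) is what would let affect itself be valence-sensitive.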

      Minor comments:

      If fatigue ratings were strongly associated with drift in the best-fitting model (as per page 13), I wonder if it would make sense to use those fatigue ratings as a proxy rather than allow the parameter to vary freely? (This does not in any way detract from the winning model's explanatory power, but if a parameter seems to be strongly explained by a variable we have empirical data for, it's not clear what extra benefit is earned by having that parameter in the model).

    1. MLB officials said the whole process took an average of 17 seconds in Class AAA games last year

      I think AI in sports will have a negative impact because it takes away the human side of the game that makes it exciting. The Washington Post said that MLBโ€™s new ball-strike challenge system takes about โ€œ17 seconds,โ€ and even though thatโ€™s quick, it still slows things down and changes the flow. I believe that rational decisions are a part of the game. I also donโ€™t think other sports, like the NFL, should bring in AI to fix human errors, because the mistakes and arguments are part of what makes sports fun to watch. If AI gets too involved, games could start to feel robotic and less real. When it comes to jobs, I believe AI will start making a big impact before 2030. I also think people underestimate how many jobs AI can do, because itโ€™s not just in tech, it could take over customer service, driving, and even some parts of healthcare.

    1. CBT focuses on the knowledge and appraisal processes that are involved in excessive or insufficient emotional arousal. Disturbed emotional arousal or dysfunctional behaviors are hypothesized to occur because of some absent, erroneous, dysfunctional, incorrect, exaggerated, or extremely overevaluative appraisal of environmental threats or rigid reactions that one must behave in a certain way. CBT proposes that practitioners focus on both events and beliefs that are likely to arouse emotions. This emphasis on the information that people extract from the environment to ensure survival and adaptation is the key focus of CBT.

      I didnโ€™t realize how much CBT focuses on helping people interpret whatโ€™s happening around them, not just change in their thought process. This makes me think that emotions necessarily arenโ€™t the problem, more so how we appraise the situations. Itโ€™s interesting that distorted thinking can cause either too much or too little emotional reaction. Iโ€™m starting to see how CBT is more about building self-awareness and flexibility than just โ€œfixingโ€ thoughts.

    1. Hac: IS THE BEGINNING OF THE DEFEAT AND DESTRUCTION OF THE DAY OF SEVEN MACAW by the two boys, the first named Hunahpu and the second named Xbalanque. Being gods, the two of them saw evil in his attempt at self-magnification before the Heart of Sky. So the boys talked: “It’s no good without life, without people here on the face of the earth.” “Well then, let’s try a shot. We could shoot him while he’s at his meal. We could make him ill, then put an end to his riches, his jade, his metal, his jewels, his gems, the source of his brilliance. Everyone might do as he does, but it should not come to be that fiery splendor is merely a matter of metal. So be it,” said the boys, each one with a blowgun on his shoulder, the two of them together. And this Seven Macaw has two sons: the first of these is Zipacna, and the second is the Earthquake. And Chimalmat is the name of their mother, the wife of Seven Macaw. And this is Zipacna, this is the one to build up the great mountains: Fireplace, Hunahpu, Cave by the Water, Xcanul, Macamob, Huliznab, as the names of the mountains that were there at the dawn are spoken. They were brought forth by Zipacna in a single night. And now this is the Earthquake. The mountains are moved by him; the mountains, small and great, are softened by him. The sons of Seven Macaw did this just as a means of self-magnification. “Here am I: I am the sun,” said Seven Macaw. “Here am I: I am the maker of the earth,” said Zipacna.

      The boys are determined to kill their father, Seven Macaw. They appear to be full of themselves and believe they are more powerful than, and superior to, the elders. Greed and selfishness have taken over their minds, possibly leading to delusions of grandeur.

    1. Discuss the various perspectives on how and why people become leaders. Compare and contrast various leadership styles. Discuss the types of power that a leader may tap into.

      People step into leadership roles for all sorts of reasons: some are just naturally inclined, others pick up the skills over time, and sometimes it’s the situation that pushes someone to lead. There are different leadership styles, too: some leaders like to take charge and make all the decisions, some prefer to get everyone involved, and others give people a lot of freedom to do their own thing. Leaders also rely on different kinds of power, like the authority their position gives them, the knowledge they have, or simply the trust and respect others feel toward them. Different styles and sources of power fit better depending on the group and the challenges they face. Understanding these differences helps explain why certain leaders click with their teams while others don’t.

    1. Identify appropriate methods for conducting college-level research. Distinguish among various types of sources. Evaluate the credibility of sources. Identify various types of supporting material. Employ visual aids that enhance a speakerโ€™s message.

      When youโ€™re doing college research, itโ€™s smart to stick with things like academic databases, books, and articles that have been reviewed by experts. There are different types of sources some are original materials, others analyze those, and some just summarize info. To figure out if a source is trustworthy, look at who wrote it, how recent it is, and whether itโ€™s backed up with facts. Using stuff like examples, stats, quotes, or stories can really help make your points stronger. And adding visual aids like slides or charts can make what youโ€™re saying easier to follow and more interesting.

    1. Employ audience analysis. Determine the general purpose of a speech. List strategies for narrowing a speech topic. Compose an audience-centered, specific purpose statement for a speech. Compose a thesis statement that summarizes the central idea of a speech.

      Audience analysis is all about understanding who youโ€™re talking to so your message hits home. The general purpose of a speech is basically why youโ€™re giving itโ€”whether itโ€™s to inform, persuade, or entertain. To narrow down your topic, you can zoom in on a smaller piece or find a unique angle that fits your listeners. Your specific purpose statement is what you want your audience to learn or do by the end. And your thesis is just a quick summary of the main point you want to get across.

    1. Identify strategies for improving listening competence at each stage of the listening process. Summarize the characteristics of active listening. Apply critical-listening skills in interpersonal, educational, and mediated contexts. Practice empathetic listening skills. Discuss ways to improve listening competence in relational, professional, and cultural contexts.

      Being a better listener starts with really tuning in: pay attention, try to understand what’s being said, think it over, respond clearly, and remember it later. Active listening means you're fully present, showing with your body language and responses that you're actually engaged. Critical listening is useful in everyday conversations, school, or even when you're watching the news—it’s about thinking deeper and not just taking everything at face value. Empathetic listening is more about just being there for someone, letting them talk, and trying to truly understand how they feel without rushing to fix it. Whether you're at work, in a relationship, or talking with people from different backgrounds, listening with respect and intention goes a long way.

    1. Identify and discuss the four main types of linguistic expressions. Discuss the power of language to express our identities, affect our credibility, control others, and perform actions. Discuss some of the sources of fun within language. Explain how neologisms and slang contribute to the dynamic nature of language. Identify the ways in which language can separate people and bring them together.

      Linguistic expressions come in different forms: words, phrases, sentences, and whole conversations. Language is a way we show who we are, earn trust, influence people, and even do things like make promises just by talking. It’s also a lot of fun, with jokes, puns, and playful ways of using words that keep things interesting. New slang and made-up words keep language fresh and always changing. Sometimes language can make people feel left out, but it also has the power to bring us together and create a sense of belonging.

    1. Discuss how communication is integrated in various aspects of your life. Explain how communication meets physical, instrumental, relational, and identity needs. Explain how the notion of a โ€œprocessโ€ fits into communication. Discuss the ways in which communication is guided by culture and context.

      Communication plays a huge role in my everyday life. Itโ€™s how I connect with people, get things done, and express what Iโ€™m thinking or feeling. Whether Iโ€™m having a conversation with a friend, asking a question in class, or just texting someone, Iโ€™m using communication in one form or another. It also meets a lot of basic needs. It helps with physical and emotional health by making us feel connected to others. It helps with practical, everyday things like asking for help or making plans. Itโ€™s key to building and maintaining relationships, and it also shapes how we see ourselves and how others see us. Communication isnโ€™t just something that happens onceโ€”itโ€™s a process. Itโ€™s ongoing, and it changes depending on who you're talking to, whatโ€™s going on, and where you are. How we say things can be just as important as what weโ€™re saying. Culture and context also make a big difference. The way we communicate depends a lot on our background, our values, and the situation weโ€™re in. Something that feels normal in one setting might not be in another, so itโ€™s important to be aware of that and adjust when needed.

    1. There's some additional complexity because of something called the "lexer hack". Essentially, when parsing C you want to know if something is a type name or variable name (because that context matters for compiling certain expressions), but there's no syntactic distinction between them: int int_t = 0; is perfectly valid C, as is typedef int int_t; int_t x = 0;. To know if an arbitrary token int_t is a type name or a variable name, we need to feed type information from the parsing/codegen stage back into the lexer. This is a giant pain for regular compilers that want to keep their lexer, parser, and codegen modules pure and platonically separate, but it's actually not very hard for us! I'll explain it more when we get to the typedef section, but basically we just keep types: set[str] in Lexer, and when lexing, check if a token is in that set before giving it a token kind.

      It's strange how much appeal "the lexer hack" has for being such a bad solution to the problem.

      The most reasonable thing is to just not care about whether your lexer can distinguish between whether an identifier refers to a type or to another sort of identifier. Just report that it's an identifier. In practice, this doesn't very much change how you have to implement the parser, and the symbol table can remain local to the higher-level parser machinery where it was always going to be anyway.

      The lexer hack sucks.
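      The feedback loop the quoted passage describes can be sketched in a few lines (a minimal, hedged sketch; the class and token-kind names here are illustrative, not the post's actual implementation):

```python
# Minimal sketch of the lexer hack: the lexer keeps a set of known type
# names, and the parser feeds typedef'd names back into it. Names such as
# Lexer, kind_of, TYPE_NAME, and IDENTIFIER are illustrative assumptions.

class Lexer:
    def __init__(self) -> None:
        # Seed with built-in type keywords; typedef'd names get added later.
        self.types: set[str] = {"int", "char", "void"}

    def kind_of(self, token: str) -> str:
        # Consulted at lexing time, before a token kind is assigned.
        return "TYPE_NAME" if token in self.types else "IDENTIFIER"

lexer = Lexer()
print(lexer.kind_of("int_t"))   # an ordinary identifier so far
lexer.types.add("int_t")        # parser just saw: typedef int int_t;
print(lexer.kind_of("int_t"))   # later occurrences now lex as a type name
```

      The alternative the reply argues for is to skip this entirely: always emit IDENTIFIER and let the parser consult its own symbol table, keeping the type bookkeeping where the higher-level machinery already lives.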

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this study, the authors develop a complete integral drive system in Anopheles gambiae malaria mosquitoes. This type of gene drive is interesting, with special advantages and disadvantages compared to more common designs. Here, the authors develop the Cas9 element and combine it with a previously developed antimalaria effector element. The new element performs very well in terms of drive efficiency, but it has unintended fitness costs, and a higher than desirable rate of functional resistance allele formation. Nevertheless, this study represents a very good step forward toward developing effective gene drives and is thus of high impact.

      The format of the manuscript is a bit suboptimal for review. Please add line numbers next time for easy reference. It would also help to have spaces between paragraphs and to have figures (with legends) added to the text where they first appear.

      It might be useful to add subsections to the results, just like in the methods section. It could even be expanded a bit with some specific parts from the discussion, though this is optional.

      Abstract: The text says: "As a minimal genetic modification, nanosd does not induce widespread transcriptomic perturbations." However, it does seem to change things based on Figure 3c.

      Page 2: "drive technologies for public health and pest control applications" needs a period afterward.

      Page 2: "The fitness costs, homing efficiency, and resistance rate of the gene drive is" should be "The fitness costs, homing efficiency, and resistance rate of the gene drive are".

      Page 2: "When they target important mosquito genes, gene drives are designed to ensure that the nuclease activity window (germline) does not overlap with that of the target gene (somatic)." is not quite correct. This is, of course, sensible for suppression drives, but it's not a necessary requirement for modification drives with rescue elements in many situations.

      Page 2: "recessive somatic fitness cost phenotypes" is unclear. I think that you are trying to avoid the recessive fitness cost of null alleles becoming a dominant fitness cost from a gene drive allele (in drive-wild-type heterozygotes).

      Page 2: "This optimization approach has had only limited success, and suboptimal performance is commonly attributed to not capturing all the regulatory elements specific to the germline gene's expression9,12". I don't think this is correct. There are several examples where a new promoter helped a lot. The zpg promoter in Anopheles gambiae allowed success at the dsx site in suppression cage studies (Kyrou et al 2018), and nanos gave a big improvement to modification drives at the cardinal locus (Carballar-Lejarazú et al 2020). In flies, several promoters were tested, and one allowed success in cage experiments (Du et al 2024). In Aedes, the shu promoter allowed for high drive performance (Anderson et al 2023), though this last one hasn't been tested in more difficult situations. I think you could certainly argue in the general case that not all promoters will work the way their transcriptome says, but there are many examples where they seem to be pretty good.

      Page 2: "make it more likely that mutations that disrupt the drive components are selected against through loss of function of the host gene." I think that this needs a bit more explanation. You are referring to mutations in regulatory elements or frameshift mutations. This will make it more resistant to mutation. Also, these mutations would tend to have a minor effect except perhaps in the cargo gene of a modification drive. By using a cargo gene in an integral drive, you could still keep it somewhat safer, but whether this is 1.2x or 10x safer is unclear.

      Page 3: "they can incur severe unintended fitness costs". This is central to integral drives and this manuscript. It's worth elaborating on.

      Page 3: "Regulatory elements from germline genes that have worked sub-optimally in traditional gene drive designs for the reasons outlined above may work well in an IDG design20." This is setting up the integral drive with nanos, but nanos DOES work well in traditional Anopheles gambiae gene drive designs. It is possible that you might end up with less somatic expression than Hammond et al 2020 (though the comparison is unclear due to batch effects in that study), but there is no direct comparison in this manuscript to that.

      Page 3: "This suggests an impact of maternal deposition on drive efficiency only in female drive carriers." This is quite strange. Usually, I would expect to see an equal reduction in efficiency between male and female progeny. Could this be due to limited sample size? Random idea: It's also possible that almost all maternal deposition was mosaic and wouldn't be enough to directly affect drive conversion. However, it could cause enough of a fitness cost TOGETHER with new drive expression in females that perhaps only tissues with randomly low expression rates properly developed and led to progeny, reducing drive inheritance? Another possibility: Could the drive/resistance males have impaired fertility, so these ones are underrepresented in the batch cross? If nanos is needed in males and a single drive copy is not quite enough for good fertility or mating competitiveness, they may be underrepresented in your crosses. They might have worse fertility than drive homozygous males, which at least have two partially working copies of nanos rather than just one (in many cells, at least). Maybe check the testis for abnormal phenotypes?

      Overall, it would be favorable if the drive allele was somewhere more fit than a nonfunctional resistance allele. This could already be achieved in this drive, but it doesn't get much mention.

      Page 3: There should be a comma after, "Interestingly, while many of the observed mutations were predicted to abolish nanos expression" and "This could indicate that in these experiments".

      Page 3 last sentence: Please improve the clarity.

      Removing the EGFP is supposed to restore the fitness, and this was helpful in some previous integral drive constructs. This could get a bit more mention (it is possible that I missed this somewhere in the manuscript).

      Page 4: The MM-CP line and its association with the integral drive strategy could get a little more introduction. Maybe even a supplemental figure showing the general idea.

      Page 5: "cassette is predicted to disrupt the CP function entirely (Fig. 5d)" also lacks a period.

      Page 5: "The subsequent stabilization of the nanosd frequency and the lack of rapid loss suggests that any associated fitness cost is primarily recessive." This is not quite correct because by this point, drive/wild-type heterozygotes are rare, and this is where you'd find a potential dominant fitness cost. It should be correct in the end stages where it is a mix of drive and functional/nonfunctional resistance alleles (though the nonfunctional resistance alleles may cause greater fitness costs when together with a drive - see above).

      Page 6: "Maternal deposition of Cas9, or Cas9;gRNA, into the zygote can lead to cutting at stages when homing is not favoured, and has been commonly observed for canonical Anopheles nanos drives9,10,35." Reference 35 (which is more suitable for referencing an example of nanos in other Anopheles) found some resistance alleles by deep sequencing, but the timing that they formed was unclear (it's not certain if it was maternal deposition). This study may be a more suitable reference: Carballar-Lejarazú R, Tushar T, Pham TB, James AA. Cas9-mediated maternal-effect and derived resistance alleles in a gene-drive strain of the African malaria vector mosquito, Anopheles gambiae. Genetics, 2022.

      Page 8: "could further reduce the likelihood of resistance allele formation by increasing the frequency of HDR events." Multiple gRNAs would mostly help by reducing functional resistance allele formation, especially since drive conversion is already very high in Anopheles.

      Page 8, last paragraph: This conclusion is perhaps a little optimistic considering the functional resistance alleles, which should get a little more attention in the summary or elsewhere in the discussion section.

      Figure 1d: The vertical text saying "Non-WT" is confusing. The circles themselves show + and -. Also, "-" isn't necessarily a knockout allele, so I'm not sure if - is the best symbol for resistance.

      Figure 2e: The vertical scale is not the most intuitive. Consider rearranging it to "Transition from larvae to pupae" starting at zero and going to 1 when all the larvae become pupae.

      Figure 2e-f: For both of these, there are clear differences between males and females. Thus, when comparing drive homozygotes to wild-type, it would probably be better to have separate statistical comparisons for males and females.

      Figure 3: Can any of these transcription results in individual genes potentially explain the observed fitness cost?

      Figure 3b: The scale here also doesn't quite make sense. It's the fraction of underdeveloped ovaries, but the graph is also perhaps trying to show whether just 1-2 ovaries are present, or maybe how many ovaries are undeveloped, but then it would say "zero"? This should be clarified. The number of ovaries and how well-developed they are are separate things (they can be put on the same graph, but this needs to be clearer).

      Figure 4f: The vertical axis should say "ONNV."

      Figure 5c-d: These should be labeled as the most common resistance allele. Also, I'm not sure how relevant it is, but we also found an alternate start codon here: Hou S, Chen J, Feng R, Xu X, Liang N, Champer J. A homing rescue gene drive with multiplexed gRNAs reaches high frequency in cage populations but generates functional resistance. J Genet Genomics, 2024. Maybe this is a more common problem than one would expect?

      Figure 5cd,S4,S5: They have a bit of a weird plot. Why not make four line graphs for each? Also, some alleles use the  symbol. + is wild-type, which is well understood, but - as resistance is not always clear, and seeing them together may confuse readers. Additionally, the fact that you have the most common resistance allele in Figure 5cd might mean that you know more about the genotype? If so, it would be best to separate wild-type and resistance alleles in whatever the final figure looks like.

      Some supplemental raw data files would be useful if they were available, but the figures are thorough enough that this isn't essential.

      Review by:

      Jackson Champer, with major assistance from Ruobing Feng (essentially section B) and Jie Du

      Referee cross-commenting

      We don't have any cross-comments, other than supporting the idea of slightly more comparisons to the authors' previous construct.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      A key innovation of the nanosd gene drive is its integral gene drive (IGD) design, which inserts the drive cassette directly into the A. gambiae nanos gene, incorporating only the minimal components necessary for drive function. The drive achieves high transmission rates, without causing widespread disruption of gene expression or increasing susceptibility to malaria parasites, and imposes an acceptable fitness cost, primarily a reduction in female fecundity when homozygous. The strong performance of nanosd can be attributed to its design: Cas9 is expressed in the correct cells and at the correct time to induce efficient homing, effectively hijacking the nanos gene's natural expression profile. However, despite the careful design aimed at preserving nanos function, the rescue was incomplete: homozygous female drive carriers exhibited a clear reduction in ovarian function.

      In caged population trials, both the drive and a co-introduced anti-malaria effector gene reached high frequencies, even in the presence of emerging resistance alleles. Because the drive is inserted into an essential gene, nonfunctional resistance alleles are selected against and tend to be purged over time. Nonetheless, functional resistance remains a concern. The use of a single, though precisely positioned, gRNA targeting the ATG site of the native nanos gene increases the likelihood of generating functional resistance alleles. Over the long term, if the drive imposes fitness costs, it may be outcompeted by such functional resistance alleles, potentially undermining the goal of sustained population modification.

      Overall, this study represents a notable advance in Anopheles mosquito gene drive development and can be considered high impact.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      Previous IGD efforts in Drosophila, mice and mosquitoes have demonstrated nearly super‐Mendelian inheritance but often at the expense of host fitness. For example, Nash et al. built an intronic‐gRNA Cas9 drive at the D. melanogaster rcd-1r locus that propagated efficiently through cage populations (Nash et al., 2022), and Gonzalez et al. reported that a Cas9 drive inserted at the germline zpg locus in Anopheles stephensi biased inheritance by ~99.8% (Gonzalez et al., 2025). However, these strong drives disrupted essential genes: in A. gambiae, inserting Cas9 into zpg produced efficient homing but rendered females largely sterile (Ellis et al., 2022). A similar germline Cas9 knock-in in Mus musculus enabled gene conversion in both sexes, albeit with only modest efficiency and potential fitness trade-offs (Weitzel et al., 2021). The current nanosd IGD is explicitly designed to overcome this limitation by selecting a more permissive gene target and using a minimal drive cassette, so as to preserve mosquito viability while maintaining robust drive efficiency, although fertility in female drive homozygotes is still reduced.

      This nanosd gene drive, like previous homing drives in Anopheles, is capable of achieving a high level of inheritance bias. It uses the endogenous nanos regulatory elements to drive Cas9 expression, which are less prone to leaky somatic expression than the vasa (Gantz et al., 2015; Hammond et al., 2016; Hammond et al., 2017) or zpg (Hammond et al., 2021; Kyrou et al., 2018) promoters and thereby reduce somatic expression-induced female sterility; even so, the incomplete rescue of nanos function still leads to reduced female fertility in drive homozygotes.

      - State what audience might be interested in and influenced by the reported findings.

      It's worth noting the broad audience that will find this work relevant. Gene drive developers and molecular geneticists will be impressed by the strong drive performance and directly influenced by the design principles showcased here. The study's integral gene drive architecture, which leverages the endogenous nanos regulatory elements, in-frame E2A peptide linkage for co-expression, and intronic insertion of the gRNA and selectable markers, addresses long-standing challenges in promoter leakage, somatic fitness costs, and resistance allele evolution. What's more, vector biologists and malaria researchers will be interested in the successful deployment of a gene drive in A. gambiae that actually carries a disease-blocking trait.

      - Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      We have worked on CRISPR gene drive development in both fruit flies and Anopheles mosquitoes and have experience with modeling their spread.

      References

      Ellis, D.A., Avraam, G., Hoermann, A., Wyer, C.A.S., Ong, Y.X., Christophides, G.K., and Windbichler, N. (2022). Testing non-autonomous antimalarial gene drive effectors using self-eliminating drivers in the African mosquito vector Anopheles gambiae. PLOS Genetics 18, e1010244.

      Gantz, V.M., Jasinskiene, N., Tatarenkova, O., Fazekas, A., Macias, V.M., Bier, E., and James, A.A. (2015). Highly efficient Cas9-mediated gene drive for population modification of the malaria vector mosquito Anopheles stephensi. Proc Natl Acad Sci U S A 112, E6736-E6743.

      Gonzalez, E., Anderson, M.A.E., Ang, J.X.D., Nevard, K., Shackleford, L., Larrosa-Godall, M., Leftwich, P.T., and Alphey, L. (2025). Optimization of SgRNA expression with RNA pol III regulatory elements in Anopheles stephensi. Scientific Reports 15, 13408.

      Hammond, A., Galizi, R., Kyrou, K., Simoni, A., Siniscalchi, C., Katsanos, D., Gribble, M., Baker, D., Marois, E., Russell, S., et al. (2016). A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae. Nat Biotechnol 34, 78-83.

      Hammond, A., Karlsson, X., Morianou, I., Kyrou, K., Beaghton, A., Gribble, M., Kranjc, N., Galizi, R., Burt, A., Crisanti, A., et al. (2021). Regulating the expression of gene drives is key to increasing their invasive potential and the mitigation of resistance. PLOS Genetics 17, e1009321.

      Hammond, A.M., Kyrou, K., Bruttini, M., North, A., Galizi, R., Karlsson, X., Kranjc, N., Carpi, F.M., D'Aurizio, R., Crisanti, A., et al. (2017). The creation and selection of mutations resistant to a gene drive over multiple generations in the malaria mosquito. PLOS Genetics 13, e1007039.

      Kyrou, K., Hammond, A.M., Galizi, R., Kranjc, N., Burt, A., Beaghton, A.K., Nolan, T., and Crisanti, A. (2018). A CRISPR-Cas9 gene drive targeting doublesex causes complete population suppression in caged Anopheles gambiae mosquitoes. Nature Biotechnology 36, 1062-1066.

      Nash, A., Capriotti, P., Hoermann, A., Papathanos, P.A., and Windbichler, N. (2022). Intronic gRNAs for the construction of minimal gene drive systems. Frontiers in Bioengineering and Biotechnology 0, 570.

      Weitzel, A.J., Grunwald, H.A., Ceri, W., Levina, R., Gantz, V.M., Hedrick, S.M., Bier, E., and Cooper, K.L. (2021). Meiotic Cas9 expression mediates gene conversion in the male and female mouse germline. PLOS Biology 19, e3001478.

    1. that many of us find television very difficult to switch off; that again and again, even when we have switched on for a particular ‘programme’, we find ourselves watching the one after it and the one after that

      It’s true that once we start watching, it’s hard to stop. This happens with social media too, where feeds and algorithms keep us scrolling. It makes you wonder how much we’re actually choosing what to watch, and how much we’re just getting pulled along by a flow that’s designed to keep us glued to the screen.

    2. For the fact is that many of us do sit there, and much of the critical significance of television must be related to this fact

      Williams emphasizes that much of TV’s impact comes from viewers simply sitting and watching. Flow isn’t just a theory, it’s created by actual habits, showing how attention and cultural experience are shaped through continuous viewing.

    1. [quoted passage illegible in source]

      I think it is more the societal expectation that rejects the word “play” for adults. Play does have very childish connotations. For example, instead of role play for kids, you have acting for actors. It’s the same thing, just a different label.

      “Never impose on others what you would not choose for yourself.” (Analects XV.24)
      1. I feel like this is a saying we still use today. As I am sure we have all heard, "Treat others how you would like to be treated." It's just interesting to me that such a small saying could have such deep and great meaning that is still around today.
      The Chinese economy produced one quarter of the world’s gross domestic product (GDP) in 1500, followed by India which produced nearly another quarter. In comparison, the fourteen nations of western Europe produced just about half of China’s GDP or only one-eighth of the global total production. The largest European economy, in Italy, produced only about one-sixth of China’s output.

      I didn’t realize China and India were that dominant in 1500. It flips the way I usually think about history, because we’re often told Europe was the center of progress. It’s kind of shocking that all of western Europe together only made half of what China alone produced.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      McDougal et al. aimed to characterize the antiviral activity of mammalian IFIT1 orthologs. They first performed three different evolutionary selection analyses within each major mammalian clade and identified some overlapping positive selection sites in IFIT1. They found that one site that is positively selected in primates is in the RNA-binding exit tunnel of IFIT1 and is tolerant of mutations to amino acids with similar biochemical properties. They then tested 9 diverse mammalian IFIT1 proteins against VEEV, VSV, PIV3, and SINV and found that each ortholog has distinct antiviral activities. Lastly, they compared human and chimpanzee IFIT1 and found that the determinant of their differential anti-VEEV activity may be partly attributed to their ability to bind Cap0 RNA.ย 

      Strengths:

      The study is one of the first to test the antiviral activity of IFIT1 from diverse mammalian clades against VEEV, VSV, PIV3, and SINV. Cloning and expressing these 39 IFIT1 orthologs in addition to single and combinatorial mutants is not a trivial task. The positive connection between anti-VEEV activity and Cap0 RNA binding is interesting, suggesting that differences in RNA binding may explain differences in antiviral activity.ย 

      Weaknesses:

      The evolutionary selection analyses yielded interesting results, but were not used to inform follow-up studies except for a positively selected site identified in primates. Since positive selection is one of the two major angles the authors proposed to investigate mammalian IFIT1 orthologs with, they should integrate the positive selection results with the rest of the paper more seamlessly, such as discussing the positive selection results and their implications, rather than just pointing out that positively selected sites were identified. The paper should elaborate on how the positive selection analyses PAML, FUBAR, and MEME complement one another to explain why the tests gave them different results. Interestingly, MEME which usually provides more sites did not identify site 193 in primates that was identified by both PAML and FUBAR. The authors should also provide the rationale for choosing to focus on the 3 sites identified in primates only. One of those sites, 193, was also found to be positively selected in bats, although the authors did not discuss or integrate that finding into the study. In Figure 1A, they also showed a dN/dS < 1 from PAML, which is confusing and would suggest negative selection instead of positive selection. Importantly, since the authors focused on the rapidly evolving site 193 in primates, they should test the IFIT1 orthologs against viruses that are known to infect primates to directly investigate the impact of the evolutionary arms race at this site on IFIT1 function.ย 

      We thank the reviewer for their assessment and for acknowledging the breadth of our dataset regarding diverse IFIT1s, the number of viruses tested, and the functional data that may correlate biochemical properties of IFIT1 orthologous proteins with antiviral function. We have expanded the introduction and results sections to better explain and distinguish between the PAML, FUBAR, and MEME analyses. Furthermore, we have expanded the discussion to incorporate the observation that site 193 is rapidly evolving in bats, as well as the observation that sites near the TPR4 loop were identified as rapidly evolving in all clades of mammals tested. We also do observe an overall gene dN/dS of <1; however, this is simply the average across all codons of the entire gene and does not rule out positive selection at specific sites. This is observed for other restriction factors, as many domains are undergoing purifying selection to retain core functions (e.g., enzymatic function, structural integrity) while other domains (e.g., interfaces with viral antagonists or viral proteins) show strong positive selection. Specific examples include the restriction factors BST-2/Tetherin (PMID: 19461879) and MxA (PMID: 23084925). Furthermore, we agree that testing more IFIT1-sensitive viruses that naturally infect primates with our IFIT1 193 mutagenesis library would shed light on the influence of host-virus arms races at this site. However, VEEV does naturally infect humans as well as at least one other primate species (PMID: 39983680).
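      The averaging argument can be made concrete with a toy calculation; the per-codon dN/dS values below are hypothetical, not taken from the IFIT1 alignments:

```python
# Toy illustration: a gene-wide average dN/dS can sit well below 1 even when
# a handful of codons evolve under strong positive selection.
# All numbers here are hypothetical.

purifying = [0.1] * 95   # 95 codons under purifying selection
positive = [4.0] * 5     # 5 codons under strong positive selection
omegas = purifying + positive

gene_wide = sum(omegas) / len(omegas)
print(f"gene-wide dN/dS = {gene_wide:.3f}")   # 0.295 -- looks like purifying selection
print(f"max per-site dN/dS = {max(omegas)}")  # 4.0 -- positive selection at 5 sites
```

This is why site-level tests such as FUBAR and MEME, rather than the gene-wide average, are the appropriate readout for positive selection.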

      Below we individually address the reviewers' claims of inaccurate data interpretation.

      Some of the data interpretation is not accurate. For example:ย 

      (1) Lines 232-234: "...western blot analysis revealed that the expression of IFIT1 orthologs was relatively uniform, except for the higher expression of orca IFIT1 and notably lower expression of pangolin IFIT1 (Figure 4B)." In fact, most of the orthologs are not expressed in a "relatively uniform" manner e.g. big brown bat vs. shrew are quite different.ย 

      We have now included quantification of the western blots to allow the reader to compare IFIT1 expression levels with the infection data (updated Figure 4B and 4G). We have also removed the phrase “relatively uniform” from the text and have instead included text describing the quantified expression differences.

      (2) Line 245: "...mammalian IFIT1 species-specific differences in viral suppression are largely independent of expression differences." While it is true that there is no correlation between protein expression and antiviral activity in each species, the authors cannot definitively conclude that the species-specific differences are independent of expression differences. Since the orthologs are clearly not expressed in the same amounts, it is impossible to fully assess their true antiviral activity. At the very least, the authors should acknowledge that the protein expression can affect antiviral activity. They should also consider quantifying the IFIT1 protein bands and normalizing each to GAPDH for readers to better compare protein expression and antiviral activity. The same issue is in Line 267.ย 

      We have now included quantification and normalization of the western blots to allow the reader to compare IFIT1 expression levels with the infection data (updated Figure 4B and 4G). Furthermore, we acknowledge in the text that expression differences may affect antiviral potency in infection experiments.

      (3) Line 263: "SINV... was modestly suppressed by pangolin, sheep, and chinchilla IFIT1 (Figure 4E)..." The term "modestly suppressed" does not seem fitting if there is 60-70% infection in cells expressing pangolin and chinchilla IFIT1.

      We have modified the text to say “significantly suppressed” rather than “modestly suppressed.”

      (4) The study can be significantly improved if the authors can find a thread to connect each piece of data, so that readers can form a cohesive story about mammalian IFIT1.

      We appreciate the reviewer’s suggestion and have tried to make the story more cohesive by including commentary on positive selection and by using the computational analysis to inform potential evolutionary consequences of IFIT1 functionality, first with an intraspecies (human) approach and later with an interspecies approach using diverse mammals with great sequence diversity. Furthermore, we point out that almost all IFIT1s tested in the ortholog screen were also included in our computational analysis, allowing for the potential to connect functional observations with those seen in the evolutionary analyses.

      Reviewer #2 (Public review):

      McDougal et al. describe the surprising finding that IFIT1 proteins from different mammalian species inhibit the replication of different viruses, indicating that the evolution of IFIT1 across mammals has resulted in host species-specific antiviral specificity. Before this work, research into the antiviral activity and specificity of IFIT1 had mostly focused on the human ortholog, which was described to inhibit viruses including vesicular stomatitis virus (VSV) and Venezuelan equine encephalitis virus (VEEV) but not other viruses including Sindbis virus (SINV) and parainfluenza virus type 3 (PIV3). In the current work, the authors first perform evolutionary analyses on IFIT1 genes across a wide range of mammalian species and reveal that IFIT1 genes have evolved under positive selection in primates, bats, carnivores, and ungulates. Based on these data, they hypothesize that IFIT1 proteins from these diverse mammalian groups may show distinct antiviral specificities against a panel of viruses. By generating human cells that express IFIT1 proteins from different mammalian species, the authors show a wide range of antiviral activities of mammalian IFIT1s. Most strikingly, they find several IFIT1 proteins that have completely different antiviral specificities relative to human IFIT1, including IFIT1s that fail to inhibit VSV or VEEV, but strongly inhibit PIV3 or SINV. These results indicate that there is potential for IFIT1 to inhibit a much wider range of viruses than human IFIT1 inhibits. Electrophoretic mobility shift assays (EMSAs) suggest that some of these changes in antiviral specificity can be ascribed to changes in the direct binding of viral RNAs. Interestingly, they also find that chimpanzee IFIT1, which is >98% identical to human IFIT1, fails to inhibit any tested virus. Replacing three residues from chimpanzee IFIT1 with those from human IFIT1, one of which has evolved under positive selection in primates, restores activity to chimpanzee IFIT1. Together, these data reveal a vast diversity of IFIT1 antiviral specificity encoded by mammals, consistent with an IFIT1-virus evolutionary "arms race".

      Overall, this is a very interesting and well-written manuscript that combines evolutionary and functional approaches to provide new insight into IFIT1 antiviral activity and species-specific antiviral immunity. The conclusion that IFIT1 genes in several mammalian lineages are evolving under positive selection is supported by the data, although there are some important analyses that need to be done to remove any confounding effects from gene recombination that has previously been described between IFIT1 and its paralog IFIT1B. The virology results, which convincingly show that IFIT1s from different species have distinct antiviral specificity, are the most surprising and exciting part of the paper. As such, this paper will be interesting for researchers studying mechanisms of innate antiviral immunity, as well as those interested in species-specific antiviral immunity. Moreover, it may prompt others to test a wide range of orthologs of antiviral factors beyond those from humans or mice, which could further the concept of host-specific innate antiviral specificity. Additional areas for improvement, which are mostly to clarify the presentation of data and conclusions, are described below.ย 

      Strengths:

      (1) This paper is a very strong demonstration of the concept that orthologous innate immune proteins can evolve distinct antiviral specificities. Specifically, the authors show that IFIT1 proteins from different mammalian species are able to inhibit the replication of distinct groups of viruses, which is most clearly illustrated in Figure 4G. This is an unexpected finding, as the mechanism by which IFIT1 inhibits viral replication was assumed to be similar across orthologs. While the molecular basis for these differences remains unresolved, this is a clear indication that IFIT1 evolution functionally impacts host-specific antiviral immunity and that IFIT1 has the potential to inhibit a much wider range of viruses than previously described.ย 

      (2) By revealing these differences in antiviral specificity across IFIT1 orthologs, the authors highlight the importance of sampling antiviral proteins from different mammalian species to understand what functions are conserved and what functions are lineage- or species-specific. These results might therefore prompt similar investigations with other antiviral proteins, which could reveal a previously undiscovered diversity of specificities for other antiviral immunity proteins.ย 

      (3) The authors also surprisingly reveal that chimpanzee IFIT1 shows no antiviral activity against any tested virus despite only differing from human IFIT1 by eight amino acids. By mapping this loss of function to three residues on one helix of the protein, the authors shed new light on a region of the protein with no previously known function.ย 

      (4) Combined with evolutionary analyses that indicate that IFIT1 genes are evolving under positive selection in several mammalian groups, these functional data indicate that IFIT1 is engaged in an evolutionary "arms race" with viruses, which results in distinct antiviral specificities of IFIT1 proteins from different species.ย 

      Weaknesses:

      (1)ย The evolutionary analyses the authors perform appear to indicate that IFIT1 genes in several mammalian groups have evolved under positive selection. However, IFIT1 has previously been shown to have undergone recurrent instances of recombination with the paralogous IFIT1B, which can confound positive selection analyses such as the ones the authors perform. The authors should analyze their alignments for evidence of recombination using a tool such as GARD (in the same HyPhy package along with MEME and FUBAR). Detection of recombination in these alignments would invalidate their positive selection inferences, in which case the authors need to either analyze individual non-recombining domains or limit the number of species to those that are not undergoing recombination. While it is likely that these analyses will still reveal a signature of positive selection, this step is necessary to ensure that the signatures of selection and sites of positive selection are accurate.ย 

      (2) The choice of IFIT1 homologs for study needs to be described in more detail. Many mammalian species encode IFIT1 and IFIT1B proteins, which have been shown to have different antiviral specificity, and the evolutionary relationship between IFIT1 and IFIT1B paralogs is complicated by recombination. As such, the assertion that the proteins studied in this manuscript are IFIT1 orthologs requires additional support beyond the percent identity plot shown in Figure 3B.

      (3) Some of the results and discussion text could be more focused on the model of evolution-driven changes in IFIT1 specificity. In particular, the chimpanzee data are interesting, but it would appear that this protein has lost all antiviral function, rather than changing its antiviral specificity like some other examples in this paper. As such, the connection between the functional mapping of individual residues with the positive selection analysis is somewhat confusing. It would be more clear to discuss this as a natural loss of function of this IFIT1, which has occurred elsewhere repeatedly across the mammalian tree.ย 

      (4) In other places in the manuscript, the strength of the differences in antiviral specificity could be highlighted to a greater degree. Specifically, the text describes a number of interesting examples of differences in inhibition of VSV versus VEEV from Figure 3C and 3D, but it is difficult for a reader to assess this as most of the dots are unlabeled and the primary data are not uploaded. A few potential suggestions would be to have a table of each ortholog with % infection by VSV and % infection by VEEV. Another possibility would be to plot these data as an XY scatter plot. This would highlight any species that deviate from the expected linear relationship between the inhibition of these two viruses, which would provide a larger panel of interesting IFIT1 antiviral specificities than the smaller number of species shown in Figure 4.ย 
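      Such an XY comparison could be sketched roughly as follows; the species labels and infection percentages below are hypothetical, purely to illustrate how deviations from the trend would be flagged:

```python
# Sketch of the suggested VSV-vs-VEEV comparison: fit a line to per-ortholog
# % infection values and flag species that deviate from the overall trend.
# Species names and numbers are hypothetical, not the paper's data.

species = ["human", "chimpanzee", "black flying fox", "pangolin", "orca"]
vsv  = [30.0, 95.0, 25.0, 40.0, 90.0]   # % infection, VSV, with IFIT1 expressed
veev = [20.0, 97.0, 15.0,  5.0, 88.0]   # % infection, VEEV, with IFIT1 expressed

# Ordinary least-squares line veev ~ slope * vsv + intercept.
n = len(vsv)
mx, my = sum(vsv) / n, sum(veev) / n
slope = sum((x - mx) * (y - my) for x, y in zip(vsv, veev)) / \
        sum((x - mx) ** 2 for x in vsv)
intercept = my - slope * mx

residuals = [y - (slope * x + intercept) for x, y in zip(vsv, veev)]
sd = (sum(r * r for r in residuals) / n) ** 0.5

# Species more than 1.5 SD off the line inhibit one virus disproportionately.
outliers = [s for s, r in zip(species, residuals) if abs(r) > 1.5 * sd]
print(outliers)  # -> ['pangolin'] with these hypothetical numbers
```

In an actual scatter plot the same residuals would appear as points far from the fitted line, immediately highlighting orthologs with divergent antiviral specificity.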

      We thank the reviewer for their fair assessment of our manuscript. As the reviewer requested, we performed GARD analysis on our alignments used for PAML, FUBAR, and MEME (new Supp Fig 1). By GARD, we found 1 or 2 predicted breakpoints in each clade. However, much of the sequence was after or between the predicted breakpoints. Therefore, we were able to reanalyze for sites undergoing positive selection in the large regions of the sequence that do not span the breakpoints. We were able to validate that almost all sites originally identified as undergoing positive selection still exhibit signatures of positive selection when taking these breakpoints into account: primates (11/12), bats (14/16), ungulates (30/37), and carnivores (2/4). To further validate our positive selection analysis, we used Recombination Detection Program 4 (RDP4) to remove inferred recombinant sequences from the primate IFIT1 alignment and performed PAML, FUBAR, and MEME. Once again, the sites in our original analysis were largely validated by this method. Importantly, sites 170, 193, and 366 in primates, which are discussed in our manuscript, were found to be undergoing positive selection in 2 of the 3 analyses using alignments after the indicated breakpoint in GARD and after removal of recombinant sequences by RDP4. We have updated the text to acknowledge IFIT1/IFIT1B recombination more clearly and include the GARD analysis as well as the PAML, FUBAR, and MEME reanalysis taking into account predicted breakpoints by GARD and RDP4. Furthermore, to increase evidence that the sequences used in this study for both computational and functional analysis are IFIT1 orthologs rather than IFIT1B, we have included a maximum likelihood tree after aligning coding sequences on the C-terminal end (corresponding to bases 907-1437 of IFIT1). In Daugherty et al. 2016 (PMID: 27240734) this strategy was used to distinguish between IFIT1 and IFIT1B. All sequences used in our study grouped with IFIT1 sequences (including many confirmed IFIT1 sequences used in Daugherty et al.) rather than IFIT1B sequences or IFIT3. These new data, including the GARD, RDP4, and maximum likelihood tree analyses, are included as a new Supplementary Figure 1.

      We also agree with the reviewer that it is possible that chimpanzee IFIT1 has lost antiviral function due to the residues 364 and 366 that differ from human IFIT1. We have updated the discussion sections to include the possibility that chimpanzee IFIT1 is an example of a natural loss of function that has occurred in other species over evolution as well as the potential consequences of this occurrence. Regarding highlighting the strength of differences in antiviral activity between IFIT1 orthologs, we have included several updates to strengthen the ability of the reader to assess these differences. First, we have included a supplementary table that includes the infection data for each ortholog from the VEEV and VSV screen to allow for readers to evaluate ranked antiviral activity of the species that suppress these viruses. In addition, the silhouettes next to the dot plots indicate the top ranked hits in order of viral inhibition (with the top being the most inhibitory) giving the reader a visual representation in the figure of top antiviral orthologs during our screen. We have also updated the figure legend to inform the reader of this information.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by McDougal et al, demonstrates species-specific activities of diverse IFIT1 orthologs and seeks to utilize evolutionary analysis to identify key amino acids under positive selection that contribute to the antiviral activity of this host factor. While the authors identify amino acid residues as important for the antiviral activity of some orthologs and propose a possible mechanism by which these residues may function, the significance or applicability of these findings to other orthologs is unclear. However, the subject matter is of interest to the field, and these findings could be significantly strengthened with additional data.

      Strengths:

      Assessment of multiple IFIT1 orthologs shows the wide variety of antiviral activity of IFIT1, and identification of residues outside of the known RNA binding pocket in the protein suggests additional novel mechanisms that may regulate IFIT1 activity.

      Weaknesses:

      Alternative hypotheses that might explain the variable and seemingly inconsistent antiviral activity of IFIT1 orthologs were not really considered. For example, studies show that IFIT1 activity may be regulated by interaction with other IFIT proteins, but this was not assessed in this study.

      Given that there appears to be very little overlap observed in orthologs that inhibited the viruses tested, it's possible that other amino acids may be key drivers of antiviral activity in these other orthologs. Thus, it's difficult to conclude whether the findings that residues 362/4/6 are important for IFIT1 activity can be broadly applied to other orthologs, or whether these are unique to human and chimpanzee IFIT1. Similarly, while the hypothesis that these residues impact IFIT1 activity in an allosteric manner is an attractive one, there is no data to support this. ย 

      We thank the reviewer for their fair assessment of our manuscript. To address the weaknesses that the reviewer has pointed out, we have expanded the discussion to more directly address alternative hypotheses, such as the possibility that IFIT1 activity is regulated by interaction with other IFIT proteins. Furthermore, we expanded the discussion to include an alternative hypothesis for the role of residues 364 and 366 in primate IFIT1 besides allosteric regulation. In addition, we did not intend to claim or imply that residues 364/6 are the key drivers of antiviral activity for all IFIT1 orthologs tested. However, we speculate that within primates these residues may play a key role, as they differ between chimpanzee IFIT1 (which lacks significant antiviral activity towards the viruses tested in this study) and human IFIT1 (which possesses significant antiviral activity). In addition, these residues appear generally conserved across primate species, apart from chimpanzee IFIT1. We have included changes to the text to more clearly indicate that we highlight the importance of these residues specifically for primate IFIT1, but not necessarily for all IFIT1 proteins in all clades.

      Reviewer #1 (Recommendations for the authors):

      (1) The readers would benefit from a more detailed background on the concept and estimation of positive selection, including the M7/M8 models in PAML.

      We have included more information in the text to provide a better background for the concepts of positive selection and how PAML tests for this using M7 and M8 models.
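      For readers unfamiliar with this test, the M7-vs-M8 comparison reduces to a likelihood ratio test against a chi-square distribution. A minimal sketch follows; the log-likelihood values are hypothetical, not from the IFIT1 analysis:

```python
# Sketch of the PAML M7-vs-M8 likelihood ratio test described in the text.
# M7 (null) models dN/dS with a beta distribution bounded below 1; M8 adds
# an extra site class allowed to have dN/dS > 1, contributing 2 extra free
# parameters. Log-likelihoods below are hypothetical.
import math

lnL_M7 = -8251.3   # null model log-likelihood
lnL_M8 = -8239.8   # alternative model log-likelihood

LRT = 2 * (lnL_M8 - lnL_M7)     # likelihood ratio test statistic
# For a chi-square distribution with 2 degrees of freedom, the survival
# function has the closed form exp(-x / 2).
p = math.exp(-LRT / 2)
print(f"2*dlnL = {LRT:.1f}, p = {p:.2e}")
# p < 0.05 would reject M7 in favor of M8, i.e. evidence of positive selection
```

A significant result indicates that allowing a class of sites with dN/dS > 1 fits the data better than forbidding it, which is the formal basis for the positive selection calls in the manuscript.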

      (2) Presentation of data

      a) Figure 3C and 3D: is there a better way to present the infection data so the readers can tell the ranked antiviral activity of the species that suppress VEEV?ย 

      We have included a supplementary table that includes the infection data for each ortholog from the VEEV and VSV screen to allow for readers to evaluate ranked antiviral activity of the species that suppress these viruses. In addition, the silhouettes next to the dot plots indicate the top ranked hits in order of viral inhibition (with the top being the most inhibitory). We have updated the figure legend to inform the reader of this information as well.

      b) Figure 4C and 4D: consider putting the western blot in Supplementary Figure 1 underneath the infection data or with the heatmap so readers can compare it with the antiviral activity.ย 

      We have also included quantification of the western blots performed to evaluate IFIT1 expression during the experiments shown in Figure 4C and 4D in an updated Figure 4B. We have also included normalized expression values with the heatmap shown in an updated Figure 4G so the reader can evaluate the potential impact of protein expression on antiviral activity for all infection experiments shown in Figure 4.

      (3) Line 269-270: as a rationale for narrowing the species to human, black flying fox, and chimp IFIT1, human and black flying fox were chosen because they strongly inhibit VEEV, but pangolin wasn't included even though it had the strongest anti-VEEV activity?

      The rationale for narrowing the species to human, black flying fox, and chimpanzee IFIT1 was related to the availability of biological tools, high-quality genome/transcriptome sequencing databases, and other factors. Specifically, human and chimpanzee IFIT1 are closely related but have variable antiviral activities, making their comparison highly relevant. Bats are well established as reservoirs for diverse viruses, whereas the reservoir status of many other mammals is less well defined. Furthermore, purifying large amounts of high-quality IFIT1 protein after bacterial expression was another limitation to functional studies. We have added this information to the manuscript text.

      (4) Figure 5A: to strengthen the claim that "species-specific antiviral activities of IFIT1s can be partly explained by RNA binding potential", it would be good to include one more positive and one more negative control. In other words, test the cap0 RNA binding activity of an IFIT1 ortholog that strongly inhibits VEEV and an ortholog that does not. It would also be good to discuss why chimp IFIT1 still shows dose-dependent RNA binding yet it is one of the weakest at inhibiting VEEV.ย 

      We appreciate the reviewer's suggestion to include more controls and expand the dataset. While we understand the potential value of expanding the dataset, we believe that human IFIT1 serves as a robust positive control and human IFIT1 R187 (RNA-binding deficient) serves as an established negative control. Future experiments with other purified IFITs from other species will indeed strengthen evidence linking IFIT1 species-specific activity and RNA-binding.

      Regarding chimpanzee IFIT1, we acknowledge there appears to be some dose-dependent Cap0 RNA binding. However, the binding affinity is much weaker than that of human or black flying fox IFIT1. We speculate that during viral infection, reduced binding affinity could impair the ability of chimpanzee IFIT1 to efficiently sequester viral RNA and inhibit viral translation. This reduction in binding affinity may therefore allow the cell to be overwhelmed by the exponential increase in viral RNA during replication, resulting in an ineffective antiviral IFIT1. A similar phenomenon is observed in the literature by Hyde et al. (PMID: 24482115). In this study, the authors test mouse Ifit1 Cap0 RNA binding by EMSA of the 5' UTR sequence of VEEV RNA containing an A or G at nucleotide position 3. EMSA shows binding of both the A3 and G3 Cap0 VEEV RNA sequences; however, stronger Ifit1 binding is observed for the A3 Cap0 RNA sequence. The consequences of the reduced Ifit1 binding of the G3 Cap0 VEEV RNA are observed in vitro as a substantial increase in viral titers produced from cells, as well as an increase in protein produced in a luciferase-based translation assay. The authors also show the in vivo relevance of this reduction in Ifit1 binding, as WT B6 mice infected with VEEV containing the A3 UTR exhibited 100% survival, while WT B6 mice infected with VEEV containing the G3 UTR survived at a rate of only ~25%. Therefore, the literature supports that a decrease in Cap0 RNA binding by an IFIT protein (while still exhibiting Cap0 RNA binding) observed by EMSA can result in considerable alterations of viral infection both in vitro and in vivo.

      Minor:ย 

      (1) Line 82: "including 5' triphosphate (5'-ppp-RNA), or viral RNAs..." having a comma here will make the sentence clearer.

      We have improved the clarity of this sentence. It now reads, "IFIT1 binds uncapped 5'-triphosphate RNA (5'-ppp-RNA) and capped but unmethylated RNA (Cap0, an m<sup>7</sup>G cap lacking 2'-O methylation)."

      (2) Line 100: "...similar mechanisms have been at least partially evolutionarily conserved in IFIT proteins to restrict viral infection by IFIT proteins".

      We have updated the text to improve clarity by revising the sentence to "VEEV TC-83 is sensitive to human IFIT1 and mouse Ifit1B, indicating at least partial conservation of antiviral function by IFIT proteins."

      (3) Line 109: "signatures of rapid evolution or positive selection" would put positive selection second because that is the more technical term that can benefit from the more layperson term (rapid evolution).

      We have updated this sentence to incorporate this suggestion. It now reads, "Positive selection, or rapid evolution, is denoted by a high ratio of nonsynonymous to synonymous substitutions (dN/dS > 1)."

      (4) Lines 116-117: "However, this was only assessed in a few species" would benefit from a citation.

      We have inserted the citation.

      (5) Line 127 heading: "IFIT1 is rapidly evolving in mammals" would be more accurate to say "in major clades of mammals".

      We have updated the text to include this suggestion.

      (6) Line 165: "IFIT1 L193 mutants".

      We have updated the text to rephrase this for clarity.

      (7) Line 170: two strains of VEEV were mentioned in the Intro, so it would be good to specify which strain of VEEV was used?

      We have updated the text to clarify the VEEV strain. In this study, all experiments were performed using the VEEV TC-83 strain.

      (8) Line 174: "Indeed, all mutants at position 193, whether hydrophobic or positively charged, inhibited VEEV similarly to the WT..." It should read "all hydrophobic and positively charged mutants inhibited VEEV similarly to the WT...".

      We corrected as suggested.

      (9) Line 204: what are "control cells"? Cells that are mock-infected, or cells without IFIT1?

      We have updated the text to improve clarity. The control cells were cells expressing an empty vector rather than an IFIT1 construct.

      (10) Need to clarify n=2 and n=3 replicates throughout the manuscript. Does that refer to three independent experiments? Or an experiment with triplicate wells/samples?

      We have updated the text to say "independent experiments" instead of "biological replicates" to prevent any confusion. All n=2 or n=3 replicates denote independent experiments.

      (11) Line 254: "dominant antiviral effector against the related human parainfluenza virus type 5..."

      We have updated the text to improve clarity.

      (12) Line 271: "The black flying fox (Pteropus alecto), is a model megabat species..." scientific name was italicized here but not elsewhere. Remove comma.

      We have updated the text accordingly.

      (13) Line 293: "...chimpanzee IFIT1 lacked these properties" but chimp IFIT1 can bind cap0 RNA, just at a lower level.

      We have updated the text to acknowledge that chimpanzee IFIT1 can bind Cap0 RNA, albeit at a lower level than human IFIT1.

      (14) Figure 6B: please fix the x-axis labels. They're very cramped.

      We have updated the x-axis labels for Figures 6B and 6D to improve clarity.

      (15) Line 609: "...trimmed and aligned"?

      Our phrasing was meant to indicate that coding sequences were aligned and gaps were removed, to reduce the chance of false-positive signals from underrepresented codons caused by gaps or short insertions. We have removed "trimmed" from the text and changed it to say "aligned sequences" to increase clarity.

      Reviewer #2 (Recommendations for the authors):

      (1) Numbers less than 10 should be spelled out throughout the manuscript (e.g. line 138).

      We have updated the text to reflect the request.

      (2) Line 165: "expression of IFIT1 193 mutants" should be rephrased.

      We have updated the text to rephrase this sentence for clarity.

      (3) A supplemental table or file should be included that contains the accession number and species names of sequences used for evolutionary analyses and for functional testing. In addition, the alignments that were used for positive selection can be included.

      We have included a supplemental file containing accession numbers and species names for the evolutionary analyses and functional studies. In addition, this table includes the infection data for each IFIT1 homolog from the screen performed in Figure 3.

      (4) The discussion of potential functions of the C-terminus of IFIT1 should include possible interactions with other proteins. In particular, the C-terminus of IFIT1 has been shown to interact with IFIT3 in a way that modulates its activity (PMID: 29525521). Although residues 362-366 were not shown in that paper to interact with a fragment of IFIT3, it is possible that these residues may be important for interaction with full-length IFIT3 or some other IFIT1 binding partner.

      We thank the reviewer for their suggestion. We have expanded the discussion to explore the possibility that residues 364 and 366 of IFIT1 may be involved in IFIT1-IFIT3 interactions and consequently Cap0 RNA-binding and antiviral activity.

      (5) The quantification of the EMSAs should be described in more detail. In particular, from looking at the images shown in Figure 5A, it would appear that human and chimpanzee IFIT1 show similar degrees of probe shift, while the human R187H panel shows no shifting at all. However, the quantification shows chimpanzee IFIT1 as being statistically indistinguishable from human R187H. Additional information on how bands were quantified and whether they were normalized to unshifted RNA would be helpful in attempting to resolve this visual discordance.

      EMSAs were quantified by determining Adj. Vol. Intensity in ImageLab (BioRad), which subtracts background signal, after imaging at the same exposure and SYBR Gold staining time. To determine Adj. Vol. Intensity, we drew a box (the same size for each gel and lane in each replicate) for each lane above the free probe. These values were not normalized to unshifted RNA; however, equal RNA was loaded. While the ANOVA shows no significant difference between human R187H and chimpanzee IFIT1 band-shift intensity, this is potentially due to the between-group variance in the ANOVA. The AUC value for chimpanzee IFIT1 is 36.4% higher than that of R187H.

      The AUC of the Adj. Vol. Intensity of the human IFIT1 band shift is roughly 2-fold greater than that of chimpanzee IFIT1. We believe this matches the visual representation as well, as human IFIT1 has a darker "upper" band in the shift, as well as a clear, dark "lower" band that is not well defined in the chimpanzee shift. Furthermore, the upper band of the chimpanzee IFIT1 shift at 400 nM appears only as intense as the upper band in the 240 nM human IFIT1 lane, without taking into account the lower band seen for human IFIT1. We included this quantification because a Kd could not be calculated, as there was no clear disappearance of the free probe; we do not intend for this quantification to act as a substitute for binding affinity calculations, but rather to aid the reader in data interpretation.

      Reviewer #3 (Recommendations for the authors):

      (1) IFIT1 has been demonstrated to function in conjunction with other IFIT proteins; do you think the absence of antiviral activity is due to isolated expression of IFIT1 without these cofactors, and might this therefore explain why there was little overlap observed in orthologs that inhibited the viruses tested (Figure 3, lines 209-210)?

      We do not believe that isolated expression of IFIT1 without cofactors (such as orthologous IFIT proteins) would fully explain the disparities in antiviral activity, as many of the IFIT1s that were expressed inhibited either VSV or VEEV in our screen. However, we acknowledge that the expression of IFIT1 alone does create a limitation of our study, as IFIT1 antiviral activity and RNA binding can be modulated by interactions with other IFIT proteins. Therefore, it is possible that co-expression of IFIT1 with other IFITs from a given species could enhance antiviral activity. Future studies may shed light on this.

      (2) Figure 5 - Calculating the Kd for each protein would be more informative. How does the binding affinity of these IFIT1 proteins compare to that which has previously been reported?

      We are unable to accurately determine a Kd because there is no substantial diminution of the free-probe signal. Therefore, we are only able to compare IFIT1 binding between species without an accurate mathematical calculation of binding affinity. Our result does appear similar to that of mouse Ifit1 binding to VEEV RNA (PMID: 24482115), for which the authors also did not calculate a Kd from their RNA EMSA.

      (3) Mutants 364 and 366 may not have direct contact with RNA, but RNA EMSA data presented suggest that the binding affinity may be different (though this is hard to conclude without Kd data). Additional biochemical data with these mutants might provide more insight here.

      We agree that further studies using the 364 and 366 double-mutant human and chimpanzee proteins in EMSAs would provide additional biochemical data and insight into the role of these residues in direct RNA binding. We acknowledge this is a limitation of our study, as we provide only genetic data demonstrating the importance of these residues.

      (4) Given that there appears to be very little overlap observed in orthologs that inhibited the viruses tested, it's possible that other amino acids may be key drivers of antiviral activity in these other orthologs. Thus, it's difficult to conclude whether the findings that residues 362/4/6 are important for IFIT1 activity can be broadly applied to other orthologs. A more systematic assessment of the role of these mutations across multiple diverse orthologs would provide more insight here. Do other antiviral proteins show this trend (i.e., exhibit little overlap in orthologs that inhibit these viruses)? What do you think might be driving this?

      We agree that residues other than 364 and 366 may be key drivers of antiviral activity across the IFIT1 orthologs tested. We do not hypothesize that this will broadly apply across IFIT1s from diverse clades of mammals, as overall amino acid identity can differ by over 30%. However, based on the chimpanzee and human IFIT1 data, as well as sequence alignment within primates specifically, we believe these residues may be key for primate (but not necessarily other clades of mammals) IFIT1 antiviral activity.

      Regarding whether other antiviral proteins show little overlap in orthologs that inhibit a given virus, to our knowledge such a functional study with this large and divergent a dataset of orthologs has not been performed. However, there are many examples of restriction factors exhibiting species-specific antiviral activity when ortholog screens have been performed. For example, HIV was reported to be suppressed by MX2 orthologs from human, rhesus macaque, and African green monkey, but not sheep or dog MX2 (PMID: 24760893). In addition, foamy virus was inhibited by the human and rhesus macaque orthologs of PHF11, but not the mouse and feline orthologs (PMID: 32678836). Furthermore, studies from our lab have shown variability in RTP4 ortholog antiviral activity toward viruses such as hepatitis C virus (HCV), West Nile virus (WNV), and Zika virus (ZIKV) (PMID: 33113352).

    1. Reviewer #4 (Public review):

      Summary:

      Several behavioral experiments and one TMS experiment were performed to examine adaptation to room reverberation for speech intelligibility in noise. This is an important topic that has been extensively studied by several groups over the years. And the study is unique in that it examines one candidate brain area, dlPFC, potentially involved in this learning, and finds that disrupting this area by TMS results in a reduction in the learning. The behavioral conditions are in many ways similar to previous studies. However, they find results that do not match previous results (e.g., performance in anechoic condition is worse than in reverberation), making it difficult to assess the validity of the methods used. One unique aspect of the behavioral experiments is that Ambisonics was used to simulate the spaces, while headphone simulation was mostly used previously. The main behavioral experiment was performed by interleaving 3 different rooms and measuring speech intelligibility as a function of the number of words preceding the target in a given room on a given trial. The findings are that performance improves on the time scale of seconds (as the number of words preceding the target increases), but also on a much larger time scale of tens to hundreds of seconds (corresponding to multiple trials), while for some listeners it is degraded for the first couple of trials. The study also finds that the performance is best in the room that matches the T60 most commonly observed in everyday environments. These are potentially interesting results. However, there are issues with the design of the study and analysis methods that make it difficult to verify the conclusions based on the data.

      Strengths:

      (1) Analysis of the adaptation to reverberation on multiple time scales, for multiple reverberant and anechoic environments, and also considering contextual effects of one environment interleaved with the other two environments.

      (2) TMS experiment showing reduction of some of the learning effects by temporarily disabling the dlPFC.

      Weaknesses:

      While the study examines the adaptation for different carrier lengths, it keeps multiple characteristics (mainly talker voice and location) fixed in addition to reverberation. Therefore, it is possible that the subjects adapt to other aspects of the stimuli, not just to reverberation. A condition in which only reverberation would switch for the target would allow the authors to separate these confounding alternatives. Now, the authors try to address the concerns by indirect evidence/analyses. However, the evidence provided does not appear sufficient.

      The authors use terms that are either not defined or that seem to be defined incorrectly. The main issue then is the results, which are based on analysis of what the authors call d', Hit Rate, and Final Hit rate. First of all, they randomly switch between these measures. Second, it's not clear how they define them, given that their responses are either 4-alternative or 8-alternative forced choice. d', Hit Rate, and False Alarm Rate are defined in Signal detection theory for the detection of the presence of a target. It can be easily extended to a 2-alternative forced choice. But how does one define a Hit, and, in particular, a False Alarm, in a 4/8-alternative? The authors do not state how they did it, and without that, the computation of d' based on HR and FAR is dubious. Also, what the authors call Hit Rate is presumably the percent correct performance (PCC), but even that is not clear. Then they use FHR and act as if this was the asymptotic value of their HR, even though in many conditions their learning has not ended, and randomly define a variable of +-10 from FHR, which must produce different results depending on whether the asymptote was reached or not. Other examples of strange usage of terms: they talk about "global likelihood learning" (L426) without a definition or a reference, or about "cumulative hit rate" (L1738), where it is not clear to me what "cumulative" means there.

      There are not enough acoustic details about the stimuli. The authors find that reverberant performance is overall better than anechoic in 2 rooms. This goes contrary to previous results. And the authors do not provide enough acoustic details to establish that this is not an artefact of how the stimuli were normalized (e.g., what were the total signal and noise levels at the two ears in the anechoic and reverberant conditions?).

      There are some concerns about the use of statistics. For example, the authors perform two-way ANOVA (L724-728) in which one factor is room, but that factor does not have the same 3 levels across the two levels of the other factor. Also, in some comparisons, they randomly select 11 out of 22 subjects even though appropriate tests correct for such imbalances without adding additional randomness of whether the 11 selected subjects happened to be the good or the bad ones.

      Details of the experiments are not sufficiently described in the methods (L194-205) to be able to follow what was done. It should be stated that 1 main experiment was performed using 3 rooms, and that 3 follow-ups were done on a new set of subjects, each with the room swapped.

    1. Author response:

      eLife Assessment:

      This paper performs a valuable critical reassessment of anatomical and functional data, proposing a reclassification of the mouse visual cortex in which almost all the higher visual areas are consolidated into a single area V2. However, the evidence supporting this unification is incomplete, as the key experimental observations that the model attempts to reproduce do not accurately reflect the literature. This study will likely be of interest to neuroscientists focused on the mouse visual cortex and the evolution of cortical organization.

      We do not agree with this assessment, nor do we understand which 'key experimental observations' that the model attempts to reproduce do not accurately reflect the literature. The model reproduces a complete map of the visual field, with overlap in certain regions. When reversals are used to delineate areas, as is the current custom, multiple higher-order areas are generated, and each area has a biased and overlapping visual field coverage. These are the simple outputs of the model, and they are consistent with the published literature, including recent publications such as Garrett et al., 2014 and Zhuang et al., 2017, a paper published in this journal. The area boundaries produced by the model are not identical to area boundaries in the literature, because the model is a simplification.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors argue that defining higher visual areas (HVAs) based on reversals of retinotopic tuning has led to an over-parcellation of secondary visual cortices. Using retinotopic models, they propose that the HVAs are more parsimoniously mapped as a single area V2, which encircles V1 and exhibits complex retinotopy. They reanalyze functional data to argue that functional differences between HVAs can be explained by retinotopic coverage. Finally, they compare the classification of mouse visual cortex to that of other species to argue that our current classification is inconsistent with those used in other model species.

      Strengths:

      This manuscript is bold and thought-provoking, and is a must-read for mouse visual neuroscientists. The authors take a strong stance on combining all HVAs, with the possible exception of area POR, into a single V2 region. Although I suspect many in the field will find that their proposal goes too far, many will agree that we need to closely examine the assumptions of previous classifications to derive a more accurate areal map. The authors' supporting analyses are clear and bolster their argument. Finally, they make a compelling argument for why the classification is not just semantic, but has ramifications for the design of experiments and analysis of data.

      Weaknesses:

      Although I enjoyed the polemic nature of the manuscript, there are a few issues that weaken their argument.

      (1) Although the authors make a compelling argument that retinotopic reversals are insufficient to define distinct regions, they are less clear about what would constitute convincing evidence for distinct visual regions. They mention that a distinct area V3 has been (correctly) defined in ferrets based on "cytoarchitecture, anatomy, and functional properties", but elsewhere argue that none of these factors are sufficient to parcellate any of the HVAs in mouse cortex, despite some striking differences between HVAs in each of these factors. It would be helpful to clearly define a set of criteria that could be used for classifying distinct regions.

      We agree the revised manuscript would benefit from a clear discussion of updated rules of area delineation in the mouse. In brief, we argue that retinotopy alone should not be used to delineate area boundaries in mice, or any other species. Although there is some evidence for functional property, architecture, and connectivity changes across mouse HVAs, area boundaries continue to be defined primarily, and sometimes solely (Garrett et al., 2014; Juavinett et al., 2018; Zhuang et al., 2017), based on retinotopy. We acknowledge that earlier work (Wang and Burkhalter, 2007; Wang et al., 2011) did consider cytoarchitecture and connectivity alongside retinotopy, but more recent work has shifted to a focus on retinotopy, as indicated by the currently accepted criteria for area delineation.

      As reviewer #2 points out, the present criteria for mouse visual area delineation can be found in the Methods section of: [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014)].

      Criterion 1: Each area must contain the same visual field sign at all locations within the area.

      Criterion 2: Each visual area cannot have a redundant representation of visual space.

      Criterion 3: Adjacent areas of the same visual field sign must have a redundant representation.

      Criterion 4: An area's location must be consistently identifiable across experiments.

      As discussed in the manuscript, recent evidence in higher-order visual cortex of tree shrews and rats led us to question the universality of these criteria across species. Specifically, tree shrew V2, macaque V2, and marmoset DM exhibit reversals in visual field sign in what are defined as single visual areas. This suggests that Criterion 1 should be updated. It also suggests that Criteria 2 and 3 should be updated, since visual field-sign reversals often co-occur with retinotopic redundancies: reversing course in the direction of progression along the visual field can easily lead to coverage of visual field regions already traveled.

      More broadly, we argue that topography is just one of several criteria that should be considered in area delineation. We understand that few visual areas in any species meet all criteria, but we emphasize that topography cannot consistently be the sole satisfied criterion, as it currently appears to be for many mouse HVAs. Inspired by a recent perspective on cortical area delineation (Petersen et al., 2024), we suggest the following rules, which will be worked into the revised version of the manuscript. Topography is a criterion, but it comes after considerations of function, architectonics, and connectivity.

      (1) Function: Cortical areas differ from neighboring areas in their functional properties.

      (2) Architectonics: Cortical areas often exhibit distinctions from neighboring areas in multiple cyto- and myeloarchitectonic markers.

      (3) Connectivity: Cortical areas are characterized by a specific set of connectional inputs and outputs from and to other areas.

      (4) Topography: Cortical areas often exhibit a distinct topography that balances maximal coverage of the sensory field with minimal redundancy of coverage within an area.

      As we discuss in the manuscript, although there are functional, architectonic, and connectivity differences across mouse HVAs, they typically vary smoothly across multiple areas, such that neighboring areas share the same properties and there are no sharp borders. For instance, sharp borders in cytoarchitecture are generally lacking in the mouse HVAs. A notable exception to this is the clear and sharp change in m2AChR expression that occurs between LM and AL (Wang et al., 2011).

      (2) On a related note, although the authors carry out impressive analyses to show that differences in functional properties between HVAs could be explained by retinotopy, they glossed over some contrary evidence that there are functional differences independent of retinotopy. For example, axon projections to different HVAs originating from a single V1 injection - presumably including neurons with similar retinotopy - exhibit distinct functional properties (Glickfeld LL et al, Nat Neuro, 2013). As another example, interdigitated M2+/M2- patches in V1 show very different HVA connectivity and response properties, again independent of V1 location/retinotopy (Meier AM et al., bioRxiv). One consideration is that the secondary regions might be considered a single V2 with distinct functional modules based on retinotopy and connectivity (e.g., V2LM, V2PM, etc).

      Thank you for the correction. We will revise the text to discuss (Glickfeld et al., 2013), as it remains some of the strongest evidence in favor of retinotopy-independent functional specialization of mouse HVAs. However, one caveat of this study is the size of the V1 injection that is the source of axons studied in the HVAs. As apparent in Figure 1B, the large injection covers nearly a quarter of V1. It is worth noting that (Han et al., 2018) found, using single-cell reconstructions and MAPseq, that the majority of V1 neurons project to multiple nearby HVA targets. In this experiment the tracing does not suffer from the problem of spreading over V1's retinotopic map, and suggests that presumably retinotopically matched locations in each area receive shared inputs from the V1 population rather than a distinct but spatially interspersed subset. In fact, the authors conclude "Interestingly, the location of the cell body within V1 was predictive of projection target for some recipient areas (Extended Data Fig. 8). Given the retinotopic organization of V1, this suggests that visual information from different parts of visual field may be preferentially distributed to specific target areas, which is consistent with recent findings (Zhuang et al., 2017)". Given an injection covering a large portion of the retinotopic map, and the fact that feed-forward projections from V1 to HVAs carry coarse retinotopy, it is difficult to prove that functional specializations noted in the HVA axons are retinotopy-independent. This would require measurement of receptive field location in the axonal boutons, which the authors did not perform (possibly because the SNR of calcium indicators prevented such measurements at the time).

      Another option would be to show that adjacent neurons in V1 that project to far-apart HVAs exhibit distinct functional properties on par with the differences exhibited by neurons in very different parts of V1 due to retinotopy. In other words, the functional specificity of V1 inputs to HVAs at retinotopically identical locations is of the same order as that which might be gained by retinotopic biases. To our knowledge, such a study has not been conducted, so we have decided to measure the data in collaboration with the Allen Institute. As part of the Allen Institute's pioneering OpenScope project, we will make careful two-photon and electrophysiology measurements of functional properties, including receptive field location, SF, and TF, in different parts of the V1 retinotopic map. Pairing this data with existing Allen Institute datasets on functional properties of neurons in the HVAs will allow us to rule in, or rule out, our hypotheses regarding retinotopy as the source of functional specialization in mouse HVAs. We will update the discussion in the revised manuscript to better reflect the need for additional evidence to support or refute our proposal.

      Meier AM et al., bioRxiv 2025 (Meier et al., 2025) was published after our submission, but we are thankful to the reviewers for guiding our attention to this timely paper. Given the recent findings on the influence of locomotion on rodent and primate visual cortex, it is very exciting to see clearly specialized circuits for processing self-generated visual motion in V1. However, it is difficult to rule out the role of retinotopy, as the HVAs (LM, AL, RL) participating in the M2+ network less responsive to self-generated visual motion exhibit a bias for the medial portion of the visual field, and the HVA (PM) involved in the M2- network responsive to self-generated visual motion exhibits a bias for the lateral (or peripheral) parts of the visual field. For instance, a peripheral bias in area PM has been shown using retrograde tracing as in Figure 6 of (Morimoto et al., 2021), single-cell anterograde tracing as in Extended Data Figure 8 of (Han et al., 2018), and functional imaging studies (Zhuang et al., 2017). Recent findings in the marmoset also point to visual circuits in the peripheral, but not central, visual field being significantly modulated by self-generated movements (Rowley et al., 2024).

      However, a visual field bias in area PM that selectively receives M2- inputs is at odds with the clear presence of modular M2+/M2- patches across the entire map of V1 (Ji et al., 2015). One possibility supported by existing data is that neurons in M2- patches, as well as those in M2+ patches, in the central representation of V1 make fewer or significantly weaker connections with area PM compared to areas LM, AL, and RL. Evidence to the contrary would support retinotopy-independent and functionally specialized inputs from V1 to HVAs.

      (3) Some of the HVAs, such as AL, AM, and LI, appear to have redundant retinotopic coverage with other HVAs, such as LM and PM. Moreover, these regions have typically been found to have higher "hierarchy scores" based on connectivity (Harris JA et al., Nature, 2019; D'Souza RD et al., Nat Comm, 2022), though unfortunately, the hierarchy levels are not completely consistent between studies. Based on existing evidence, there is a reasonable argument to be made for a hybrid classification, in which some regions (e.g., LM, P, PM, and RL) are combined into a single V2 (though see point #2 above) while other HVAs are maintained as independent visual regions, distinct from V2. I don't expect the authors to revise their viewpoint in any way, but a more nuanced discussion of alternative classifications is warranted.

      We understand that such a proposal, which would combine only a subset of areas with matched field sign (LM, P, PM, and RL), would be less extreme and better received by the community. This would create a V2 with a smooth map without reversals or significant redundant retinotopic coverage. However, the intuition we have built from our modeling studies suggests that both these areas, and the other smaller areas with negative field sign (AL, AM, LI), are a byproduct of a complex single map of the visual field that exhibits reversals as it contorts around the triangular and tear-shaped boundaries of V1. In other words, we believe the redundant coverage and field-sign changes/reversals are a byproduct of a single secondary visual field in V2 constrained by the cortical dimensions of V1. That being said, we understand that area delineations are in part based on a consensus by the community. Therefore, we will continue to discuss our proposal with community members, and we will incorporate new evidence supporting or refuting our hypothesis, before we submit our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study by Rowley and Sedigh-Sarvestani presents modeling data suggesting that map reversals in mouse lateral extrastriate visual cortex do not coincide with areal borders, but instead represent borders between subregions within a single area V2. The authors propose that such an organization explains the partial coverage in higher-order areas reported by Zhuang et al., (2017). The scheme revisits an organization proposed by Kaas et al., (1989), who interpreted the multiple projection patches traced from V1 in the squirrel lateral extrastriate cortex as subregions within a single area V2. Kaas et al.'s interpretation was challenged by Wang and Burkhalter (2007), who used a combination of topographic mapping of V1 connections and receptive field recordings in mice. Their findings supported a different partitioning scheme in which each projection patch mapped a specific topographic location within single areas, each containing a complete representation of the visual field. The area map of mouse visual cortex by Wang and Burkhalter (2007) has been reproduced by hundreds of studies and has been widely accepted as ground truth (CCF) (Wang et al., 2020) of the layout of rodent cortex. In the meantime, topographic mappings in marmoset and tree shrew visual cortex made a strong case for map reversals in lateral extrastriate cortex, which represent borders between functionally diverse subregions within a single area V2. These findings from non-rodent species raised doubts about whether, during evolution, different mammalian branches have developed diverse partitioning schemes of the cerebral cortex. Rowley and Sedigh-Sarvestani favor a single master plan in which, across evolution, all mammalian species have used a similar blueprint for subdividing the cortex.

      Strengths:

      The story illustrates the enduring strength of science in search of definitive answers.

      Weaknesses:

      To me, it remains an open question whether Rowley and Sedigh-Sarvestani have written the final chapter of the saga. A key reason for my reservation is that the area maps used in their model are cherry-picked. The article disregards published complementary maps, which show that the entire visual field is represented in multiple areas (i.e., LM, AL) of lateral extrastriate cortex and that the map reversal between LM and AL coincides precisely with the transition in m2AChR expression and cytoarchitecture (Wang and Burkhalter, 2007; Wang et al., 2011). Evidence from experiments in rats supports the gist of the findings in the mouse visual cortex (Coogan and Burkhalter, 1993).

      We would not claim to have written the final chapter of the saga. Our goal was to add an important piece of new evidence to the discussion of area delineations across species. We believe this new evidence supports our unification hypothesis. We also believe that there are several missing pieces of data that could support or refute our hypothesis. We have begun a collaboration to collect some of this data.

      (1) The selective use of published evidence, such as the complete visual field representation in higher visual areas of lateral extrastriate cortex (Wang and Burkhalter, 2007; Wang et al., 2011) makes the report more of an opinion piece than an original research article that systematically analyzes the area map of mouse visual cortex we have proposed. No direct evidence is presented for a single area V2 with functionally distinct subregions.

      This brings up a nuanced issue regarding visual field coverage. Wang & Burkhalter, 2007 Figure 6 shows the receptive fields of sample neurons in area LM that cover the full range between 0 and 90 degrees of azimuth and -40 to 80 degrees of elevation, which essentially matches the visual field coverage in V1. However, we do not know whether these neurons are representative of most neurons in area LM. In other words, while these single-cell recordings along selected contours in cortex show the span of the visual field coverage, they may not capture crucial information about its shape, missing regions of the visual field, or potential bias. To mitigate this, visual field maps measured with electrophysiology are commonly produced by even sampling across the two dimensions of the visual area, either by moving a single electrode along a grid pattern (e.g., (Manger et al., 2002)) or by using a grid-like multi-electrode probe (e.g., (Yu et al., 2020)). This was not carried out in either Wang & Burkhalter 2007 or Wang et al. 2011. Even sampling of cortical space is time-consuming and difficult with electrophysiology, but efficient with functional imaging. Therefore, despite the likely under-estimation of visual field coverage, imaging techniques are valuable in that they can efficiently exhibit not only the span of the visual field of a cortical region, but also its shape and bias.

      Multiple functional imaging studies that simultaneously measure visual field coverage in V1 and HVAs report a bias in the coverage of HVAs, relative to that in V1 (Garrett et al., 2014; Juavinett et al., 2018; Zhuang et al., 2017). While functional imaging will likely underestimate receptive fields compared to electrophysiology, the consistent observation of an orderly bias for distinct parts of the visual field across the HVAs suggests that at least some of the HVAs do not have full and uniform coverage of the visual field comparable to that in V1. For instance, (Garrett et al., 2014) show that the total coverage in HVAs, when compared to V1, is typically less than half (Figure 6D) and often irregularly shaped.

      Careful measurements of single-cell receptive fields, using mesoscopic two-photon imaging across the HVAs would settle this question. As reviewer #1 points out, this is technically feasible, though no dataset of this kind exists to our knowledge.

      (2) The article misrepresents evidence by commenting that m2AChR expression is mainly associated with the lower field. This is counter to published findings showing that m2AChR spans across the entire visual field (Gamanut et al., 2018; Meier et al., 2021). The utility of markers for delineating areal boundaries is discounted, without any evidence, in disregard of evidence for distinct areal patterns in early development (Wang et al., 2011). That markers can be distributed non-uniformly within an area is well known. m2AChR is non-uniformly expressed in mouse V1, LM, and LI (Ji et al., 2015; D'Souza et al., 2019; Meier et al., 2021). Recently, it has been found that the patchy organization within V1 plays a role in the organization of thalamocortical and intracortical networks (Meier et al., 2025). m2AChR-positive patches and m2AChR-negative interpatches organize the functionally distinct ventral and dorsal networks, notably without obvious bias for upper and lower parts of the visual field.

      We wrote that "Future work showed boundaries in labeling of histological markers such as SMI-32 and m2AChR labeling, but such changes mostly delineated area LM/AL (Wang et al., 2011) and seemed to be correlated with the representation of the lower visual field." The latter statement regarding the representation of the lower visual field directly references the data in Figure 1 of (Wang et al., 2011), which is titled "Figure 1: LM/AL border identified by the transition of m2AChR expression coincides with receptive field recordings from lower visual field." Like Wang et al., we were simply referring to the fact that the border of area LM/AL co-exhibits a change in m2AChR expression as well as lower-visual-field representation.

      (3) The study has adopted an area partitioning scheme, which is said to be based on anatomically defined boundaries of V2 (Zhuang et al., 2017). The only anatomical borders used by Zhuang et al. (2017) are those of V1 and barrel cortex, identified by cytochrome oxidase staining. In reality, the partitioning of the visual cortex was based on field sign maps, which are reproduced from Zhuang et al., (2017) in Figure 1A. It is unclear why the maps shown in Figures 2E and 2F differ from those in Figure 1A. It is possible that this is an oversight. But maintaining consistent areal boundaries across experimental conditions that are referenced to the underlying brain structure is critical for assigning modeled projections to areas or sub-regions. This problem is evident in Figure 2F, which is presented as evidence that the modeling approach recapitulates the tracings shown in Figure 3 of Wang and Burkhalter (2007). The dissimilarities between the modeling and tracing results are striking, unlike what is stated in the legend of Figure 2F.

      Thanks for this correction. By "anatomical boundaries of higher visual cortex", we meant the cortical boundary between V1 and higher order visual areas on one end, and the outer edge of the envelope that defines the functional boundaries of the HVAs in cortical space on the other (Zhuang et al., 2017). The reviewer is correct that we should have referred to these as functional boundaries. The word 'anatomical' was meant to refer to cortical space, rather than visual field space.

      More generally though, there is no disagreement between the partitioning of visual cortex in Figures 1 and 2. Rather, the partitioning in Figure 1 is directly taken from Zhuang et al., (2017), whereas that in Figure 2 is produced by mathematical model simulation. As such, one would not expect identical areal boundaries between Figure 2 and Figure 1. What we aimed to communicate with our modeling results is that a single area can exhibit multiple visual field reversals and retinotopic redundancies if it is constrained to fit around V1 and cover a visual field approximately matched to the visual field coverage in V1. We defined this area explicitly as a single area with a single visual field (boundaries shown in Figure 2A). So the point of our simulation is to show that even an explicitly defined single area can appear as multiple areas if it is constrained by the shape of mouse V1, and if visual field reversals are used to indicate areal boundaries. As in most models, different initial conditions and parameters produce a complex visual field which will appear as multiple HVAs when delineated by areal boundaries. What is consistent, however, is the existence of a complex single visual field that appears as multiple HVAs with partially overlapping coverage.
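      The reversal-based delineation criterion can be made concrete. Field-sign maps are conventionally computed as the sign of the Jacobian determinant of the cortex-to-visual-field mapping. The toy Python sketch below (synthetic maps, not our model's actual output) shows how a single continuous retinotopic map containing one reversal splits into two opposite-sign patches under this criterion:

```python
import numpy as np

def field_sign(azimuth, elevation):
    """Per-pixel field sign: the sign of the Jacobian determinant of the
    cortex -> visual-field mapping, i.e. the cross product of the
    cortical-space gradients of preferred azimuth and elevation."""
    daz_dy, daz_dx = np.gradient(azimuth)
    del_dy, del_dx = np.gradient(elevation)
    return np.sign(daz_dx * del_dy - daz_dy * del_dx)

# Synthetic single-area map: azimuth runs 0 -> 90 -> 0 degrees across
# cortex (one reversal); elevation is mapped smoothly along the other axis.
x, y = np.linspace(0, 1, 100), np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, y)
azimuth = 90 * (1 - np.abs(2 * X - 1))  # rises, then falls back: a reversal
elevation = 60 * Y - 30

fs = field_sign(azimuth, elevation)
# One continuous map, yet the two halves carry opposite field sign, so a
# reversal-based criterion would label them as two distinct "areas".
print(fs[:, :45].mean(), fs[:, 55:].mean())  # → 1.0 -1.0
```

      The two halves would be labeled separate areas even though they were generated as one continuous map, which is the crux of our argument.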

      Similarly, we would not expect a simple model to exactly reproduce the multi-color tracer injections in Wang and Burkhalter (2007). However, we find it quite compelling that the model can produce multiple groups of multi-colored axonal projections beyond V1 that, under current criteria, can appear as multiple areas each with its own map of the visual field, even though the model is explicitly designed to map a single visual field. We will explain the results of the model, and their implications, better in the revised manuscript.

      (4) The Rowley and Sedigh-Sarvestani find that the partial coverage of the visual field in higher order areas shown by Zhuang et al (2017) is recreated by the model. It is important to caution that Zhuang et al's (2017) maps were derived from incomplete mappings of the visual field, which was confined to -25-35 deg of elevation. This underestimates the coverage we have found in LM and AL. Receptive field mappings show that LM covers 0-90 deg of azimuth and -30-80 elevation (Wang and Burkhalter, 2007). AL covers at least 0-90 deg of azimuth and -30-50 deg of elevation (Wang and Burkhalter, 2007; Wang et al., 2011). These are important differences. Partial coverage in LM and AL underestimates the size of these areas and may map two projection patches as inputs to subregions of a single area rather than inputs to two separate areas. Complete, or nearly complete, visual representations in LM and AL support that each is a single area. Importantly, both areas are included in a callosal-free zone (Wang and Burkhalter, 2007). The surrounding callosal connections align with the vertical meridian representation. The single map reversal is marked by a transition in m2AChR expression and cytoarchitecture (Wang et al., 2011).

      This is a good point. We do not expect that expanding the coverage of V1 will change the results of the model significantly. However, for the revised manuscript, we will update V1 coverage to be accurate, repeat our simulations, and report the results.

      (5) The statement that the "lack of visual field overlap across areas is suggestive of a lack of hierarchical processing" is predicated on the full acceptance of the mappings by Zhuang et al (2017). Based on the evidence reviewed above, the reclassification of visual areas proposed in Figure 1C seems premature.

      The reviewer is correct. In the revised manuscript, we will be careful to distinguish bias in visual field coverage across areas from the presence or lack of visual field overlap.

      (6) The existence of lateral connections is not unique to rodent cortex and has been described in primates (Felleman and Van Essen, 1991).

      (7) Why the mouse and rat extrastriate visual cortices differ from those of many other mammals is unclear. One reason may be that mammals with V2 subregions are strongly binocular.

      This is an interesting suggestion, and careful visual topography data from rabbits and other lateral-eyed animals would help to evaluate it. For what it's worth, tree shrews are lateral-eyed animals with only 50 degrees of binocular visual field and also show V2 subregions.

      Reviewer #3 (Public review):

      Summary:

      The authors review published literature and propose that a visual cortical region in the mouse that is widely considered to contain multiple visual areas should be considered a single visual area.

      Strengths:

      The authors point out that relatively new data showing reversals of visual-field sign within known, single visual areas of some species require that a visual field sign change by itself should not be considered evidence for a border between visual areas.

      Weaknesses:

      The existing data are not consistent with the authors' proposal to consolidate multiple mouse areas into a single "V2". This is because the existing definition of a single area is that it cannot have redundant representations of the visual field. The authors ignore this requirement, as well as the data and definitions found in published manuscripts, and make an inaccurate claim that "higher order visual areas in the mouse do not have overlapping representations of the visual field". For quantification of the extent of overlap of representations between 11 mouse visual areas, see Figure 6G of Garrett et al. 2014. [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014). Topography and areal organization of mouse visual cortex. The Journal of Neuroscience 34, 12587-12600. 10.1523/JNEUROSCI.1124-14.2014.]

      Thank you for this correction; we admit we should have chosen our words more carefully. In the revised manuscript, we will emphasize that higher order visual areas in the mouse do have some overlap in their representations but also exhibit bias in their coverage. This is consistent with our proposal, and in fact our model simulations in Figure 2E also show overlapping representations along with differential bias in coverage. However, we also note that Figure 6 of Garrett et al. 2014 provides several pieces of evidence in support of our proposal that higher order areas are sub-regions of a single area V2. Specifically, the visual field coverage of each area is significantly less than that in V1 (Garrett et al. 2014, Figure 6D). While the imaging methods used in Garrett et al. likely under-estimate receptive fields, one would assume they would similarly impact measurements of coverage in V1 and HVAs. Secondly, each area exhibits a bias towards a different part of the visual field (Figure 6C and E); this bias is distinct for different areas but proceeds in a retinotopic manner around V1, with adjacent areas exhibiting biases for nearby regions of the visual field (Figure 6E). Thus, the biases in the visual field coverage across HVAs appear to be related and not independent of each other. As we show in our modeling and in Figure 2, such orderly and inter-related biases can be created from a single visual field constrained to share a border with mouse V1.
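      The coverage and bias measures discussed above can be computed straightforwardly from per-pixel retinotopic preferences. The sketch below uses made-up preference maps (the numbers are purely illustrative, not Garrett et al.'s data) to show a binned coverage fraction and a simple center-of-mass bias:

```python
import numpy as np

def coverage_and_bias(azim, elev, bin_deg=5):
    """Visual-field summary for one cortical region from per-pixel
    retinotopic preferences: fraction of visual-field bins occupied,
    plus the mean preferred position (a simple bias measure)."""
    az_edges = np.arange(-10, 91, bin_deg)
    el_edges = np.arange(-40, 81, bin_deg)
    hist, _, _ = np.histogram2d(azim, elev, bins=[az_edges, el_edges])
    return (hist > 0).mean(), (azim.mean(), elev.mean())

rng = np.random.default_rng(1)
# Hypothetical preference maps (illustrative, not measured data): "V1"
# spans the full field, while a toy "HVA" samples only a lateral patch.
v1_az, v1_el = rng.uniform(0, 90, 5000), rng.uniform(-35, 75, 5000)
hva_az, hva_el = rng.uniform(50, 90, 500), rng.uniform(-10, 40, 500)

cov_v1, _ = coverage_and_bias(v1_az, v1_el)
cov_hva, bias_hva = coverage_and_bias(hva_az, hva_el)

# The toy HVA covers far less of the field than V1 and is biased laterally:
print(cov_hva < 0.5 * cov_v1, bias_hva[0] > 45)  # → True True
```

      Summaries of this kind make it possible to compare coverage and bias on the same footing across regions, rather than reporting only the span of sampled receptive fields.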

      With regards to the existing definition of a single area: we did not ignore the requirement that single areas cannot have redundant representations of the visual field. Rather, we believe that this requirement should be relaxed in light of new evidence collected from other species, where multiple visual field reversals exist within the same visual area. We understand this issue is nuanced and was not made clear in the original submission.

      In the revised manuscript, we will clarify that visual field reversals often exhibit redundant retinotopic representation on either side of the reversal, and that our argument that multiple reversals can exist within a single visual area in the mouse is therefore an argument that some retinotopic redundancy can exist within single visual areas. Such a re-classification would align how we define visual areas in mice with existing classifications in tree shrews, ferrets, cats, and primates, all of which have secondary visual areas with complex retinotopic maps exhibiting multiple reversals and redundant retinotopic coverage.

    1. Author response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Parise presents another instantiation of the Multisensory Correlation Detector model that can now accept stimulus-level inputs. This is a valuable development as it removes researcher involvement in the characterization/labeling of features and allows analysis of complex stimuli with a high degree of nuance that was previously unconsidered (i.e., spatial/spectral distributions across time). The author demonstrates the power of the model by fitting data from dozens of previous experiments, including multiple species, tasks, behavioral modalities, and pharmacological interventions.

      Thanks for the kind words!

      Strengths:

      One of the model's biggest strengths, in my opinion, is its ability to extract complex spatiotemporal co-relationships from multisensory stimuli. These relationships have typically been manually computed or assigned based on stimulus condition and often distilled to a single dimension or even a single number (e.g., "-50 ms asynchrony"). Thus, many models of multisensory integration depend heavily on human preprocessing of stimuli, and these models miss out on complex dynamics of stimuli; the lead modality distribution apparent in Figures 3b and c is provocative. I can imagine the model revealing interesting characteristics of the facial distribution of correlation during continuous audiovisual speech that have up to this point been largely described as "present" and almost solely focused on the lip area.

      Another aspect that makes the MCD stand out among other models is the biological inspiration and generalizability across domains. The model was developed to describe a separate process, motion perception, in a much simpler organism, Drosophila. It could then describe a very basic neural computation that has been conserved across phylogeny (which is further demonstrated in the ability to predict rat, primate, and human data) and brain area. This aspect makes the model likely able to account for much more than what has already been demonstrated with only a few tweaks akin to the modifications described in this and previous articles from Parise.

      What allows this potential is that, as Parise and colleagues have demonstrated in those papers since our (re)introduction of the model in 2016, the MCD model is modular, both in its ability to interface with different inputs/outputs and in its ability to chain MCD units in a way that can analyze spatial, spectral, or any other arbitrary dimension of a stimulus. This fact leaves wide open the possibilities for the types of data, stimuli, and tasks a simplistic, neurally inspired model can account for.

      And so it's unsurprising (but impressive!) that Parise has demonstrated the model's ability here to account for such a wide range of empirical data from numerous tasks (synchrony/temporal order judgement, localization, detection, etc.) and behavior types (manual/saccade responses, gaze, etc.) using only the stimulus and a few free parameters. This ability is another of the model's main strengths that I think deserves some emphasis: it represents a kind of validation of those experiments, especially in the context of cross-experiment predictions (but see some criticism of that below).

      Finally, what is perhaps most impressive to me is that the MCD (and the accompanying decision model) does all this with very few (sometimes zero) free parameters. This highlights the utility of the model and the plausibility of its underlying architecture, but also helps to prevent extreme overfitting if fit correctly (but see a related concern below).

      We sincerely thank the reviewer for their thoughtful and generous comments. We are especially pleased that the core strengths of the model (its stimulus-computable architecture, biological grounding, modularity, and cross-domain applicability) were clearly recognized. As the reviewer rightly notes, removing researcher-defined abstractions and working directly from naturalistic stimuli opens the door to uncovering previously overlooked dynamics in complex multisensory signals, such as the spatial and temporal richness of audiovisual speech.

      We also appreciate the recognition of the model's origins in a simple organism and its generalization across species and behaviors. This phylogenetic continuity reinforces our view that the MCD captures a fundamental computation with wide-ranging implications. Finally, we are grateful for the reviewer's emphasis on the model's predictive power across tasks and datasets with few or no free parameters, a property we see as key to both its parsimony and explanatory utility.

      We have highlighted these points more explicitly in the revised manuscript, and we thank the reviewer for their generous and insightful endorsement of the work.

      Weaknesses:

      There is an insufficient level of detail in the methods about model fitting. As a result, it's unclear what data the models were fitted and validated on. Were models fit individually or on average group data? Each condition separately? Is the model predictive of unseen data? Was the model cross-validated? Relatedly, the manuscript mentions a randomization test, but the shuffled data produces model responses that are still highly correlated to behavior despite shuffling. Could it be that any stimulus that varies in AV onset asynchrony can produce a psychometric curve that matches any other task with asynchrony judgements baked into the task? Does this mean all SJ or TOJ tasks produce correlated psychometric curves? Or more generally, is Pearson's correlation insensitive to subtle changes here, considering psychometric curves are typically sigmoidal? Curves can be non-overlapping and still highly correlated if one is, for example, scaled differently. Would an error term such as mean-squared or root mean-squared error be more sensitive to subtle changes in psychometric curves? Alternatively, perhaps if the models aren't cross-validated, the high correlation values are due to overfitting?

      The reviewer is right: the current version of the manuscript provides only limited information about parameter fitting. In the revised version of the manuscript, we have included a parameter estimation and generalizability section that includes all the information requested by the reviewer.

      To test whether using the MSE instead of Pearson correlation led to a similar estimated set of parameter values, we repeated the fitting using the MSE. The parameters estimated with this method (TauV, TauA, TauBim) closely followed those estimated using Pearson correlation. Given the similarity of these results, we have chosen not to include further figures; however, this analysis is now included in the new section (pages 23-24).

      Regarding the permutation test, it is expected that different stimuli produce analogous psychometric functions: after all, all studies relied on stimuli containing identical manipulations of lag. As a result, MCD population responses tend to be similar across experiments. Therefore, it is not a surprise that the permuted distribution of MCD-data correlation in Supplementary Figure 1K has a mean as high as 0.97. However, what is important is to demonstrate that the non-permuted dataset has an even higher goodness of fit. Supplementary Figure 1K demonstrates that none of the permuted stimuli could outperform the non-permuted dataset; the mean of the non-permuted distribution is 4.7 standard deviations above the mean of the already high permuted distribution.
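      The logic of this statistic can be illustrated with a small sketch. The permuted distribution below is simulated for illustration only: its 0.97 mean is the value reported for Supplementary Figure 1K, while the spread and the observed correlation are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def perm_zscore(observed_r, permuted_rs):
    """Distance (in SDs) between the observed model-data correlation and
    the mean of the permuted (shuffled stimulus-data pairing) distribution."""
    permuted_rs = np.asarray(permuted_rs)
    return (observed_r - permuted_rs.mean()) / permuted_rs.std(ddof=1)

# Simulated permuted distribution: its mean is deliberately high (0.97),
# mirroring the fact that all experiments share the same lag manipulation.
permuted = rng.normal(loc=0.97, scale=0.004, size=1000)
observed = 0.989  # hypothetical non-permuted fit

z = perm_zscore(observed, permuted)
print(z > 3)  # the observed fit sits several SDs above an already high baseline
```

      The point is that with a baseline this high, raw correlation values are uninformative on their own; the standardized distance from the permuted distribution is what carries the evidence.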

      We believe the new section, along with the present response, fully addresses the legitimate concerns of the reviewer.

      While the model boasts incredible versatility across tasks and stimulus configurations, fitting behavioral data well doesn't mean we've captured the underlying neural processes, and thus, we need to be careful when interpreting results. For example, the model produces temporal parameters fitting rat behavior that are 4x faster than when fitting human data. This difference in slope and a difference at the tails were interpreted as differences in perceptual sensitivity related to general processing speeds of the rat, presumably related to brain/body size differences. While rats no doubt have these differences in neural processing speed/integration windows, it seems reasonable that a lot of the differences in human and rat psychometric functions could be explained by the (over)training and motivation of rats to perform on every trial for a reward - increasing attention/sensitivity (slope) - and a tendency to make mistakes (compression evident at the tails). Was there an attempt to fit these data with a lapse parameter built into the decisional model as was done in Equation 21? Likewise, the fitted parameters for the pharmacological manipulations during the SJ task indicated differences in the decisional (but not the perceptual) process and the article makes the claim that "all pharmacologically-induced changes in audiovisual time perception" can be attributed to decisional processes "with no need to postulate changes in low-level temporal processing." However, those papers discuss actual sensory effects of pharmacological manipulation, with one specifically reporting changes to response timing. Moreover, and again contrary to the conclusions drawn from model fits to those data, both papers also report a change in psychometric slope/JND in the TOJ task after pharmacological manipulation, which would presumably be reflected in changes to the perceptual (but not the decisional) parameters.

      Fitting or predicting behaviour does not in itself demonstrate that a model captures the underlying neural computations, though it may offer valuable constraints and insights. In line with this, we were careful not to extrapolate the implications of our simulations to specific neural mechanisms.

      Temporal sensitivity is, by definition, a behavioural metric, and, as the reviewer correctly notes, its estimation may reflect a range of contributing factors beyond low-level sensory processing, including attention, motivation, and lapse rates (i.e., stimulus-independent errors). In Equation 21, we introduced a lapse parameter specifically to account for such effects in the context of monkey eye-tracking data. For the rat datasets, however, the inclusion of a lapse term was not required to achieve a close fit to the psychometric data (ρ = 0.981). While it is likely that adding a lapse component would yield a marginally better fit, the absence of single-trial data prevents us from applying model comparison criteria such as AIC or BIC to justify the additional parameter. In light of this, and to avoid unnecessary model complexity, we opted not to include a lapse term in the rat simulations.
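      For readers unfamiliar with lapse terms: a standard way to incorporate one (a generic textbook form; the exact shape of our Equation 21 may differ) is to mix the sensory psychometric function with stimulus-independent guessing, which compresses the tails without touching the sensory stage. A minimal sketch with illustrative parameter values:

```python
import numpy as np
from math import erf

def psychometric(x, mu, sigma, lapse=0.0):
    """Cumulative-Gaussian psychometric function with a symmetric lapse
    term: lapses pull both asymptotes in by lapse/2 while leaving the
    underlying sensory parameters (mu, sigma) unchanged."""
    F = 0.5 * (1 + np.vectorize(erf)((np.asarray(x, float) - mu)
                                     / (sigma * np.sqrt(2))))
    return lapse / 2 + (1 - lapse) * F

lags = np.linspace(-400, 400, 9)                       # audiovisual lags (ms)
clean = psychometric(lags, mu=0, sigma=80)             # no lapses
lapsy = psychometric(lags, mu=0, sigma=80, lapse=0.1)  # 10% lapse rate

# Same sensory parameters, but the tails are compressed toward 0.05/0.95:
print(round(lapsy[0], 2), round(lapsy[-1], 2))  # → 0.05 0.95
```

      This is why tail compression in overtrained animals can in principle be absorbed by a single decision-stage parameter rather than a change in sensory tuning.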

      With respect to the pharmacological manipulation data, we acknowledge the reviewer's point that observed changes in slope and bias could plausibly arise from alterations at either the sensory or decisional level, or both. In our model, low-level sensory processing is instantiated by the MCD architecture, which outputs the MCDcorr and MCDlag signals that are then scaled and integrated during decision-making. Importantly, this scaling operation influences the slope of the resulting psychometric functions, such that changes in slope can arise even in the absence of any change to the MCD's temporal filters. In our simulations, the temporal constants of the MCD units were fixed to the values estimated from the non-pharmacological condition (see parameter estimation section above), and only the decision-related parameters were allowed to vary. From this modelling perspective, the behavioural effects observed in the pharmacological datasets can be explained entirely by changes at the decisional level. However, we do not claim that such an explanation excludes the possibility of genuine sensory-level changes. Rather, we assert that our model can account for the observed data without requiring modifications to early temporal tuning.
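      The point that a decision-stage gain alone can change psychometric slope is easy to demonstrate numerically. In the toy sketch below (a generic logistic readout, not the actual MCD decision model), the sensory mapping from lag to evidence is held fixed and only the decision gain varies:

```python
import numpy as np

def choice_prob(evidence, beta):
    """Decision stage: logistic readout of a fixed sensory evidence signal.
    beta is a decision-level gain; the sensory stage is untouched."""
    return 1 / (1 + np.exp(-beta * np.asarray(evidence, float)))

# Fixed, hypothetical sensory mapping from audiovisual lag to evidence
# (a stand-in for a scaled MCD lag signal, not the model's actual output).
lags = np.linspace(-300, 300, 13)
evidence = lags / 100.0

shallow = choice_prob(evidence, beta=0.5)
steep = choice_prob(evidence, beta=2.0)

# Central slope of the psychometric curve (per ms of lag) grows with the
# decision gain alone, with no change to the sensory mapping:
slope = lambda p: (p[7] - p[5]) / (lags[7] - lags[5])
print(slope(steep) > slope(shallow))  # → True
```

      This is the sense in which slope changes reported after pharmacological manipulation are compatible with a purely decisional account, even though they do not rule out sensory-level effects.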

      To rigorously distinguish sensory from decisional effects, future experiments will need to employ stimuli with richer temporal structure, e.g., temporally modulated sequences of clicks and flashes that vary in frequency, phase, rhythm, or regularity (see Fujisaki & Nishida, 2007; Denison et al., 2012; Parise & Ernst, 2016, 2025; Locke & Landy, 2017; Nidiffer et al., 2018). Such stimuli engage the MCD in a more stimulus-dependent manner, enabling a clearer separation between early sensory encoding and later decision-making processes. Unfortunately, the current rat datasets, based exclusively on single click-flash pairings, lack the complexity needed for such disambiguation. As a result, while our simulations suggest that the observed pharmacologically induced effects can be attributed to changes in decision-level parameters, they do not rule out concurrent sensory-level changes.

      In summary, our results indicate that changes in the temporal tuning of MCD units are not necessary to reproduce the observed pharmacological effects on audiovisual timing behaviour. However, we do not assert that such changes are absent or unnecessary in principle. Disentangling sensory and decisional contributions will ultimately require richer datasets and experimental paradigms designed specifically for this purpose. We have now modified the results section (page 6) and the discussion (page 11) to clarify these points.

      The case for the utility of a stimulus-computable model is convincing (as I mentioned above), but its framing as mission-critical for understanding multisensory perception is overstated, I think. The line for what is "stimulus computable" is arbitrary and doesn't seem to be followed in the paper. A strict definition might realistically require inputs to be, e.g., the patterns of light and sound waves available to our eyes and ears, while an even more strict definition might (unrealistically) require those stimuli to be physically present and transduced by the model. A reasonable looser definition might allow an "abstract and low-dimensional representation of the stimulus," such as the stimulus envelope (which was used in the paper), to be an input. Ultimately, some preprocessing of a stimulus does not necessarily confound interpretations about (multi)sensory perception. And on the flip side, the stimulus-computable aspect doesn't necessarily give the model supreme insight into perception. For example, the MCD model was "confused" by the stimuli used in our 2018 paper (Nidiffer et al., 2018; Parise & Ernst, 2025). In each of our stimuli (including catch trials), the onset and offset drove strong AV temporal correlations, but were irrelevant to participants performing an amplitude modulation detection task. The to-be-detected amplitude modulations, set at individual thresholds, were not a salient aspect of the physical stimulus, and thus only marginally affected stimulus correlations. The model was, of course, able to fit our data by "ignoring" the on/offsets (i.e., requiring human intervention), again highlighting that the model is tapping into a very basic and ubiquitous computational principle of (multi)sensory perception. But it does reveal a limitation of such a stimulus-computable model: that it is (so far) strictly bottom-up.

      We appreciate the reviewer's thoughtful engagement with the concept of stimulus computability. We agree that the term requires careful definition and should not be taken as a guarantee of perceptual insight or neural plausibility. In our work, we define a model as "stimulus-computable" if all its inputs are derived directly from the stimulus, rather than from experimenter-defined summary descriptors such as temporal lag, spatial disparity, or cue reliability. In the context of multisensory integration, this implies that a model must account not only for how cues are combined, but also for how those cues are extracted from raw inputs, such as audio waveforms and visual contrast sequences.

      This distinction is central to our modelling philosophy. While ideal observer models often specify how information should be combined once identified, they typically do not address the upstream question of how this information is extracted from sensory input. In that sense, models that are not stimulus-computable leave out a key part of the perceptual pipeline. We do not present stimulus computability as a marker of theoretical superiority, but rather as a modelling constraint that is necessary if one's aim is to explain how structured sensory input gives rise to perception. This is a view that is also explicitly acknowledged and supported by Reviewer 2.

      Framed in Marr's (1982) terms, non-stimulus-computable models tend to operate at the computational level, defining what the system is doing (e.g., computing a maximum likelihood estimate), whereas stimulus-computable models aim to function at the algorithmic level, specifying how the relevant representations and operations might be implemented. When appropriately constrained by biological plausibility, such models may also inform hypotheses at the implementational level, pointing to potential neural substrates that could instantiate the computation.

      Regarding the reviewer's example illustrating a limitation of the MCD model, we respectfully note that the account appears to be based on a misreading of our prior work. In Parise & Ernst (2025), where we simulated the stimuli from Nidiffer et al. (2018), the MCD model reproduced participants' behavioural data without any human intervention or adjustment. The model was applied in a fully bottom-up, stimulus-driven manner, and its output aligned with observer responses as-is. We suspect the confusion may stem from analyses shown in Figure 6 - Supplement Figure 5 of Parise & Ernst (2025), where we investigated the lack of a frequency-doubling effect in the Nidiffer et al. data. However, those analyses were based solely on the Pearson correlation between auditory and visual stimulus envelopes and did not involve the MCD model. No manual exclusion of onset/offset events was applied, nor was the MCD used in those particular figures. We also note that Parise & Ernst (2025) is a separate, already published study and is not the manuscript currently under review.

      In summary, while we fully agree that stimulus computability does not resolve all the complexities of multisensory perception (see comments below about speech), we maintain that it provides a valuable modelling constraint, one that enables robust, generalisable predictions when appropriately scoped.

      The manuscript rightly chooses to focus a lot of the work on speech, fitting the MCD model to predict behavioral responses to speech. The range of findings from AV speech experiments that the MCD can account for is very convincing. Given the provided context that speech is "often claimed to be processed via dedicated mechanisms in the brain," a statement claiming a "first end-to-end account of multisensory perception," and findings that the MCD model can account for speech behaviors, it seems the reader is meant to infer that energetic correlation detection is a complete account of speech perception. I think this conclusion misses some facets of AV speech perception, such as integration of higher-order, non-redundant/correlated speech features (Campbell, 2008) and also the existence of top-down and predictive processing that aren't (yet!) explained by MCD. For example, one important benefit of AV speech is interactions on linguistic processes - how complementary sensitivity to articulatory features in the auditory and visual systems (Summerfield, 1987) allows constraint of linguistic processes (Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      We thank the reviewer for their thoughtful comments, and especially for the kind words describing the range of findings from our AV speech simulations as "very convincing."

      We would like to clarify that it is not our view that speech perception can be reduced to energetic correlation detection. While the MCD model captures low- to mid-level temporal dependencies between auditory and visual signals, we fully agree that a complete account of audiovisual speech perception must also include higher-order processes, including linguistic mechanisms and top-down predictions. These are critical components of AV speech comprehension, and lie beyond the scope of the current model.

      Our use of the term "end-to-end" is intended in a narrow operational sense: the model transforms raw audiovisual input (i.e., audio waveforms and video frames) directly into behavioural output (i.e., button press responses), without reliance on abstracted stimulus parameters such as lag, disparity, or reliability. It is in this specific technical sense that the MCD offers an end-to-end model. We have revised the manuscript to clarify this usage to avoid any misunderstanding.

      In light of the reviewer's valuable point, we have now edited the Discussion to acknowledge the importance of linguistic processes (page 13) and to clarify what we mean by an end-to-end account (page 11). We agree that future work will need to explore how stimulus-computable models such as the MCD can be integrated with broader frameworks of linguistic and predictive processing (e.g., Summerfield, 1987; Campbell, 2008; Peelle & Sommers, 2015; Tye-Murray et al., 2007).

      References

      Campbell, R. (2008). The processing of audio-visual speech: empirical and neural bases. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1493), 1001-1010. https://doi.org/10.1098/rstb.2007.2155

      Nidiffer, A. R., Diederich, A., Ramachandran, R., & Wallace, M. T. (2018). Multisensory perception reflects individual differences in processing temporal correlations. Scientific Reports, 8(1), 1-15. https://doi.org/10.1038/s41598-018-32673-y

      Parise, C. V., & Ernst, M. O. (2025). Multisensory integration operates on correlated input from unimodal transient channels. eLife, 12. https://doi.org/10.7554/ELIFE.90841

      Peelle, J. E., & Sommers, M. S. (2015). Prediction and constraint in audiovisual speech perception. Cortex, 68, 169-181. https://doi.org/10.1016/j.cortex.2015.03.006

      Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by Eye: The Psychology of Lip-Reading (pp. 3-51). Lawrence Erlbaum Associates.

      Tye-Murray, N., Sommers, M., & Spehar, B. (2007). Auditory and visual lexical neighborhoods in audiovisual speech perception. Trends in Amplification, 11(4), 233-241. https://doi.org/10.1177/1084713807307409

      Reviewer #2 (Public review):

      Summary:

      Building on previous models of multisensory integration (including their earlier correlation-detection framework used for non-spatial signals), the authors introduce a population-level Multisensory Correlation Detector (MCD) that processes raw auditory and visual data. Crucially, it does not rely on abstracted parameters, as is common in normative Bayesian models, but rather works directly on the stimulus itself (i.e., individual pixels and audio samples). By systematically testing the model against a range of experiments spanning human, monkey, and rat data, the authors show that their MCD population approach robustly predicts perception and behavior across species with a relatively small (0-4) number of free parameters.

      Strengths:

      (1) Unlike prior Bayesian models that used simplified or parameterized inputs, the model here is explicitly computable from full natural stimuli. This resolves a key gap in understanding how the brain might extract "time offsets" or "disparities" from continuously changing audio-visual streams.

      (2) The same population MCD architecture captures a remarkable range of multisensory phenomena, from classical illusions (McGurk, ventriloquism) and synchrony judgments, to attentional/gaze behavior driven by audio-visual salience. This generality strongly supports the idea that a single low-level computation (correlation detection) can underlie many distinct multisensory effects.

      (3) By tuning model parameters to different temporal rhythms (e.g., faster in rodents, slower in humans), the MCD explains cross-species perceptual data without reconfiguring the underlying architecture.

      We thank the reviewer for their positive evaluation of the manuscript, and particularly for highlighting the significance of the model's stimulus-computable architecture and its broad applicability across species and paradigms. Please find our responses to the individual points below.

      Weaknesses:

      (1) The authors show how a correlation-based model can account for the various multisensory integration effects observed in previous studies. However, a comparison of how the two accounts differ would shed light on whether the correlation model is an implementation of the Bayesian computations (different levels in Marr's hierarchy) or makes testable predictions that can distinguish between the two frameworks. For example, the Bayesian model predicts that the uncertainty of the combined-cue estimate is the harmonic mean of the unimodal uncertainties. So, how the MCD framework predicts this reduced uncertainty could be one potential difference from (or similarity to) the Bayesian model.

      We fully agree with the reviewer that a comparison between the correlation-based MCD model and Bayesian accounts is valuable, particularly for clarifying how the two frameworks differ conceptually and where they may converge.

      As noted in the revised manuscript, the key distinction lies in the level of analysis described by Marr (1982). Bayesian models operate at the computational level, describing what the system is aiming to compute (e.g., optimal cue integration). In contrast, the MCD functions at the algorithmic level, offering a biologically plausible mechanism for how such integration might emerge from stimulus-driven representations.

      In this context, the MCD provides a concrete, stimulus-grounded account of how perceptual estimates might be constructed, potentially implementing computations with Bayesian-like characteristics (e.g., reduced uncertainty, cue weighting). Thus, the two models are not mutually exclusive but can be seen as complementary: the MCD may offer an algorithmic instantiation of computations that, at the abstract level, resemble Bayesian inference.
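As a concrete reference point for the Bayesian-like characteristics mentioned above (reduced uncertainty, cue weighting), the standard maximum-likelihood cue-combination benchmark can be sketched as follows. This is a generic illustration, not code from the manuscript; the function name and values are ours.

```python
# Illustrative sketch of the Bayesian (maximum-likelihood) benchmark for
# two-cue combination: the fused estimate is a reliability-weighted average,
# and the fused variance (var_a * var_v) / (var_a + var_v) is always
# smaller than either unimodal variance (reduced uncertainty).

def fuse_cues(est_a, var_a, est_v, var_v):
    """Reliability-weighted fusion of an auditory and a visual estimate."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)  # auditory weight
    w_v = 1 - w_a                                # visual weight
    est_av = w_a * est_a + w_v * est_v
    var_av = (var_a * var_v) / (var_a + var_v)
    return est_av, var_av

est, var = fuse_cues(est_a=10.0, var_a=4.0, est_v=0.0, var_v=1.0)
# est = 2.0 (pulled toward the more reliable visual cue), var = 0.8
```

A test of the MCD against this benchmark would ask whether its outputs reproduce both the weighting and the variance reduction.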

      We have now updated the manuscript to explicitly highlight this relationship (pages 2 and 11). In the revised manuscript, we also included a new figure (Figure 5) and movie (Supplementary Movie 3), to show how the present approach extends previous Bayesian models for the case of cue integration (i.e., the ventriloquist effect).

      (2) The authors show a good match for cue combination involving 2 cues. While Bayesian accounts provide a direction for extension to more cues (also seen empirically, e.g., in Hecht et al. 2008), discussion on how the MCD model extends to more cues would benefit the readers.

      We thank the reviewer for this insightful comment: extending the MCD model to include more than two sensory modalities is a natural and valuable next step. Indeed, one of the strengths of the MCD framework lies in its modularity. Let us consider the MCDcorr output (Equation 6), which is computed as the pointwise product of transient inputs across modalities. Extending this to include a third modality, such as touch, is straightforward: MCD units would simply multiply the transient channels from all three modalities, effectively acting as trimodal coincidence detectors that respond when all inputs are aligned in time and space.

      By contrast, extending MCDlag is less intuitive, due to its reliance on opponency between two subunits (via subtraction). A plausible solution is to compute MCDlag in a pairwise fashion (e.g., AV, VT, AT), capturing relative timing across modality pairs.

      Importantly, the bulk of the spatial integration in our framework is carried by MCDcorr, which generalises naturally to more than two modalities. We have now formalised this extension and included a graphical representation in a supplementary section of the revised manuscript.
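The pointwise-product extension described above can be sketched in a few lines. This is a toy illustration under our own assumptions: the arrays stand in for transient-channel outputs, not for the MCD's actual filtered signals.

```python
import numpy as np

# Toy sketch of a trimodal coincidence detector: the pointwise product of
# transient channels from three modalities responds only at moments where
# all three inputs are simultaneously active.
a = np.array([0.0, 1.0, 1.0, 0.0, 1.0])  # auditory transient channel
v = np.array([0.0, 1.0, 1.0, 0.0, 0.0])  # visual transient channel
t = np.array([0.0, 1.0, 0.0, 0.0, 1.0])  # tactile transient channel

mcd_corr_avt = a * v * t  # nonzero only where all three align
# mcd_corr_avt -> [0., 1., 0., 0., 0.]
```

The pairwise scheme suggested for MCDlag (AV, VT, AT) would instead apply the bimodal opponent computation to each pair of channels separately.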

      Likely Impact and Usefulness:

      The work offers a compelling unification of multiple multisensory tasks - temporal order judgments, illusions, Bayesian causal inference, and overt visual attention - under a single, fully stimulus-driven framework. Its success with natural stimuli should interest computational neuroscientists, systems neuroscientists, and machine learning scientists. This paper thus makes an important contribution to the field by moving beyond minimalistic lab stimuli, illustrating how raw audio and video can be integrated using elementary correlation analyses.

      Reviewer #1 (Recommendations for the authors):

      Recommendations:

      My biggest concern is a lack of specificity about model fitting, which would be assuaged by the inclusion of sufficient detail to replicate the analysis completely or the inclusion of the analysis code. The code availability indicates a script for the population model will be included, but it is unclear if this code will provide the fitting details for the whole of the analysis.

      We thank the reviewer for raising this important point. A new methodological section has been added to the manuscript, detailing the model fitting procedures used throughout the study. In addition, the accompanying code repository now includes MATLAB scripts that allow full replication of the spatiotemporal MCD simulations.

      Perhaps it could be enlightening to re-evaluate the model with a measure of error rather than correlation? And I think many researchers would be interested in the model's performance on unseen data.

      The model has now been re-evaluated using mean squared error (MSE), and the results remain consistent with those obtained using Pearson correlation. Additionally, we have clarified which parts of the study involve testing the model on unseen data (i.e., data not used to fit the temporal constants of the units). These analyses are now included and discussed in the revised fitting section of the manuscript (pages 23-24).
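For readers unfamiliar with why the two measures are complementary, a minimal sketch (with made-up data, not values from the study) is shown below: Pearson correlation is invariant to scaling and offset of the predictions, whereas MSE penalizes them, so agreement between the two measures is itself informative.

```python
import numpy as np

# Illustration only: the two goodness-of-fit measures discussed above,
# computed on arbitrary toy data. Pearson r ignores affine mismatches
# between model and data; MSE does not.
data = np.array([0.1, 0.4, 0.6, 0.9])   # hypothetical observed values
pred = np.array([0.2, 0.35, 0.65, 0.8])  # hypothetical model predictions

r = np.corrcoef(data, pred)[0, 1]   # Pearson correlation
mse = np.mean((data - pred) ** 2)   # mean squared error
```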

      Otherwise, my concerns involve the interpretation of findings, and thus could be satisfied with minor rewording or tempering conclusions.

      The manuscript has been revised to address these interpretative concerns, with several conclusions reworded or tempered accordingly. All changes are marked in blue in the revised version.

      Miscellanea:

      Should b0 in equation 10 be bcrit to match the below text?

      Thank you for catching this inconsistency. We have corrected Equation 10 (and also Equation 21) to use the more transparent notation bcrit instead of b0, in line with the accompanying text.

      Equation 23, should time be averaged separately? For example, if multiple people are speaking, the average correlation for those frames will be higher than the average correlation across all times.

      We thank the reviewer for raising this thoughtful and important point. In response, we have clarified the notation of Equation 23 in the revised manuscript (page 20). Specifically, we now denote the averaging operations explicitly as spatial means and standard deviations across all pixel locations within each frame.

      This equation computes the z-score of the MCD correlation value at the current gaze location, normalized relative to the spatial distribution of correlation values in the same frame. That is, all operations are performed at the frame level, not across time. This ensures that temporally distinct events are treated independently and that the final measure reflects relative salience within each moment, not a global average over the stimulus. In other words, the spatial distribution of MCD activity is re-centered and rescaled at each frame, exactly to avoid the type of inflation or confounding the reviewer rightly cautioned against.
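The frame-level normalization described above can be sketched as follows. Variable and function names are ours, not the manuscript's; the point is only that the mean and standard deviation are taken over pixels within a single frame, never across time.

```python
import numpy as np

# Sketch of per-frame z-scoring: the MCD correlation value at the gaze
# location is normalized against the spatial distribution of values in the
# SAME frame, so salience is always relative to the current moment.
def gaze_zscore(mcd_frame, gaze_rc):
    """mcd_frame: 2-D map of MCD correlation for one frame;
    gaze_rc: (row, col) of the current gaze location."""
    mu = mcd_frame.mean()  # spatial mean within this frame
    sd = mcd_frame.std()   # spatial SD within this frame
    return (mcd_frame[gaze_rc] - mu) / sd

frame = np.array([[0.0, 0.0], [0.0, 4.0]])
z = gaze_zscore(frame, (1, 1))  # gaze on the most active pixel
# z = (4 - 1) / sqrt(3), roughly 1.73
```

Because each frame is re-centered and rescaled independently, a frame in which many locations are active (e.g., multiple speakers) cannot inflate the measure relative to quieter frames.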

      Reviewer #2 (Recommendations for the authors):

      The authors have done a great job of providing a stimulus computable model of cue combination. I had just a few suggestions to strengthen the theoretical part of the paper:

      (1) While the authors have shown a good match between MCD and cue combination, some theoretical justification or equivalence analysis would benefit readers on how the two relate to each other. Something like Zhang et al. 2019 (which is for motion cue combination) would add to the paper.

      We agree that it is important to clarify the theoretical relationship between the Multisensory Correlation Detector (MCD) and normative models of cue integration, such as Bayesian combination. In the revised manuscript, we have now modified the introduction and added a paragraph in the Discussion addressing this link more explicitly. In brief, we see the MCD as an algorithmic-level implementation (in Marr's terms) that may approximate or instantiate aspects of Bayesian inference.

      (2) Simulating cue combination for tasks that require integration of more than two cues (visual, auditory, haptic cues) would more strongly relate the correlation model to Bayesian cue combination. If that is a lot of work, at least discussing this would benefit the paper.

      This point has now been addressed, and a new paragraph discussing the extension of the MCD model to tasks involving more than two sensory modalities has been added to the Discussion section.

    1. For example, Britzman's discussion of the ego, desire, and uncertainty reminds me of certain aspects of Buddhism, and makes me wonder whether there are insights from, say, Asian philosophies (and African philosophies, indigenous philosophies) that might help us think differently about what it means to teach, to learn, and to engage in anti-oppressive education

      I think this is a helpful way of reaching an important theoretical point here. It seems like Kumashiro is asking the question: How do teachers get students to be different? And the answer is like, "It's hard, because even when you teach them new stuff, set up their environment perfectly for change to happen, make them feel safe, and expose them to a nice new anti-racist identity to switch into, the transformation doesn't always happen." I worry that we've sort of lost the thread of anti-racism though, and we're just hanging out in the deep end. Also, here, maybe, is another pointer towards Kumashiro's comfort with themes from religiosity waiting in the wings, one that raises the question of whether this anti-rationalist conclusion is itself a symptom of Kumashiro's own academic conceptual framework.

    1. "Storytelling involves a particular language and set of relationships; it is a body of knowledge and abilities that are activities only within its happening"

      This shows storytelling as an active, living process. It is not just words but a connection between the teller and the listener. This makes storytelling unique compared to reading and memorizing facts.

    2. As was stated, storytelling is changing. This includes the way it is shaped. More and more educators, in addition to telling stories in the classroom or having students share stories in the classroom, are employing digital storytelling in the classroom. Although there are many digital applications being used in the classroom, in order to be considered storytelling, story must be first, as Jason Ohler, author of Digital Storytelling in the Classroom (2006), states: "The problem for many students is their focus on the power of technology rather than the power of stories. Some students are engaging the medium at the expense of the message, producing a technical event rather than a story" (p. 46). Digital uses of narrative are becoming more accessible and expanding as we continue to employ digital learning in our classrooms. We advocate that with every new turn, story stays at the center of the change.

      Storytelling is interactive and alive, not just reading words or acting, it's about co-creating meaning with an audience.

    1. without perceiving what four times four really means

      Especially in math, it's important that the students not just understand what the answer is but HOW to get the answer. Related to the whole "give a man a fish, he'll eat for a day, but teach a man to fish, he'll always eat" saying, active learning helps students continually understand (when it's put in place).

    1. if I can really let go of any theory of who I am, then I'll let go of any fear.

      for - adjacency - letting go - of knowledge - of theories - Donald Hoffman - I've often felt as he does - it's a conundrum of letting go of that (knowledge) we've invested so heavily into - quote / key insight - letting go of theories of science and self - Donald Hoffman

      • Science is great, but don't believe any theory.
      • Theories are just tools. They're not the truth.
      • No scientific theory, my theories included, are the truth.
      • And so also is my theory about who I am not the truth.
      • So to really let go of any theory, if I can really let go of any theory of who I am, then I'll let go of any fear

    2. what the Bible is basically saying, love God with all your heart. That it's loving yourself. You are God. And loving your neighbor as yourself is just recognizing that your neighbor is yourself under a different avatar.

      for - adjacency - Christian teaching - infinite intelligence - loving God - loving your neighbor - loving yourself - all the same - Donald Hoffman

    3. Almost all of us think of ourselves as an object in spacetime only here for a short amount of time and will soon die

      for - quote - Almost all of us think of ourselves as an object in spacetime only here for a short amount of time and will soon die - Donald Hoffman

      • Almost all of us think of ourselves as
        • an object in spacetime only here for a short amount of time and will soon die.
      • When I say you transcend any scientific theory,
        • that means the theory that I am just a 160lb object in spacetime is just a theory and it's not the truth.
      • That's not the truth about who I am.
      • That's just a theory that I have because spacetime itself is just a theory.
      • Nothing inside spacetime is anything but my headset interpretation of a reality that infinitely transcends anything I can experience.
    4. if a bat is sat there thinking that they understand the nature of reality when it's actually just a map

      for - comparison - bat umwelt vs human umwelt - good comparison - all sensory signals of living beings only ever generate maps of reality, - never 'reality' itself, whatever that may be - We humans can study other species and observe how their senses create their respective maps of reality - but our senses fall on the same continuum

    1. Author response:

      The following is the authors' response to the original reviews

      Reviewer #1 (Public review):

      This is a well-designed and very interesting study examining the impact of imprecise feedback on outcomes in decision-making. I think this is an important addition to the literature, and the results here, which provide a computational account of several decision-making biases, are insightful and interesting.

      We thank the reviewer for highlighting the strengths of this work.

      I do not believe I have substantive concerns related to the actual results presented; my concerns are more related to the framing of some of the work. My main concern is regarding the assertion that the results prove that non-normative and non-Bayesian learning is taking place. I agree with the authors that their results demonstrate that people will make decisions in ways that demonstrate deviations from what would be optimal for maximizing reward in their task under a strict application of Bayes' rule. I also agree that they have built reinforcement learning models that do a good job of accounting for the observed behavior. However, the Bayesian models included are rather simple, per the authors' descriptions: applications of Bayes' rule with either fixed or learned credibility for the feedback agents. In contrast, several versions of the RL models are used, each modified to account for different possible biases. However, more complex Bayes-based models exist, notably active inference, and even the hierarchical Gaussian filter. These formalisms are able to accommodate more complex behavior, such as affect and habits, which might make them more competitive with RL models. I think it is entirely fair to say that these results demonstrate deviations from an idealized and strict Bayesian context; however, the equivalence here of Bayesian and normative is, I think, misleading or at least requires better justification/explanation. This is because a great deal of work has been done to show that Bayes-optimal models can generate behavior or other outcomes that are clearly not optimal to an observer within a given context (consider hallucinations, for example), but which make sense in the context of how the model is constructed as well as the priors and desired states the model is given.

      As such, I would recommend that the language be adjusted to carefully define what is meant by normative and Bayesian and to recognize that work that is clearly Bayesian could potentially still be competitive with RL models if implemented to model this task. An even better approach would be to directly use one of these more complex modelling approaches, such as active inference, as the comparator to the RL models, though I would understand if the authors would want this to be a subject for future work.

      We thank the reviewer for raising this crucial and insightful point regarding the framing of our results and the definitions of 'normative' and 'Bayesian' learning. Our primary aim in this work was to characterize specific behavioral signatures that demonstrate deviations from predictions generated by a strict, idealized Bayesian framework when learning from disinformation (which we term "biases"). We deliberately employed relatively simple Bayesian models as benchmarks to highlight these specific biases. We fully agree that more sophisticated Bayes-based models (as mentioned by the reviewer, or others) could potentially offer alternative mechanistic explanations for participant behavior. However, we currently do not have a strong notion about which Bayesian models can encompass our findings, and hence, we leave this important question for future work.

      To enhance clarity within the current manuscript, we now avoid the use of the term "normative" to refer to our Bayesian models, using the term "ideal" instead. We also define more clearly what exactly we mean by that notion when the ideal model is described:

      "This model is based on the idealized assumption that during the feedback stage of each trial, the value of the chosen bandit is updated (based on feedback valence and credibility) according to Bayes' rule, reflecting perfect adherence to the instructed task structure (i.e., how true outcomes and feedback are generated)."
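The idealized Bayes-rule update described in the quoted passage can be sketched as follows. This is our own illustrative formalization, not the authors' code: we assume a source that reports the true outcome with probability equal to its credibility and lies otherwise.

```python
# Hedged sketch (our notation, not the paper's) of an idealized Bayes-rule
# update of the belief that the true choice outcome was a reward, given
# positive or negative feedback from a source that reports truthfully with
# probability `credibility` and lies otherwise.
def bayes_update(prior_reward, feedback_positive, credibility):
    """Posterior P(true reward | feedback, credibility)."""
    p_fb_if_reward = credibility if feedback_positive else 1 - credibility
    p_fb_if_no_reward = 1 - credibility if feedback_positive else credibility
    num = p_fb_if_reward * prior_reward
    den = num + p_fb_if_no_reward * (1 - prior_reward)
    return num / den

# A fully credible source is decisive; a 50% source is uninformative.
p_high = bayes_update(0.5, True, 1.0)  # -> 1.0
p_none = bayes_update(0.5, True, 0.5)  # -> 0.5
```

The behavioral biases reported in the study are, on this framing, systematic departures from updates of this form.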

      Moreover, we have added a few sentences in the discussion commenting on how more complex Bayesian models might account for our empirical findings:

      "However, as hypothesized, when facing potential disinformation, we also find that individuals exhibit several important biases, i.e., deviations from strictly idealized Bayesian strategies. Future studies should explore if, and under what assumptions about the task's generative structure and/or the learner's priors and objectives, more complex Bayesian models (e.g., active inference (58)) might account for our empirical findings."

      Abstract:

      The abstract is lacking in some detail about the experiments done, but this may be a limitation of the required word count. If word count is not an issue, I would recommend adding details of the experiments done and the results.

      We thank the reviewer for their valuable suggestion. We have now included more details about the experiment in the abstract:

      "In two experiments, participants completed a two-armed bandit task, where they repeatedly chose between two lotteries and received outcome feedback from sources of varying credibility, who occasionally disseminated disinformation by lying about the true choice outcome (e.g., reporting non-reward when a reward was truly earned, or vice versa)."

      One comment is that there is an appeal to normative learning patterns, but this suggests that learning patterns have a fixed optimal nature, which may not be true in cases where the purpose of the learning (e.g. to confirm the feeling of safety of being in an in-group) may not be about learning accurately to maximize reward. This can be accommodated in a Bayesian framework by modelling priors and desired outcomes. As such, the central premise that biased learning is inherently non-normative or non-Bayesian, I think, would require more justification. This is true in the introduction as well.

      Introduction:

      As noted above, the conceptualization of Bayesian learning being equivalent to normative learning, I think requires further justification. Bayesian belief updating can be biased and non-optimal from an observer perspective, while being optimal within the agent doing the updating if the priors/desired outcomes are set up to advantage these "non-optimal" modes of decision making.

We appreciate the reviewer's thoughtful comment regarding the conceptualization of "normative" and "Bayesian" learning. We fully agree that the definition of "normative" is nuanced and can indeed depend on whether one considers reward-maximization or the underlying principles of belief updating. As explained above, we now restrict our presentation to deviations from "ideal Bayes" learning patterns, and we acknowledge the reviewer's concern in a caveat in our discussion.

      Results:

      I wonder why the agent was presented before the choice, since the agent is only relevant to the feedback after the choice is made. I wonder if that might have induced any false association between the agent identity and the choice itself. This is by no means a critical point, but it would be interesting to get the authors' thoughts.

We thank the reviewer for raising this interesting point regarding the presentation of the agent before the choice. Our decision to present the agent at this stage was intentional, as our original experimental design aimed to explore possible effects of "expected source credibility" on participants' choices (e.g., whether knowledge of feedback credibility would affect choice speed and accuracy). However, these analyses yielded no effects of interest to report.

      The finding that positive feedback increases learning is one that has been shown before and depends on valence, as the authors note. They expanded their reinforcement learning model to include valence, but they did not modify the Bayesian model in a similar manner. This lack of a valence or recency effect might also explain the failure of the Bayesian models in the preceding section, where the contrast effect is discussed. It is not unreasonable to imagine that if humans do employ Bayesian reasoning that this reasoning system has had parameters tuned based on the real world, where recency of information does matter; affect has also been shown to be incorporable into Bayesian information processing (see the work by Hesp on affective charge and the large body of work by Ryan Smith). It may be that the Bayesian models chosen here require further complexity to capture the situation, just like some of the biases required updates to the RL models. This complexity, rather than being arbitrary, may be well justified by decision-making in the real world.

We thank the reviewer for these additional important ideas, which speak further to the notion that more complex Bayesian frameworks may account for the biases we report.

      The methods mention several symptom scales- it would be interesting to have the results of these and any interesting correlations noted. It is possible that some of the individual variability here could be related to these symptoms, which could introduce precision parameter changes in a Bayesian context and things like reward sensitivity changes in an RL context.

We included these questionnaires for exploratory purposes, with the aim of generating informed hypotheses for future research into individual differences in learning. Given the preliminary nature of these analyses, we believe further research is required on this important topic.

      Discussion:

(For discussion, not a specific comment on this paper): One wonders also about participants' beliefs about the experiment or the intent of the experimenters. I have often had participants tell me they were trying to "figure out" a task or find patterns even when this was not part of the experiment. This is not specific to this paper, but it may be relevant in the future to try and model participant beliefs about the experiment especially in the context of disinformation, when they might be primed to try and "figure things out".

We thank the reviewer for this important recommendation. We agree, and this point is included in our caveat (cited above) stating that future research should address which assumptions about the generative task structure could allow Bayesian models to account for our empirical patterns.

As a general comment, in the active inference literature, there has been discussion of state-dependent actions, or "habits", which are learned in order to help agents more rapidly make decisions, based on previous learning. It is also possible that what is being observed is that these habits are at play, and that they represent the cognitive biases. This is likely especially true given, as the authors note, the high cognitive load of the task. It is true that this would mean that full-force Bayesian inference is not being used in each trial, or in each experience an agent might have in the world, but this is likely adaptive on the longer timescale of things, considering resource requirements. I think in this case you could argue that we have a departure from "normative" learning, but that is not necessarily a departure from any possible Bayesian framework, since these biases could potentially be modified by the agent or eschewed in favor of more expensive full-on Bayesian learning when warranted.

Indeed, in their discussion on the strategy of amplifying credible news sources to drown out low-credibility sources, the authors hint at the possibility of longer-term strategies that may produce optimal outcomes in some contexts, but which were not necessarily appropriate to this task. As such, the performance on this task - and the consideration of true departure from Bayesian processing - should be considered in this wider context.

      Another thing to consider is that Bayesian inference is occurring, but that priors present going in produce the biases, or these biases arise from another source, for example, factoring in epistemic value over rewards when the actual reward is not large. This again would be covered under an active inference approach, depending on how the priors are tuned. Indeed, given the benefit of social cohesion in an evolutionary perspective, some of these "biases" may be the result of adaptation. For example, it might be better to amplify people's good qualities and minimize their bad qualities in order to make it easier to interact with them; this entails a cost (in this case, not adequately learning from feedback and potentially losing out sometimes), but may fulfill a greater imperative (improved cooperation on things that matter). Given the right priors/desired states, this could still be a Bayes-optimal inference at a social level and, as such, may be ingrained as a habit that requires effort to break at the individual level during a task such as this.

      We thank the reviewer for these insightful suggestions speaking further to the point about more complex Bayesian models.

      The authors note that this task does not relate to "emotional engagement" or "deep, identity-related issues". While I agree that this is likely mostly true, it is also possible that just being told one is being lied to might elicit an emotional response that could bias responses, even if this is a weak response.

We agree with the reviewer that a task involving performance-based bonuses, and particularly one where participants are explicitly told they are being lied to, might elicit a weak emotional response. However, our primary point is that the degree of these responses is expected to be substantially weaker than those typically observed in the broader disinformation literature, which frequently deals with highly salient political, social, or identity-related topics that inherently carry strong emotional and personal ties for participants, leading to much more pronounced affective engagement and potential biases. Our task deliberately avoids such issues, thus minimizing the potential for significant emotion-driven biases. We have toned down the discussion accordingly:

"This occurs even when the decision at hand entails minimal emotional engagement or pertinence to deep, identity-related issues."

      Reviewer #2 (Public review):

      This valuable paper studies the problem of learning from feedback given by sources of varying credibility. The solid combination of experiment and computational modeling helps to pin down properties of learning, although some ambiguity remains in the interpretation of results.

      Summary:

This paper studies the problem of learning from feedback given by sources of varying credibility. Two bandit-style experiments are conducted in which feedback is provided with uncertainty, but from known sources. Bayesian benchmarks are provided to assess normative facets of learning, and alternative credit assignment models are fit for comparison. Some aspects of normativity appear, in addition to deviations such as asymmetric updating from positive and negative outcomes.

      Strengths:

      The paper tackles an important topic, with a relatively clean cognitive perspective. The construction of the experiment enables the use of computational modeling. This helps to pinpoint quantitatively the properties of learning and formally evaluate their impact and importance. The analyses are generally sensible, and parameter recovery analyses help to provide some confidence in the model estimation and comparison.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      (1) The approach in the paper overlaps somewhat with various papers, such as Diaconescu et al. (2014) and Schulz et al. (forthcoming), which also consider the Bayesian problem of learning and applying source credibility, in terms of theory and experiment. The authors should discuss how these papers are complementary, to better provide an integrative picture for readers.

      Diaconescu, A. O., Mathys, C., Weber, L. A., Daunizeau, J., Kasper, L., Lomakina, E. I., ... & Stephan, K. E. (2014). Inferring the intentions of others by hierarchical Bayesian learning. PLoS computational biology, 10(9), e1003810.

      Schulz, L., Schulz, E., Bhui, R., & Dayan, P. Mechanisms of Mistrust: A Bayesian Account of Misinformation Learning. https://doi.org/10.31234/osf.io/8egxh

We thank the reviewer for pointing us to this relevant work. We have updated the introduction, mentioning these precedents in the literature and highlighting our specific contributions:

"To address these questions, we adopt a novel approach within the disinformation literature by exploiting a Reinforcement Learning (RL) experimental framework (36). While RL has guided disinformation research in recent years (37-41), our approach is novel in using one of its most popular tasks: the "bandit task"."

      We also explain in the discussion how these papers relate to the current study:

"Unlike previous studies wherein participants had to infer source credibility from experience (30,37,72), we took an explicit-instruction approach, allowing us to precisely assess source-credibility impact on learning, without confounding it with errors in learning about the sources themselves. More broadly, our work connects with prior research on observational learning, which examined how individuals learn from the actions or advice of social partners (72-75). This body of work has demonstrated that individuals integrate learning from their private experiences with learning based on others' actions or advice, whether by inferring the value others attribute to different options or by mimicking their behavior (57,76). However, our task differs significantly from traditional observational learning. Firstly, our feedback agents interpret outcomes rather than demonstrating or recommending actions (30,37,72)."

      (2) It isn't completely clear what the "cross-fitting" procedure accomplishes. Can this be discussed further?

We thank the reviewer for requesting further clarification on the cross-fitting procedure. Our study utilizes two distinct model families: Bayesian models and CA models. The credit-assignment parameters from the CA models can be treated as "data/behavioural features" corresponding to how choice feedback affects choice-propensities. The cross-fitting approach allows us, in effect, to examine whether these propensity features are predicted by our Bayesian models. To the extent that they are not, we can conclude that empirical behavior is "biased".

Thus, in our cross-fitting procedure, we compare the CA model parameters extracted from participant data (empirical features) with those that would be expected if our Bayesian agents performed the task. Specifically, we first fit participant behavior with our Bayesian models, then simulate these models using the best-fitted parameters and fit those simulations with our CA models. This generates the set of CA parameters that would be predicted if participants' behavior were fully captured by a Bayesian account. By comparing these predicted Bayesian-CA parameters with the actual CA parameters obtained from human participants, the cross-fitting procedure allows us to quantitatively demonstrate that the observed participant parameters exhibit statistically significant deviations from ideal Bayesian processing. This provides a robust validation that the biases we identify are not artifacts of the CA model's structure but true departures from ideal Bayesian learning.
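For readers who find a concrete example helpful, the logic of this procedure can be sketched on a deliberately simplified toy task. This is an illustrative stand-in, not the paper's actual models: the one-parameter credit-assignment learner, the parameter-free Beta-Bernoulli "Bayesian" agent, the single-arm task, and the grid-search fit are all our own simplifications.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_ca_agent(ca, p_reward, n_trials, rng):
    """Toy credit-assignment agent: propensity q for one arm (vs. a fixed
    50% reference arm) is bumped by +/- ca after reward/non-reward."""
    q, choices, rewards = 0.0, [], []
    for _ in range(n_trials):
        p1 = 1.0 / (1.0 + np.exp(-q))          # logistic choice rule
        c = int(rng.random() < p1)
        r = int(rng.random() < (p_reward if c == 1 else 0.5))
        if c == 1:
            q += ca if r == 1 else -ca
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

def simulate_bayes_agent(p_reward, n_trials, rng):
    """Toy 'ideal' agent: Beta-Bernoulli posterior over the arm's reward
    rate, with choices via a logistic read-out of the posterior mean."""
    a, b = 1.0, 1.0
    choices, rewards = [], []
    for _ in range(n_trials):
        p_hat = a / (a + b)
        p1 = 1.0 / (1.0 + np.exp(-8.0 * (p_hat - 0.5)))
        c = int(rng.random() < p1)
        r = int(rng.random() < (p_reward if c == 1 else 0.5))
        if c == 1:
            a, b = a + r, b + (1 - r)           # Bayesian update
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

def fit_ca(choices, rewards, grid=np.linspace(0.0, 2.0, 41)):
    """Maximum-likelihood fit of the one-parameter CA model (grid search)."""
    best_ca, best_ll = 0.0, -np.inf
    for ca in grid:
        q, ll = 0.0, 0.0
        for c, r in zip(choices, rewards):
            p1 = 1.0 / (1.0 + np.exp(-q))
            ll += np.log(p1 if c == 1 else 1.0 - p1)
            if c == 1:
                q += ca if r == 1 else -ca
        if ll > best_ll:
            best_ca, best_ll = ca, ll
    return best_ca

# Step 1: fit the CA model to the "empirical" data (here, a CA agent).
emp_choices, emp_rewards = simulate_ca_agent(1.2, 0.8, 300, rng)
ca_empirical = fit_ca(emp_choices, emp_rewards)

# Step 2: simulate the Bayesian model on the same task and refit the CA
# model to the simulation, yielding a "Bayesian-CA" parameter.
bay_choices, bay_rewards = simulate_bayes_agent(0.8, 300, rng)
ca_bayesian = fit_ca(bay_choices, bay_rewards)

# Step 3: a gap between ca_empirical and ca_bayesian would flag a deviation
# from the Bayesian benchmark (the paper tests this across participants).
```

In the paper's pipeline the Bayesian model is first fitted per participant before being simulated; the toy Bayesian agent above is parameter-free, so that fitting step is omitted here.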

We also note that Reviewer 3 suggested an intuitive way to think about the CA parameters: as analogous to logistic regression coefficients in a "sophisticated regression" of choice on (recency-weighted) choice-feedback. We find this suggestion potentially helpful for readers. Under this interpretation, the purpose of the cross-fitting method can be seen simply as estimating the regression coefficients that would be predicted by our Bayesian agents, and comparing those to the empirical coefficients.

In our manuscript, we now explain this issue more clearly by describing how our model is analogous to a logistic regression:

"The probability to choose a bandit (say A over B) in this family of models is a logistic function of the contrast of choice-propensities between these two bandits. One interpretation of this model is as a "sophisticated" logistic regression, where the CA parameters take the role of "regression coefficients" corresponding to the change in log odds of repeating the just-taken action in future trials based on the feedback (+/- CA for positive or negative feedback, respectively; the model also includes gradual perseveration, which allows for constant log-odds changes that are not affected by choice feedback). The forgetting rate captures the extent to which the effect of each trial on future choices diminishes with time. The Q-values are thus exponentially decaying sums of logistic choice propensities based on the types of feedback a bandit received."

      We also explain our cross-fitting procedure in more detail:

"To further characterise deviations between behaviour and our Bayesian learning models, we used a "cross-fitting" method. Treating CA parameters as data-features of interest (i.e., feedback-dependent changes in choice propensity), our goal was to examine if and how empirical features differ from features extracted from simulations of our Bayesian learning models. Towards that goal, we simulated synthetic data based on Bayesian agents (using participants' best-fitting parameters), but fitted these data using the CA-models, obtaining what we term "Bayesian-CA parameters" (Fig. 2d; Methods). A comparison of these Bayesian-CA parameters with empirical-CA parameters, obtained by fitting CA models to empirical data, allowed us to uncover patterns consistent with, or deviating from, ideal-Bayesian value-based inference. Under the sophisticated logistic-regression interpretation of the CA-model family, the cross-fitting method comprises a comparison between empirical regression coefficients (i.e., empirical CA parameters) and regression coefficients based on simulations of Bayesian models (Bayesian CA parameters)."

      (3) The Credibility-CA model seems to fit the same as the free-credibility Bayesian model in the first experiment and barely better in the second experiment. Why not use a more standard model comparison metric like the Bayesian Information Criterion (BIC)? Even if there are advantages to the bootstrap method (which should be described if so), the BIC would help for comparability between papers.

We thank the reviewer for this important comment regarding our model comparison approach. We acknowledge that classical information criteria like AIC and BIC are widely used in RL studies. However, we argue that our model-comparison method is better suited to our data.

We conducted a model-recovery analysis demonstrating a significant limitation of using AIC or BIC for model comparison on our data: both methods are strongly biased in favor of the Bayesian models. Our PBCM method, on the other hand, is both unbiased and more accurate. We believe this is because "off-the-shelf" methods like AIC and BIC rely on strong assumptions (such as asymptotic sample size and trial-independence) that are not necessarily met in our tasks (data are finite, and trials in RL tasks depend on previous trials). PBCM avoids such assumptions, deriving comparison criteria specifically tailored to the structure and size of our empirical data. We have now mentioned this in the results section of the main text:

"We considered using AIC and BIC, which apply "off-the-shelf" penalties for model complexity. However, these methods do not adapt to features like finite sample size (relying instead on asymptotic assumptions) or temporal dependence (as is common in reinforcement learning experiments). In contrast, the parametric bootstrap cross-fitting method replaces these fixed penalties with empirical, data-driven criteria for model-selection. Indeed, model-recovery simulations confirmed that whereas AIC and BIC were heavily biased in favour of the Bayesian models, the bootstrap method provided excellent model-recovery (see Fig. S20)."
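To illustrate the general parametric-bootstrap logic (not our actual models), consider a toy comparison between a fixed-p and a free-p Bernoulli model, where the selection criterion is derived from simulations under each fitted model rather than from a fixed AIC/BIC penalty. The midpoint cut rule below is one simple, hypothetical choice of criterion.

```python
import numpy as np

rng = np.random.default_rng(42)

def delta_ll(data):
    """Log-likelihood difference between a 1-parameter Bernoulli model
    (free p, fit by maximum likelihood) and a 0-parameter model (p = 0.5).
    The free model nests the fixed one, so the difference is >= 0."""
    n, k = len(data), int(data.sum())
    p = min(max(k / n, 1e-6), 1 - 1e-6)        # clip away log(0)
    ll_free = k * np.log(p) + (n - k) * np.log(1 - p)
    ll_fixed = n * np.log(0.5)
    return ll_free - ll_fixed

n, n_boot = 200, 500

# "Empirical" data: a coin with p = 0.6 (the free model is the truth here).
data = (rng.random(n) < 0.6).astype(int)
p_hat = data.mean()

# Parametric bootstrap: simulate datasets under EACH fitted model and
# record the model-comparison statistic under each generating model.
dll_fixed = np.array([delta_ll((rng.random(n) < 0.5).astype(int))
                      for _ in range(n_boot)])
dll_free = np.array([delta_ll((rng.random(n) < p_hat).astype(int))
                     for _ in range(n_boot)])

# Data-driven selection criterion (here: midpoint of the two bootstrap
# means), replacing a fixed AIC/BIC complexity penalty.
criterion = 0.5 * (dll_fixed.mean() + dll_free.mean())
winner = "free-p model" if delta_ll(data) > criterion else "fixed-p model"
```

The same scheme carries over to richer models: the bootstrap distributions absorb whatever finite-sample and trial-dependence structure the task induces, which is what a fixed penalty cannot do.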

We have also included this model-recovery analysis in the SI document (Fig. S20).

(4) As suggested in the discussion, the updating based on random feedback could be due to the interleaving of trials. If one is used to learning from the source on most trials, the occasional random trial may be hard to resist updating from. The exact interleaving structure should also be clarified (I assume different sources were shown for each bandit pair). This would also relate to work on RL and working memory: Collins, A. G., & Frank, M. J. (2012). How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience, 35(7), 1024-1035.

      We thank the reviewer for this point. The specific interleaved structure of the agents is described in the main text:

"Each agent provided feedback for 5 trials for each bandit pair (with the agent order interleaved within the bandit pair)."

      As well as in the methods section:

"Feedback agents were randomly interleaved across trials subject to the constraint that each agent appeared on 5 trials for each bandit pair."

      We also thank the reviewer for mentioning the relevant work on working memory. We have now added it to our discussion point:

"In our main study, we show that participants revised their beliefs based on entirely non-credible feedback, whereas an ideal Bayesian strategy dictates such feedback should be ignored. This finding resonates with the "continued-influence effect" whereby misleading information continues to influence an individual's beliefs even after it has been retracted (59,60). One possible explanation is that some participants failed to infer that feedback from the 1-star agent was statistically void of information content, essentially random (e.g., the group-level credibility of this agent was estimated by our free-credibility Bayesian model as higher than 50%). Participants were instructed that this feedback would be "a lie" 50% of the time but were not explicitly told that this meant it was random and should therefore be disregarded. Notably, however, there was no corresponding evidence that random feedback affected behaviour in our discovery study. It is possible that an individual's ability to filter out random information might have been limited due to a high cognitive load induced by our main study task, which required participants to track the values of three bandit pairs and juggle between three interleaved feedback agents (whereas in our discovery study each experimental block featured a single bandit pair). Future studies should explore more systematically how the ability to filter random feedback depends on cognitive load (61)."

      (5) Why does the choice-repetition regression include "only trials for which the last same-pair trial featured the 3-star agent and in which the context trial featured a different bandit pair"? This could be stated more plainly.

We thank the reviewer for this question. When we previously submitted our manuscript, we thought that finding enhanced credit-assignment for fully credible feedback following potential disinformation from a different context would constitute a striking demonstration of our "contrast effect". However, upon re-examining this finding, we discovered a coding error (affecting how trials were filtered). We have now rerun and corrected this analysis. We assessed the contrast effect for both "same-context" trials (where the contextual trial featured the same bandit pair as the learning trial) and "different-context" trials (where the contextual trial featured a different bandit pair). Our re-analysis reveals a significant contrast effect selectively in the same-context condition, but no significant effect in the different-context condition. We have updated the main text to reflect these corrected findings and provide a clearer explanation of the analysis:

"A comparison of empirical and Bayesian credit-assignment parameters revealed a further deviation from ideal Bayesian learning: participants showed an exaggerated credit-assignment for the 3-star agent compared with Bayesian models [Wilcoxon signed-rank test, instructed-credibility Bayesian model (median difference=0.74, z=11.14); free-credibility Bayesian model (median difference=0.62, z=10.71), all p's<0.001] (Fig. 3a). One explanation for enhanced learning for the 3-star agents is a contrast effect, whereby credible information looms larger against a backdrop of non-credible information. To test this hypothesis, we examined whether the impact of feedback from the 3-star agent is modulated by the credibility of the agent in the trial immediately preceding it. More specifically, we reasoned that the impact of a 3-star agent would be amplified by a "low-credibility context" (i.e., when it is preceded by a low-credibility trial). In a binomial mixed-effects model, we regressed choice-repetition on feedback valence from the last trial featuring the same bandit pair (i.e., the learning trial) and the feedback agent on the trial immediately preceding that last trial (i.e., the contextual credibility; see Methods for model-specification). This analysis included only learning trials featuring the 3-star agent, and context trials featuring the same bandit pair as the learning trial (Fig. 4a). We found that feedback valence interacted with contextual credibility (F(2,2086)=11.47, p<0.001) such that the feedback-effect (from the 3-star agent) decreased as a function of the preceding context-credibility (3-star context vs. 2-star context: b=-0.29, F(1,2086)=4.06, p=0.044; 2-star context vs. 1-star context: b=-0.41, t(2086)=-2.94, p=0.003; and 3-star context vs. 1-star context: b=-0.69, t(2086)=-4.74, p<0.001) (Fig. 4b). This contrast effect was not predicted by simulations of our main models of interest (Fig. 4c). No effect was found when focussing on contextual trials featuring a bandit pair different from the one in the learning trial (see SI 3.5). Thus, these results support an interpretation that credible feedback exerts a greater impact on participants' learning when it follows non-credible feedback in the same learning context."

      We have modified the discussion accordingly as well:

"A striking finding in our study was that, for a fully credible feedback agent, credit assignment was exaggerated (i.e., higher than predicted by our Bayesian models). Furthermore, the effect of fully credible feedback on choice was further boosted when it was preceded by a low-credibility context related to current learning. We interpret this in terms of a "contrast effect", whereby veridical information looms larger against a backdrop of disinformation (21). One upshot is that exaggerated learning might entail a risk of jumping to premature conclusions based on limited credible evidence (e.g., a strong conclusion that a vaccine produces significant side-effect risks based on limited credible information, following non-credible information about the same vaccine). An intriguing possibility, which could be tested in future studies, is that participants strategically amplify the extent of learning from credible feedback to dilute the impact of learning from non-credible feedback. For example, a person scrolling through a social media feed, encountering copious amounts of disinformation, might amplify the weight they assign to credible feedback in order to dilute effects of 'fake news'. Ironically, these results also suggest that public campaigns might be more effective when embedding their messages in low-credibility contexts, which may boost their impact."

      And we have included some additional analyses in the SI document:

"3.5 Contrast effects for contexts featuring a different bandit

Given that we observed a contrast effect when both the learning trial and the immediately preceding "context trial" involved the same pair of bandits, we next investigated whether this effect persisted when the context trial featured a different bandit pair - a situation where the context would be irrelevant to the current learning. Again, we used a binomial mixed-effects model, regressing choice-repetition on feedback valence in the learning trial and the feedback agent in the context trial. This analysis included only learning trials featuring the 3-star agent, and context trials featuring a different bandit pair than the learning trial (Fig. S22a). We found no significant evidence of an interaction between feedback valence and contextual credibility (F(2,2364)=0.21, p=0.81) (Fig. S22b). This null result was consistent with the range of outcomes predicted by our main computational models (Fig. S22c).

We aimed to formally compare the influence of two types of contextual trials: those featuring the same bandit pair as the learning trial versus those featuring a different pair. To achieve this, we extended our mixed-effects model by incorporating a new predictor variable, "CONTEXT_TYPE", which coded whether the contextual trial involved the same bandit pair (coded as -0.5) or a different bandit pair (+0.5) compared to the learning trial. The Wilkinson notation for this expanded mixed-effects model is:

      ๐‘…๐ธ๐‘ƒ๐ธ๐ด๐‘‡ ~ ๐ถ๐‘‚๐‘๐‘‡๐ธ๐‘‹๐‘‡_๐‘‡๐‘Œ๐‘ƒ๐ธ โˆ— ๐น๐ธ๐ธ๐ท๐ต๐ด๐ถ๐พ โˆ— (๐ถ๐‘‚๐‘๐‘‡๐ธ๐‘‹๐‘‡<sub>2-star</sub> + ๐ถ๐‘‚๐‘๐‘‡๐ธ๐‘‹๐‘‡<sub>3-star</sub>) + ๐ต๐ธ๐‘‡๐‘‡๐ธ๐‘… + (1|๐‘๐‘Ž๐‘Ÿ๐‘ก๐‘–๐‘๐‘–๐‘๐‘Ž๐‘›๐‘ก)

This expanded model revealed a significant three-way interaction between feedback valence, contextual credibility, and context type (F(2,4451)=7.71, p<0.001). Interpreting this interaction, we found a two-way interaction between context-source and feedback valence when the context was the same (F(2,4451)=12.03, p<0.001), but not when the context was different (F(2,4451)=0.23, p=0.79). Further interpreting the two-way feedback-valence * context-source interaction (for the same context), we obtained the same conclusions as reported in the main text."

      (6) Why apply the "Truth-CA" model and not the Bayesian variant that it was motivated by?

      Thanks for this very useful suggestion. We are unsure if we fully understand the question. The Truth-CA model was not motivated by a new Bayesian model. Our Bayesian models were simply used to make the point that participants may partially discriminate between truthful and untruthful feedback (for a given source). This led to the idea that perhaps more credit is assigned for truth (than lie) trials, which is what we found using our Truth-CA model. Note we show that our Bayesian models cannot account for this modulation.

We have now improved our "Truth-CA" model. Previously, our Truth-CA model considered whether feedback on each trial was true or not based on the realized latent true outcomes. However, it is possible that the very same feedback would have had the opposite truth-status if the latent true outcome had been different (recall that true outcomes are stochastic). This injects noise into the trial classification in our previous model. To avoid this, in our new model, credit assignment is modulated by the probability that the reported feedback is true (marginalized over the stochasticity of the true outcome).

      We have described this new model in the methods section:

"Additionally, we formulated a "Truth-CA" model, which works as our Credibility-CA model but incorporates a free truth-bonus parameter (TB). This parameter modulates the extent of credit assignment for each agent based on the posterior probability of the feedback being true (given the credibility of the feedback agent and the true reward probability of the chosen bandit). The chosen bandit was updated as follows:

      ๐‘„ โ† (1 โ€“ ๐‘“<sub>Q</sub>) โˆ— ๐‘„ + [๐ถ๐ด(๐‘Ž๐‘”๐‘’๐‘›๐‘ก) + ๐‘‡๐ต โˆ— (๐‘ƒ(๐‘ก๐‘Ÿ๐‘ข๐‘กโ„Ž) โˆ’ 0.5)] โˆ— ๐น

where P(truth) is the posterior probability of the feedback being true in the current trial (for the exact calculation of P(truth), see "Methods: Bayesian estimation of posterior belief that feedback is true")."
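As an illustration of this update rule, the following sketch implements it directly. The P(truth) helper shows one standard Bayes computation for the case of a reported reward; the function names are ours, and the paper's exact marginalization is given in its Methods.

```python
def p_truth_given_reported_reward(credibility, p_reward):
    """Posterior probability that a reported reward is truthful, via Bayes:
    a 'reward' report arises either from a truthful report (prob =
    credibility) on a rewarded trial, or from a lie on an unrewarded trial.
    (An illustrative, standard computation - not the paper's exact formula.)"""
    num = credibility * p_reward
    return num / (num + (1.0 - credibility) * (1.0 - p_reward))

def truth_ca_update(q, feedback, ca_agent, truth_bonus, p_truth, forget):
    """One Truth-CA update of the chosen bandit's propensity Q.
    feedback: +1 for reported reward, -1 for reported non-reward.
    Implements: Q <- (1 - f_Q) * Q + [CA(agent) + TB * (P(truth) - 0.5)] * F."""
    weight = ca_agent + truth_bonus * (p_truth - 0.5)
    return (1.0 - forget) * q + weight * feedback
```

For a fully credible source the helper returns 1, so the truth bonus adds its full half-weight; for a 50%-credibility source it returns the bandit's reward probability itself, so the bonus vanishes only when that probability is 0.5.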

      All relevant results have been updated accordingly in the main text:

      "To formally address whether feedback truthfulness modulates credit assignment, we fitted a new variant of the CA model (the "Truth-CA" model) to the data. This variant works like our Credibility-CA model but incorporates a truth-bonus parameter (TB), which increases the degree of credit assignment for feedback as a function of the experimenter-determined likelihood that the feedback is true (which is read from the curves in Fig. 6a when x is taken to be the true probability the bandit is rewarding). Specifically, after receiving feedback, the Q-value of the chosen option is updated according to the following rule: Q ← (1 − f<sub>Q</sub>) ∗ Q + [CA(agent) + TB ∗ (P(truth) − 0.5)] ∗ F, where TB is the free parameter representing the truth bonus, and P(truth) is the probability of the received feedback being true (from the experimenter's perspective). We acknowledge that this model falls short of providing a mechanistically plausible description of the credit assignment process, because participants have no access to the experimenter's truthfulness likelihoods (as the true bandit reward probabilities are unknown to them). Nonetheless, we use this 'oracle model' as a measurement tool to glean rough estimates of the extent to which credit assignment is boosted as a function of its truthfulness likelihood. Fitting this Truth-CA model to participants' behaviour revealed a significant positive truth bonus (mean=0.21, t(203)=3.12, p=0.002), suggesting that participants indeed assign greater weight to feedback that is likely to be true (Fig. 6c; see SI 3.3.1 for detailed ML parameter results). Notably, simulations using our other models (Methods) consistently predicted smaller truth biases (compared to the empirical bias) (Fig. 6d). Moreover, a truth bias was still detected even in a more flexible model that allowed for both a positivity bias and a truth bias (see SI 3.7).
The upshot is that participants are biased to assign higher credit based on feedback that is more likely to be true, in a manner that is inconsistent with our Bayesian models and above and beyond the previously identified positivity biases."
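As a concreteness check, the quoted Truth-CA update rule can be sketched in a few lines. This is a minimal sketch, not the authors' code: the parameter values `CA`, `f_Q`, and `F` below are hypothetical, with `TB = 0.21` borrowed from the reported group mean.

```python
def truth_ca_update(Q, F, P_truth, CA=0.5, TB=0.21, f_Q=0.1):
    """Return the updated Q-value of the chosen bandit under the quoted rule:
    Q <- (1 - f_Q) * Q + [CA(agent) + TB * (P(truth) - 0.5)] * F.
    All parameter values here are illustrative assumptions."""
    return (1 - f_Q) * Q + (CA + TB * (P_truth - 0.5)) * F

# Feedback that is certainly true (P_truth = 1) gets a boost of TB/2 over the
# baseline CA weight; chance-level feedback (P_truth = 0.5) gets no boost.
q_certain = truth_ca_update(Q=0.0, F=1.0, P_truth=1.0)
q_chance = truth_ca_update(Q=0.0, F=1.0, P_truth=0.5)
```

Note that because the modulation term is centred at 0.5, a positive TB boosts credit for likely-true feedback and correspondingly dampens credit for likely-false feedback.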

      Finally, the Supplementary Information for the discovery study has also been revised to feature this analysis:

      "We next assessed whether participants infer whether the feedback they received on each trial was true or false and adjust their credit assignment based on this inference. We again used the "Truth-CA" model to obtain estimates for the truth bonus (TB), the increase in credit assignment as a function of the posterior probability of the feedback being true. As in our main study, the fitted truth-bias parameter was significantly positive, indicating that participants assign greater weight to feedback they believe is likely to be true (Fig. S4a; see SI 3.3.1 for detailed ML parameter results). Strikingly, model simulations (Methods) predicted a lower truth bonus than the one observed in participants (Fig. S4b)."

      (7) "Overall, the results from this study support the exact same conclusions (See SI section 1.2) but with one difference. In the discovery study, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3)" - this seems like a very salient difference, when the paper reports the feedback effect as a primary finding of interest, though I understand there remains a valence-based difference.

      We agree with the reviewer and thank them for this suggestion. We now state explicitly throughout the manuscript that this finding was obtained in only one of our two studies. In the "Discovery study" section of the Results, we state explicitly that this finding was not replicated:

      "However, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3)."

      We also note that, related to another concern from R3 (that perseveration may masquerade as a positivity bias), we conducted additional analyses (detailed in SI 3.6.2). These analyses revealed that the observed positivity bias for the 1-star agent in the discovery study falls within the range predicted by simple choice perseveration. Consequently, we have removed the suggestion that participants still learn from the random agent in the discovery study. Furthermore, we have modified the Discussion section to include a possible explanation for this discrepancy between the two studies:

      "Notably, however, there was no corresponding evidence that random feedback affected behaviour in our discovery study. It is possible that an individual's ability to filter out random information might have been limited by the high cognitive load induced by our main study task, which required participants to track the values of three bandit pairs and juggle three interleaved feedback agents (whereas in our discovery study each experimental block featured a single bandit pair). Future studies should explore more systematically how the ability to filter random feedback depends on cognitive load (61)."

      (8) "Participants were instructed that this feedback would be "a lie 50% of the time but were not explicitly told that this meant it was random and should therefore be disregarded." - I agree that this is a possible explanation for updating from the random source. It is a meaningful caveat.

      Thank you for this thought. While this can be seen as a caveat, since we do not know what would have happened with explicit instructions, we also believe it is interesting from another perspective. In many real-life situations, individuals may have all the information necessary to infer that the feedback they receive is uninformative, yet still fail to do so, especially when they are not explicitly told to ignore it.

      In future work, we plan to examine how behaviour changes when participants are given more explicit instructions, for example, that the 50%-credibility agent provides purely random feedback.

      (9) "Future studies should investigate conditions that enhance an ability to discard disinformation, such as providing explicit instructions to ignore misleading feedback, manipulations that increase the time available for evaluating information, or interventions that strengthen source memory." - there is work on some of this in the misinformation literature that should be cited, such as the "continued influence effect". For example: Johnson, H. M., & Seifert, C. M. (1994). Sources of the continued influence effect: When misinformation in memory affects later inferences. Journal of experimental psychology: Learning, memory, and cognition, 20(6), 1420.

      We thank the reviewer for pointing us towards the relevant literature. We have now included citations on the "continued influence effect" of misinformation in the discussion:

      "In our main study, we show that participants revised their beliefs based on entirely non-credible feedback, whereas an ideal Bayesian strategy dictates that such feedback should be ignored. This finding resonates with the "continued-influence effect", whereby misleading information continues to influence an individual's beliefs even after it has been retracted (59,60)."

      (10) Are the authors arguing that choice-confirmation bias may be at play? Work on choice-confirmation bias generally includes counterfactual feedback, which is not present here.

      We agree with the reviewer that a definitive test for choice-confirmation bias typically requires counterfactual feedback, which is not present in our current task. In our discussion, we indeed suggest that the positivity bias we observe may stem from a form of choice-confirmation, drawing on the extensive literature on this bias in reinforcement learning (Lefebvre et al., 2017; Palminteri et al., 2017; Palminteri & Lebreton, 2022). However, we fully acknowledge that this link is a hypothesis and that explicitly testing for choice-confirmation bias would necessitate a future study specifically incorporating counterfactual feedback. We have included a clarification of this point in the discussion:

      "Previous reinforcement learning studies report greater credit assignment based on positive compared to negative feedback, albeit only in the context of veridical feedback (43,44,62). Here, supporting our a priori hypothesis, we show that this positivity bias is amplified for information of low and intermediate credibility (in absolute terms in the discovery study, and relative to the overall extent of CA in both studies). Of note, previous literature has interpreted enhanced learning from positive outcomes in reinforcement learning as indicative of a confirmation bias (42,44). For example, positive feedback may confirm, to a greater extent than negative feedback, one's choice as superior (e.g., "I chose the better of the two options"). Leveraging the framework of motivated cognition (35), we posited that feedback of uncertain veracity (e.g., low credibility) amplifies this bias by incentivising individuals to self-servingly accept positive feedback as true (because it confers positive, desirable outcomes), and to explain away undesirable, choice-disconfirming negative feedback as false. This could imply an amplified confirmation bias on social media, where content from sources of uncertain credibility, such as unknown or unverified users, is more easily interpreted in a self-serving manner, disproportionately reinforcing existing beliefs (63). In turn, this could contribute to an exacerbation of the negative social outcomes previously linked to confirmation bias, such as polarization (64,65), the formation of 'echo chambers' (19), and the persistence of misbeliefs regarding contemporary issues of importance such as vaccination (66,67) and climate change (68–71). We note, however, that further studies are required to determine whether the positivity bias in our task is indeed a form of confirmation bias."

      Reviewer #3 (Public review):

      Summary

      This paper investigates how disinformation affects reward learning processes in the context of a two-armed bandit task, where feedback is provided by agents with varying reliability (with lying probability explicitly instructed). They find that people learn more from credible sources, but also deviate systematically from optimal Bayesian learning: They learned from uninformative random feedback, learned more from positive feedback, and updated too quickly from fully credible feedback (especially following low-credibility feedback). Overall, this study highlights how misinformation could distort basic reward learning processes, without appeal to higher-order social constructs like identity.

      Strengths

      (1) The experimental design is simple and well-controlled; in particular, it isolates basic learning processes by abstracting away from social context.

      (2) Modeling and statistics meet or exceed the standards of rigor.

      (3) Limitations are acknowledged where appropriate, especially those regarding external validity.

      (4) The comparison model, Bayes with biased credibility estimates, is strong; deviations are much more compelling than e.g., a purely optimal model.

      (5) The conclusions are interesting, in particular the finding that positivity bias is stronger when learning from less reliable feedback (although I am somewhat uncertain about the validity of this conclusion)

      We deeply thank the reviewer for highlighting the strengths of this work.

      Weaknesses

      (1) Absolute or relative positivity bias?

      In my view, the biggest weakness in the paper is that the conclusion of greater positivity bias for lower credible feedback (Figure 5) hinges on the specific way in which positivity bias is defined. Specifically, we only see the effect when normalizing the difference in sensitivity to positive vs. negative feedback by the sum. I appreciate that the authors present both and add the caveat whenever they mention the conclusion (with the crucial exception of the abstract). However, what we really need here is an argument that the relative definition is the right way to define asymmetry....

      Unfortunately, my intuition is that the absolute difference is a better measure. I understand that the relative version is common in the RL literature; however previous studies have used standard TD models, whereas the current model updates based on the raw reward. The role of the CA parameter is thus importantly different from a traditional learning rate - in particular, it's more like a logistic regression coefficient (as described below) because it scales the feedback but not the decay. Under this interpretation, a difference in positivity bias across credibility conditions corresponds to a three-way interaction between the exponentially weighted sum of previous feedback of a given type (e.g., positive from the 75% credible agent), feedback positivity, and condition (dummy coded). This interaction corresponds to the nonnormalized, absolute difference.

      Importantly, I'm not terribly confident in this argument, but it does suggest that we need a compelling argument for the relative definition.

      We thank the reviewer for raising this important point about the definition of positivity bias, and for their thoughtful discussion of absolute versus relative measures. We believe that the relative valence bias offers a distinct and valuable perspective on positivity bias. Conceptually, this measure describes positivity bias in a manner akin to a "percentage difference" relative to the overall level of learning, which allows us to control for the decrease in the overall amount of credit assignment as feedback becomes less credible. We are unsure whether one measure is better or more correct than the other, and we believe that reporting both measures enriches the understanding of positivity bias and allows for a more comprehensive characterization of this phenomenon (as long as these measures are interpreted carefully). We have stated the significance of the relative measure in the results section:

      "Following previous research, we quantified positivity bias in two ways: 1) as the absolute difference between credit assignment based on positive versus negative feedback, and 2) as the same difference relative to the overall extent of learning. We note that the second, relative definition is more akin to a "percentage change" measurement, providing a control for the overall lower levels of credit assignment for less credible agents."

      We also wish to point out that in our discovery study we had some evidence for an amplification of positivity bias in an absolute sense.
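To make the distinction between the two measures concrete, here is a minimal sketch; the function name and the example CA values are hypothetical, and `aVBI`/`rVBI` follow the absolute and relative valence-bias indices referred to in the SI.

```python
def valence_bias(ca_pos, ca_neg):
    """Absolute (aVBI) and relative (rVBI) valence-bias indices for one agent.

    ca_pos / ca_neg are fitted credit-assignment weights for positive and
    negative feedback (the values used below are purely illustrative)."""
    avbi = ca_pos - ca_neg              # absolute difference
    rvbi = avbi / (ca_pos + ca_neg)     # difference relative to overall learning
    return avbi, rvbi

# Same absolute bias, but the relative bias is far larger for the
# low-credibility agent, whose overall level of learning is lower.
high_cred = valence_bias(1.0, 0.8)   # e.g., a fully credible agent
low_cred = valence_bias(0.3, 0.1)    # e.g., a low-credibility agent
```

This illustrates the "percentage difference" point in the quoted text: when overall credit assignment shrinks for less credible agents, an unchanged absolute gap corresponds to a much larger relative gap.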

      (2) Positivity bias or perseveration?

      A key challenge in interpreting many of the results is dissociating perseveration from other learning biases. In particular, a positivity bias (Figure 5) and perseveration will both predict a stronger correlation between positive feedback and future choice. Crucially, the authors do include a perseveration term, so one would hope that perseveration effects have been controlled for and that the CA parameters reflect true positivity biases. However, with finite data, we cannot be sure that the variance will be correctly allocated to each parameter (c.f. collinearity in regressions). The fact that CA- is fit to be negative for many participants (a pattern shown more strongly in the discovery study) is suggestive that this might be happening. A priori, the idea that you would ever increase your value estimate after negative feedback is highly implausible, which suggests that the parameter might be capturing variance besides that it is intended to capture.

      The best way to resolve this uncertainty would involve running a new study in which feedback was sometimes provided in the absence of a choice - this would isolate positivity bias. Short of that, perhaps one could fit a version of the Bayesian model that also includes perseveration. If the authors can show that this model cannot capture the pattern in Figure 5, that would be fairly convincing.

      We thank the reviewer for this very insightful and crucial point regarding the potential confound between positivity bias and perseveration. We entirely agree that distinguishing these effects can be challenging. To rigorously address this concern and ascertain that our observed positivity bias, particularly its inflation for low-credibility feedback, is not merely an artifact of perseveration, we conducted additional analyses as suggested.

      First, following the reviewer's suggestion, we simulated our Bayesian models, including a perseveration term, for both our main and discovery studies. Crucially, none of these simulations predicted the specific pattern of inflated positivity bias for low-credibility feedback that we identified in participants.

      Additionally, taking a "devil's advocate" approach, we tested whether our Credibility-CA model (which includes perseveration but not a feedback-valence bias) could predict our positivity bias findings. Thus, we simulated 100 datasets using our Credibility-CA model (based on empirical best-fitting parameters). We then fitted each of these simulated datasets using our Credibility-Valence CA model. By examining the distribution of results across these synthetic-dataset fits and comparing them to the actual results from participants, we found that while perseveration could indeed produce (as the reviewer suspected) an artifactual positivity bias, it could not predict the magnitude of the observed inflation of positivity bias for low-credibility feedback (whether measured in absolute or relative terms).

      Based on these comprehensive analyses, we are confident that our main results concerning the modulation of a valence bias as a function of source credibility cannot be accounted for by simple choice perseveration. We have briefly explained these analyses in the main results section:

      "Previous research has suggested that positivity bias may spuriously arise from pure choice perseveration (i.e., a tendency to repeat previous choices regardless of outcome) (49,50). While our models included a perseveration component, this control may not be perfect. Therefore, in additional control analyses, we generated synthetic datasets using models that include choice perseveration but are devoid of feedback-valence bias, and fitted them with our credibility-valence model (see SI 3.6.1). These analyses confirmed that perseveration can masquerade as an apparent positivity bias. Critically, however, they also confirmed that perseveration cannot account for our main finding of increased positivity bias, relative to the overall extent of CA, for low-credibility feedback."
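The simulate-then-refit ("devil's advocate") procedure described above follows a standard model-recovery loop. The sketch below uses placeholder generative and fitting functions; both are stand-ins we invented for illustration, not the paper's actual Credibility-CA or Credibility-Valence CA models.

```python
import random

def simulate_credibility_ca(seed, n_trials=100):
    """Stand-in for the bias-free generative model (perseveration, no valence
    bias). Returns synthetic (choice, positive_feedback) pairs."""
    rng = random.Random(seed)
    return [(rng.random() < 0.5, rng.random() < 0.7) for _ in range(n_trials)]

def fit_credibility_valence_ca(data):
    """Stand-in for maximum-likelihood fitting of a valence-split model; here
    it just returns crude CA+/CA- proxies from the feedback rate."""
    pos_rate = sum(fb for _, fb in data) / len(data)
    return {"ca_pos": pos_rate, "ca_neg": 1.0 - pos_rate}

# Generate many synthetic datasets from the bias-free model, refit each with
# the valence model, and collect the apparent positivity bias that
# perseveration alone could produce; the empirical bias is then compared
# against this null distribution.
apparent_bias = []
for seed in range(100):
    fit = fit_credibility_valence_ca(simulate_credibility_ca(seed))
    apparent_bias.append(fit["ca_pos"] - fit["ca_neg"])
```

The key design choice, mirrored in the quoted analyses, is that the generative model deliberately lacks the effect being tested, so any bias recovered by the fitting model bounds what the confound alone can explain.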

      Additionally, we have added a detailed description of these additional analyses and their findings to the Supplementary Information document:

      "3.6 Positivity bias results cannot be explained by pure perseveration

      3.6.1 Main study

      Previous research has suggested that it may be challenging to dissociate a feedback-valence positivity bias from perseveration (i.e., a tendency to repeat previous choices regardless of outcome). While our credit assignment (CA) models already include a perseveration mechanism to account for this, the control may not be perfect. We thus conducted several tests to examine whether our positivity-bias-related results could be accounted for by perseveration.

      First, we examined whether our Bayesian models, augmented by a perseveration mechanism (as in our CA model), can generate predictions similar to our empirical results. We applied our cross-fitting procedure to these extended Bayesian models. To briefly recap, this involved fitting participant behavior with them, generating synthetic datasets based on the resulting maximum likelihood (ML) parameters, and then fitting these simulated datasets with our Credibility-Valence CA model (which is designed to detect positivity bias). This test revealed that adding perseveration to our Bayesian models did not predict a positivity bias in learning. In absolute terms there was a small negativity bias (instructed-credibility Bayesian: b=−0.19, F(1,1218)=17.78, p<0.001, Fig. S23a-b; free-credibility Bayesian: b=−0.17, F(1,1218)=13.74, p<0.001, Fig. S23d-e). In relative terms we detected no valence-related bias (instructed-credibility Bayesian: b=−0.034, F(1,609)=0.45, p=0.50, Fig. S23c; free-credibility Bayesian: b=−0.04, F(1,609)=0.51, p=0.47, Fig. S23f). More critically, these simulations also did not predict a change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (instructed-credibility Bayesian: F(2,1218)=0.024, p=0.98, Fig. S23b; free-credibility Bayesian: F(2,1218)=0.008, p=0.99, Fig. S23e), nor at a relative level (instructed-credibility Bayesian: F(2,609)=1.57, p=0.21, Fig. S23c; free-credibility Bayesian: F(2,609)=0.13, p=0.88, Fig. S23f). The upshot is that our positivity-bias findings cannot be accounted for by our Bayesian models even when these are augmented with perseveration.

      However, it is still possible that the empirical CA parameters from our credibility-valence model (reported in main text Fig. 5) were distorted, absorbing variance from perseveration. To address this, we took a "devil's advocate" approach, testing the assumption that CA parameters are not truly affected by feedback valence and that there is only perseveration in our data. Towards that goal, we simulated data using our Credibility-CA model (which includes perseveration but does not contain a valence bias in its learning mechanism) and then fitted these synthetic datasets using our Credibility-Valence CA model to see if the observed positivity bias could be explained by perseveration alone. Specifically, we generated 101 "group-level" synthetic datasets (each including one simulation per participant, based on their empirical ML parameters), and fitted each dataset with our Credibility-Valence CA model. We then analysed the resulting ML parameters in each dataset using the same mixed-effects models as described in the main text, examining the distribution of effects of interest across these simulated datasets. Comparing these simulation results to the data from participants revealed a nuanced picture. While the positivity bias observed in participants is within the range predicted by a pure perseveration account when measured in absolute terms (Fig. S24a), it is much higher than predicted by pure perseveration when measured relative to the overall level of learning (Fig. S24c). More importantly, the inflation in positivity bias for lower-credibility feedback is substantially higher in participants than would be predicted by a pure perseveration account, a finding that holds true for both absolute (Fig. S24b) and relative (Fig. S24d) measures."

      "3.6.2 Discovery study

      We then replicated these analyses in our discovery study to confirm our findings. We again checked whether extended versions of the Bayesian models (including perseveration) predicted the positivity bias results observed. Our cross-fitting procedure showed that the instructed-credibility Bayesian model with perseveration did predict a positivity bias for all credibility levels in this discovery study, both when measured in absolute terms [50% credibility (b=1.74, t(824)=6.15), 70% credibility (b=2.00, F(1,824)=49.98), 85% credibility (b=1.81, F(1,824)=40.78), 100% credibility (b=2.42, F(1,824)=72.50), all p's<0.001], and in relative terms [50% credibility (b=0.25, t(412)=3.44), 70% credibility (b=0.31, F(1,412)=17.72), 85% credibility (b=0.34, F(1,412)=21.06), 100% credibility (b=0.42, F(1,412)=31.24), all p's<0.001]. However, importantly, these simulations did not predict a change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (F(3,412)=1.43, p=0.24), nor at a relative level (F(3,412)=2.06, p=0.13) (Fig. S25a-c). In contrast, simulations of the free-credibility Bayesian model (with perseveration) predicted a slight negativity bias when measured in absolute terms (b=−0.35, F(1,824)=5.14, p=0.024), and no valence bias when measured relative to the overall degree of learning (b=0.05, F(1,412)=0.55, p=0.46). Crucially, this model also did not predict a change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (F(3,824)=0.27, p=0.77), nor at a relative level (F(3,412)=0.76, p=0.47) (Fig. S25d-f).

      As in our main study, we next assessed whether our Credibility-CA model (which includes perseveration but no valence bias) predicted the positivity bias results observed in participants in the discovery study. This analysis revealed that the average positivity bias in participants is higher than predicted by a pure perseveration account, both when measured in absolute terms (Fig. S26a) and in relative terms (Fig. S26c). Specifically, only the aVBI for the 70% credibility agent was above what a perseveration account would predict, while the rVBI for all agents except the completely credible one exceeded that threshold. Furthermore, the inflation in positivity bias for lower credibility feedback (compared to the 100% credibility agent) is significantly higher in participants than would be predicted by a pure perseveration account, in both absolute (Fig. S26b) and relative (Fig. S26d) terms.

      Together, these results show that the general positivity bias observed in participants could be predicted by an instructed-credibility Bayesian model with perseveration, or by a CA model with perseveration. Moreover, we find that these two models can predict a positivity bias for the 50%-credibility agent, raising a concern that our positivity bias findings for this source may be an artefact of incompletely controlled perseveration. However, the credibility modulation of this positivity bias, whereby the bias is amplified for lower-credibility feedback, is consistently not predicted by perseveration alone, regardless of whether perseveration is incorporated into a Bayesian or a CA model. This finding suggests that participants genuinely modulate their learning based on feedback credibility, and that this modulation is not merely an artifact of choice perseveration."

      (3) Veracity detection or positivity bias?

      The "True feedback elicits greater learning" effect (Figure 6) may be simply a re-description of the positivity bias shown in Figure 5. This figure shows that people have higher CA for trials where the feedback was in fact accurate. But assuming that people tend to choose more rewarding options, true-feedback cases will tend to also be positive-feedback cases. Accordingly, a positivity bias would yield this effect, even if people are not at all sensitive to trial-level feedback veracity. Of course, the reverse logic also applies, such that the "positivity bias" could actually reflect discounting of feedback that is less likely to be true. This idea has been proposed before as an explanation for confirmation bias (see Pilgrim et al., 2024, https://doi.org/10.1016/j.cognition.2023.105693, and much previous work cited therein). The authors should discuss the ambiguity between the "positivity bias" and "true feedback" effects within the context of this literature....

      Before addressing these excellent comments, we first note that we have now improved our "Truth-CA" model. Previously, our Truth-CA model considered whether the feedback on each trial was true or not based on realized latent true outcomes. However, the very same feedback would have had the opposite truth status if the latent true outcome had been different (recall that true outcomes are stochastic). This injected noise into the trial classification in our former model. To avoid this, in our new model feedback is modulated by the probability that the reported feedback is true (marginalized over the stochasticity of the true outcome). Please note, in our responses below, that we conducted extensive analyses to confirm that positivity bias does not in fact predict the truth bias we detect using our Truth-CA model.

      We have described this new model in the methods section:

      "Additionally, we formulated a "Truth-CA" model, which worked like our Credibility-CA model but incorporated a free truth-bonus parameter (TB). This parameter modulates the extent of credit assignment for each agent based on the posterior probability of the feedback being true (given the credibility of the feedback agent and the true reward probability of the chosen bandit). The chosen bandit's value was updated as follows:

      Q ← (1 − f<sub>Q</sub>) ∗ Q + [CA(agent) + TB ∗ (P(truth) − 0.5)] ∗ F

      where P(truth) is the posterior probability of the feedback being true in the current trial (for the exact calculation of P(truth) see "Methods: Bayesian estimation of posterior belief that feedback is true")."
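As a hedged illustration of how such a posterior could be computed (this is our own assumption; the paper's exact formula is in the cited Methods section): if an agent reports truthfully with probability c and the chosen bandit rewards with probability p, Bayes' rule gives the probability that the reported outcome matches the latent one.

```python
def p_truth(credibility, p_reward, feedback_positive):
    """Posterior probability that the received feedback is true, assuming the
    agent lies with probability (1 - credibility) and the bandit rewards with
    probability p_reward. A plausible sketch, not necessarily the paper's
    exact Methods formula."""
    if feedback_positive:
        true_part = credibility * p_reward          # truthful report of a win
        false_part = (1 - credibility) * (1 - p_reward)  # lie about a loss
    else:
        true_part = credibility * (1 - p_reward)    # truthful report of a loss
        false_part = (1 - credibility) * p_reward   # lie about a win
    return true_part / (true_part + false_part)
```

Under this sketch, a fully credible agent's feedback is true with certainty, while a 50%-credibility agent's positive feedback is true exactly as often as the bandit rewards, which is why only the experimenter (who knows the true reward probabilities) can evaluate P(truth), as the quoted passage notes.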

      All relevant results have been updated accordingly in the main text:

      To formally address whether feedback truthfulness modulates credit assignment, we fitted a new variant of the CA model (the "Truth-CA" model) to the data. This variant works like our Credibility-CA model but incorporates a truth-bonus parameter (TB) which increases the degree of credit assignment for feedback as a function of the experimenter-determined likelihood that the feedback is true (which is read from the curves in Fig. 6a when x is taken to be the true probability the bandit is rewarding). Specifically, after receiving feedback, the Q-value of the chosen option is updated according to the following rule:

      Q ← (1 − f<sub>Q</sub>) ∗ Q + [CA(agent) + TB ∗ (P(truth) − 0.5)] ∗ F

      where ๐‘‡๐ต is the free parameter representing the truth bonus, and ๐‘ƒ(๐‘ก๐‘Ÿ๐‘ข๐‘กโ„Ž) is the probability the received feedback being true (from the experimenterโ€™s perspective). We acknowledge that this model falls short of providing a mechanistically plausible description of the credit assignment process, because participants have no access to the experimenterโ€™s truthfulness likelihoods (as the true bandit reward probabilities are unknown to them). Nonetheless, we use this โ€˜oracle modelโ€™ as a measurement tool to glean rough estimates for the extent to which credit assignment Is boosted as a function of its truthfulness likelihood.

      Fitting this Truth-CA model to participants' behaviour revealed a significant positive truth bonus (mean=0.21, t(203)=3.12, p=0.002), suggesting that participants indeed assign greater weight to feedback that is likely to be true (Fig. 6c; see SI 3.3.1 for detailed ML parameter results). Notably, simulations using our other models (Methods) consistently predicted smaller truth biases (compared to the empirical bias) (Fig. 6d). Moreover, a truth bias was still detected even in a more flexible model that allowed for both a positivity bias and a truth bias (see SI 3.7). The upshot is that participants are biased to assign higher credit based on feedback that is more likely to be true, in a manner that is inconsistent with our Bayesian models and above and beyond the previously identified positivity biases."

      Finally, the Supplementary Information for the discovery study has also been revised to feature this analysis:

      "We next assessed whether participants infer whether the feedback they received on each trial was true or false and adjust their credit assignment based on this inference. We again used the "Truth-CA" model to obtain estimates for the truth bonus (TB), the increase in credit assignment as a function of the posterior probability of the feedback being true. As in our main study, the fitted truth-bias parameter was significantly positive, indicating that participants assign greater weight to feedback they believe is likely to be true (Fig. S4a; see SI 3.3.1 for detailed ML parameter results). Strikingly, model simulations (Methods) predicted a lower truth bonus than the one observed in participants (Fig. S4b)."

      Additionally, we thank the reviewer for pointing us to the relevant work by Pilgrim et al. (2024). We agree that the relationship between the "true feedback" and "positivity bias" effects is nuanced, and their potential overlap warrants careful consideration. However, our analyses suggest that the truth-bonus effect is not solely a re-description of positivity bias. Firstly, simulations of our Credibility-Valence CA model predict only a small "truth bonus" effect, notably smaller than what we observed in participants. Secondly, we formulated an extension of our "Truth-CA" model that includes a valence bias in credit assignment. If our truth-bonus results were merely an artifact of positivity bias, this extended model should absorb that variance, producing a null truth-bonus parameter. However, fitting this model to participant data still revealed a significant positive truth bonus, which again exceeds the range predicted by simulations of our Credibility-CA model:

      "3.7 Truth inference is still detected when controlling for valence bias

      Given that participants frequently select bandits that are, on average, mostly rewarding, it is reasonable to assume that positive feedback is more likely to be objectively true than negative feedback. This raises the question of whether the "truth inference" effect we observed in participants might simply be an alternative description of a positivity bias in learning. To directly test this idea, we extended our Truth-CA model to explicitly account for a valence bias in credit assignment. This extended model features separate CA parameters for positive and negative feedback for each agent. When we fitted this new model to participant behavior, it still revealed a significant truth bonus in both the main study (Wilcoxon signed-rank test: median = 0.09, z(202)=2.12, p=0.034; Fig. S27a) and the discovery study (median = 3.52, z(102)=7.86, p<0.001; Fig. S27c). Moreover, in the main study, this truth bonus remained significantly higher than what was predicted by all the alternative models, with the exception of the instructed-credibility Bayesian model (Fig. S27b). In the discovery study, the truth bonus was significantly higher than predicted by all the alternative models (Fig. S27d)."

      Together, these findings suggest that our truth inference results are not simply a re-description of a positivity bias.

      Conversely, we acknowledge the reviewer's point that our positivity bias results could potentially stem from a more general truth inference mechanism. We believe that this possibility should be addressed in a future study where participants rate their belief that received feedback is true (rather than a lie). We have extended our discussion to clarify this possibility and to include the suggested citation:

      “Our findings show that individuals increase their credit assignment for feedback in proportion to the perceived probability that the feedback is true, even after controlling for source credibility and feedback valence. Strikingly, this learning bias was not predicted by any of our Bayesian or credit-assignment (CA) models. Notably, our evidence for this bias is based on an “oracle model” that incorporates the probability of feedback truthfulness from the experimenter's perspective, rather than the participant’s. This raises an important open question: how do individuals form beliefs about feedback truthfulness, and how do these beliefs influence credit assignment? Future research should address this by eliciting trial-by-trial beliefs about feedback truthfulness. Doing so would also allow for testing the intriguing possibility that an exaggerated positivity bias for non-credible sources reflects, to some extent, a truth-based discounting of negative feedback—i.e., participants may judge such feedback as less likely to be true. However, it is important to note that the positivity bias observed for fully credible sources (here and in other literature) cannot be attributed to a truth bias—unless participants were, against instructions, distrustful of that source.”

      The authors get close to this in the discussion, but they characterize their results as differing from the predictions of rational models, the opposite of my intuition. They write:

      “Alternative "informational" (motivation-independent) accounts of positivity and confirmation bias predict a contrasting trend (i.e., reduced bias in low- and medium credibility conditions) because in these contexts it is more ambiguous whether feedback confirms one's choice or outcome expectations, as compared to a full-credibility condition.”

      I don't follow the reasoning here at all. It seems to me that the possibility for bias will increase with ambiguity (or perhaps will be maximal at intermediate levels). In the extreme case, when feedback is fully reliable, it is impossible to rationally discount it (illustrated in Figure 6A). The authors should clarify their argument or revise their conclusion here.

      We apologize for the lack of clarity in our previous explanation. We have removed the sentence you cited (it was intended to make a different point, which we now consider non-essential). Our current text is consistent with the point you are making.

      (4) Disinformation or less information?

      Zooming out, from a computational/functional perspective, the reliability of feedback is very similar to reward stochasticity (the difference is that reward stochasticity decreases the importance/value of learning in addition to its difficulty). I imagine that many of the effects reported here would be reproduced in that setting. To my surprise, I couldn't quickly find a study asking that precise question, but if the authors know of such work, it would be very useful to draw comparisons. To put a finer point on it, this study does not isolate which (if any) of these effects are specific to disinformation, rather than simply less information. I don't think the authors need to rigorously address this in the current study, but it would be a helpful discussion point.

      We thank the reviewer for highlighting the parallel (and difference) between feedback reliability and reward stochasticity. However, we have not found any comparable results in the literature. We also note that our discussion includes a paragraph addressing the locus of our effects, making the point that more studies are necessary to determine whether our findings are due to disinformation per se or to sources simply being less informative. While this paragraph was included in the previous version, we realised that our Discussion was too long and have therefore shortened it considerably:

      “An important question arises as to the psychological locus of the biases we uncovered. Because we were interested in how individuals process disinformation—deliberately false or misleading information intended to deceive or manipulate—we framed the feedback agents in our study as deceptive, who would occasionally “lie” about the true choice outcome. However, statistically (though not necessarily psychologically), these agents are equivalent to agents who mix truth-telling with random “guessing” or “noise”, where inaccuracies may arise from factors such as occasionally lacking access to true outcomes, simple laziness, or mistakes, rather than an intent to deceive. This raises the question of whether the biases we observed are driven by the perception of potential disinformation as deceitful per se or simply as deviating from the truth. Future studies could address this question by directly comparing learning from statistically equivalent sources framed as either lying or noisy. Unlike previous studies wherein participants had to infer source credibility from experience (30,37,72), we took an explicit-instruction approach, allowing us to precisely assess source-credibility impact on learning, without confounding it with errors in learning about the sources themselves. More broadly, our work connects with prior research on observational learning, which examined how individuals learn from the actions or advice of social partners (72–75). This body of work has demonstrated that individuals integrate learning from their private experiences with learning based on others’ actions or advice—whether by inferring the value others attribute to different options or by mimicking their behavior (57,76). However, our task differs significantly from traditional observational learning. Firstly, our feedback agents interpret outcomes rather than demonstrating or recommending actions (30,37,72). Secondly, participants in our study lack private experiences unmediated by feedback sources. Finally, unlike most observational learning paradigms, we systematically address scenarios with deliberately misleading social partners. Future studies could bridge this by incorporating deceptive social partners into observational learning, offering a chance to develop unified models of how individuals integrate social information when credibility is paramount for decision-making.”

      (5) Over-reliance on analyzing model parameters

      Most of the results rely on interpreting model parameters, specifically, the "credit assignment" (CA) parameter. Exacerbating this, many key conclusions rest on a comparison of the CA parameters fit to human data vs. those fit to simulations from a Bayesian model. I've never seen anything like this, and the authors don't justify or even motivate this analysis choice. As a general rule, analyses of model parameters are less convincing than behavioral results because they inevitably depend on arbitrary modeling assumptions that cannot be fully supported. I imagine that most or even all of the results presented here would have behavioral analogues. The paper would benefit greatly from the inclusion of such results. It would also be helpful to provide a description of the model in the main text that makes it very clear what exactly the CA parameter is capturing (see next point).

      We thank the reviewer for this important suggestion, which we address together with the following point.

      (6) RL or regression?

      I was initially very confused by the "RL" model because it doesn't update based on the TD error. Consequently, the "Q values" can go beyond the range of possible reward (SI Figure 5). These values are therefore not Q values, which are defined as expectations of future reward ("action values"). Instead, they reflect choice propensities, which are sometimes notated $h$ in the RL literature. This misuse of notation is unfortunately quite common in psychology, so I won't ask the authors to change the variable. However, they should clarify when introducing the model that the Q values are not action values in the technical sense. If there is precedent for this update rule, it should be cited.

      Although the change is subtle, it suggests a very different interpretation of the model.

      Specifically, I think the "RL model" is better understood as a sophisticated logistic regression, rather than a model of value learning. Ignoring the decay term, the CA term is simply the change in log odds of repeating the just-taken action in future trials (the change is negated for negative feedback). The PERS term is the same, but ignoring feedback. The decay captures that the effect of each trial on future choices diminishes with time. Importantly, however, we can re-parameterize the model such that the choice at each trial is a logistic regression where the independent variables are an exponentially decaying sum of feedback of each type (e.g., positive-cred50, positive-cred75, ... negative-cred100). The CA parameters are simply coefficients in this logistic regression.

      Critically, this is not meant to "deflate" the model. Instead, it clarifies that the CA parameter is actually not such an assumption-laden model estimate. It is really quite similar to a regression coefficient, something that is usually considered "model agnostic". It also recasts the non-standard "cross-fitting" approach as a very standard comparison of regression coefficients for model simulations vs. human data. Finally, using different CA parameters for true vs false feedback is no longer a strange and implausible model assumption; it's just another (perfectly valid) regression. This may be a personal thing, but after adopting this view, I found all the results much easier to understand.

      We thank the reviewer for their insightful and illuminating comments, particularly concerning the interpretation of our model parameters and the nature of our credit-assignment model. We believe your interpretation of the model is accurate, and we now present it to readers in the hope that our modelling will become clearer and more intuitive. We also explain to readers how this recasts our “cross-fitting” approach in the way you suggested (we return to this point below).

      Broadly, while we agree that modelling results depend on underlying assumptions, we believe that “model-agnostic” approaches also have important limitations—especially in reinforcement learning (RL), where choices are shaped by histories of past events, which such approaches often fail to fully account for. As students of RL, we are frequently struck by how careful modelling demonstrates that seemingly meaningful “model-agnostic” patterns can emerge as artefacts of unaccounted-for variables. We also note that the term “model-agnostic” is difficult to define—after all, even regression models rely on assumptions, and some computational models make richer or more transparent assumptions than others. Ideally, we aim to support our findings using converging methods wherever possible.

      We want to clarify that many of our reported findings indeed stem from straightforward behavioral analyses (e.g., simple regressions of choice-repetition), which do not rely on complex modeling assumptions. The two key results that primarily depend on the analysis of model parameters are our findings related to positivity bias and truth inference.

      Regarding the positivity bias, identifying truly model-agnostic behavioral signatures, distinct from effects like choice-perseveration, has historically been a significant challenge in the literature. Classical research on this bias rests on the interpretation of model parameters (Lefebvre et al., 2017; Palminteri et al., 2017), or at least on the use of models to assess what an “unbiased learner” baseline should look like (Palminteri & Lebreton, 2022). Some researchers have suggested possible regressions incorporating history effects to detect positivity bias from choice-repetition behavior, but these regressions (as our model) rely on subtle assumptions about forgetting and history effects (Toyama et al., 2019). Specifically, in our case, this issue is also demonstrated by an analysis we conducted in relation to the reviewer's previous point (about perseveration masquerading as positivity bias). We believe that clearly dissociating positivity bias from perseveration is an important challenge for the field going forward.

      For our truth inference results, obtaining purely behavioral signatures is similarly challenging due to the intricate interdependencies (which the reviewer identified in previous points) between agent credibility, feedback valence, feedback truthfulness, and choice accuracy within our task design.

      Finally, we agree with the reviewer that regression coefficients are often interpreted as a “model-agnostic” pattern. From this perspective, even our findings regarding positivity and truth bias are not a case of over-reliance on complex model assumptions but are rather a way to expose deviations between empirical “sophisticated” regression coefficients and coefficients predicted from Bayesian models.

      We have now described the main learning rule of our model in the main text to ensure that the meaning of the CA parameters is clearer for readers:

      “Next, we formulated a family of non-Bayesian computational RL models. Importantly, these models can flexibly express non-Bayesian learning patterns and, as we show in following sections, can serve to identify learning biases deviating from an idealized Bayesian strategy. Here, an assumption is that during feedback, the choice propensity for the chosen bandit (which here is represented by a point estimate, “Q value”, rather than a distribution) either increases or decreases (for positive or negative feedback, respectively) according to a magnitude quantified by the free “Credit-Assignment (CA)” model parameters (47):

      Q(chosen) ← (1 – f_Q) * Q(chosen) + CA(agent, valence) * F

      where F is the feedback received from the agents (coded as 1 for reward feedback and -1 for non-reward feedback), while f_Q (∈ [0,1]) is the free parameter representing the forgetting rate of the Q-value (Fig. 2a, bottom panel; Fig. S5b; Methods). The probability to choose a bandit (say A over B) in this family of models is a logistic function of the contrast between the choice propensities of these two bandits. One interpretation of this model is as a “sophisticated” logistic regression, where the CA parameters take the role of “regression coefficients” corresponding to the change in log odds of repeating the just-taken action in future trials based on the feedback (+/- CA for positive or negative feedback, respectively; the model also includes gradual perseveration, which allows for constant log-odds changes that are not affected by choice feedback; see “Methods: RL models”). The forgetting rate captures the extent to which the effect of each trial on future choices diminishes with time. The Q-values are thus exponentially decaying sums of the feedback-driven changes in choice propensity that a bandit received.”
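      To make the quoted learning rule concrete, here is a minimal Python sketch of one credit-assignment update and the logistic choice rule. All numerical values (CA = 0.8, f_Q = 0.1, the two-bandit setup) are illustrative assumptions of ours, not values from the manuscript.

```python
import math

def ca_update(q_chosen, feedback, ca, f_q):
    """One credit-assignment update of the chosen bandit's propensity.

    feedback: +1 for reward feedback, -1 for non-reward feedback.
    ca: credit-assignment weight for this (agent, valence) combination.
    f_q: forgetting rate in [0, 1]; following the quoted equation, only
         the chosen bandit's propensity decays here.
    """
    return (1 - f_q) * q_chosen + ca * feedback

def p_choose_a(q_a, q_b):
    """Logistic choice rule: probability of choosing bandit A over B."""
    return 1.0 / (1.0 + math.exp(-(q_a - q_b)))

# Illustrative values (assumed, not from the paper):
q_a, q_b = 0.0, 0.0
q_a = ca_update(q_a, feedback=+1, ca=0.8, f_q=0.1)  # positive feedback on A
q_b = ca_update(q_b, feedback=-1, ca=0.8, f_q=0.1)  # negative feedback on B
# A is now more likely to be chosen than B, since q_a > q_b
```

      Note how CA acts exactly like a regression coefficient on feedback: the update shifts the log odds of repeating the just-taken action by +/- CA.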

      We also explain the implications of this perspective for our cross-fitting procedure:

      “To further characterise deviations between behaviour and our Bayesian learning models, we used a “cross-fitting” method. Treating CA parameters as data-features of interest (i.e., feedback-dependent changes in choice propensity), our goal was to examine if and how empirical features differ from features extracted from simulations of our Bayesian learning models. Towards that goal, we simulated synthetic data based on Bayesian agents (using participants’ best fitting parameters), but fitted these data using the CA-models, obtaining what we term “Bayesian-CA parameters” (Fig. 2d; Methods). A comparison of these Bayesian-CA parameters with empirical-CA parameters, obtained by fitting CA models to empirical data, allowed us to uncover patterns consistent with, or deviating from, ideal-Bayesian value-based inference. Under the sophisticated logistic-regression interpretation of the CA-model family, the cross-fitting method comprises a comparison between empirical regression coefficients (i.e., empirical CA parameters) and regression coefficients based on simulations of Bayesian models (Bayesian-CA parameters). Using this approach, we found that both the instructed-credibility and free-credibility Bayesian models predicted increased Bayesian-CA parameters as a function of agent credibility (Fig. 3c; see SI 3.1.1.2 Tables S8 and S9). However, an in-depth comparison between Bayesian and empirical CA parameters revealed discrepancies from ideal Bayesian learning, which we describe in the following sections.”
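      As an illustration of the cross-fitting logic (simulate one model, fit the CA model to its behaviour, and read off the resulting CA parameter), here is a hedged Python sketch. The simplified pseudo-count learner standing in for a Bayesian agent, the task parameters, and the grid-search fit are all our own illustrative assumptions, not the manuscript's actual models or fitting procedure.

```python
import math
import random

def simulate_bayesian_agent(n_trials=2000, credibility=0.75, seed=0):
    """Simulate choices of a simplified credibility-weighted learner.

    The agent keeps soft reward/no-reward counts per bandit, weighting each
    feedback by the probability it is true (an illustrative stand-in for a
    full Bayesian model). Returns the (choice, feedback) history.
    """
    rng = random.Random(seed)
    p_reward = [0.8, 0.2]              # assumed true reward probabilities
    counts = [[1.0, 1.0], [1.0, 1.0]]  # [reward, no-reward] pseudo-counts
    data = []
    for _ in range(n_trials):
        means = [c[0] / (c[0] + c[1]) for c in counts]
        p_a = 1.0 / (1.0 + math.exp(-5.0 * (means[0] - means[1])))
        choice = 0 if rng.random() < p_a else 1
        outcome = rng.random() < p_reward[choice]
        told_truth = rng.random() < credibility
        feedback = 1 if outcome == told_truth else -1  # what the agent is told
        # credibility-weighted soft update of the chosen bandit's counts
        if feedback == 1:
            counts[choice][0] += credibility
            counts[choice][1] += 1 - credibility
        else:
            counts[choice][0] += 1 - credibility
            counts[choice][1] += credibility
        data.append((choice, feedback))
    return data

def ca_log_likelihood(ca, data, f_q=0.1):
    """Log-likelihood of the recorded choices under a CA model."""
    q = [0.0, 0.0]
    ll = 0.0
    for choice, feedback in data:
        p_a = 1.0 / (1.0 + math.exp(-(q[0] - q[1])))
        ll += math.log(p_a if choice == 0 else 1.0 - p_a)
        q[choice] = (1 - f_q) * q[choice] + ca * feedback
    return ll

data = simulate_bayesian_agent()
grid = [i * 0.05 for i in range(-10, 33)]  # candidate CA values -0.50 .. 1.60
bayesian_ca = max(grid, key=lambda c: ca_log_likelihood(c, data))
# bayesian_ca plays the role of a "Bayesian-CA parameter": a regression-like
# coefficient summarising how strongly the simulated agent's choices track feedback
```

      Comparing such a coefficient with the one fitted to empirical choices is the essence of the cross-fitting comparison; a real analysis would of course use maximum-likelihood fitting of the full model rather than a one-parameter grid.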

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Keep terms consistent, e.g., follow-up vs. main; hallmark vs. traditional.

      We have now changed the text to keep terms consistent.

      (2) CA model is like a learning rate; but it's based on the raw reward, not the TD error - this seems strange.

      We thank the reviewer for this comment. We understand that the use of a CA model instead of a TD-error model may seem unusual at first glance. However, the CA model offers an important advantage: it more easily accommodates what we term "negative learning rates". This means that some participants may treat certain agents (especially the random one) as consistently deceitful, leading them to effectively increase/reduce choice tendencies following negative/positive feedback. A CA model handles this naturally by allowing negative CA parameters as a simple extension of positive ones. In contrast, adapting a TD-error model to account for this is more complex. For instance, attempting to introduce a "negative learning rate" makes the RW model behave in a non-stable manner (e.g., Q values become <0 or >1). At the initial stages of our project, we explored different ways of dealing with this issue and found that the CA model was the most suitable. For these reasons, we decided to proceed with our CA model.
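      To illustrate the stability point, here is a small sketch (with illustrative values) contrasting a Rescorla-Wagner update under a negative learning rate, which pushes Q outside the [0, 1] range of a reward expectation, with the CA update, which treats Q as an unbounded choice propensity and so accommodates negative parameters naturally:

```python
def rw_update(q, reward, alpha):
    """Rescorla-Wagner: q moves toward the reward by a fraction alpha."""
    return q + alpha * (reward - q)

def ca_update(q, feedback, ca, f_q=0.1):
    """Credit assignment: q is an unbounded propensity; negative CA is fine."""
    return (1 - f_q) * q + ca * feedback

# RW with a "negative learning rate" leaves the [0, 1] probability range:
q = 0.5
for reward in [1, 1, 1]:
    q = rw_update(q, reward, alpha=-0.5)
# q is now negative, so it can no longer be read as a reward expectation

# CA with a negative parameter stays well-behaved: the propensity just drifts
# negative, encoding "avoid this bandit after positive feedback"
p = 0.0
for feedback in [1, 1, 1]:
    p = ca_update(p, feedback, ca=-0.5)
```

      With the decay term, the CA propensity remains bounded (it converges toward ca * feedback / f_q), whereas the RW recursion with a negative alpha is divergent.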

      Additionally, we used the CA model in previous studies (e.g., Moran, Dayan & Dolan (2021)), where we included (in the SI) a detailed discussion of the similarities and differences between credit-assignment and Rescorla-Wagner models.

      (3) Why was the follow-up study not pre-registered?

      We appreciate the reviewer's comment regarding preregistration, which we should have done. Unfortunately, this is now “water under the bridge”, but going forward we hope to pre-register increasing parts of our work.

      (4) Other work looking at reward stochasticity?

      As noted in point 4 of the main weaknesses, previous work on reward stochasticity primarily focused on explaining the increase/decrease in learning and its mechanistic bases under varying stochasticity levels. In our study, we uniquely characterize several specific learning biases that are modulated by source credibility, a topic not extensively explored within the existing reward stochasticity framework, as far as we know.

      (5) Equation 1 is different from the one in the figure?

      The reviewer is completely correct. The figure provides a simplified visual representation, primarily focusing on the feedback-based update of the Q-value, and for simplicity, it omits the forgetting term present in the full Equation 1. To ensure complete clarity and prevent any misunderstanding, we have now incorporated a more detailed explanation of the model, including the complete Equation 1 and its components, directly within the main text. This comprehensive description will ensure that readers are fully aware of how the model operates.

      “Next, we formulated a family of non-Bayesian computational RL models. Importantly, these models can flexibly express non-Bayesian learning patterns and, as we show in following sections, can serve to identify learning biases deviating from an idealized Bayesian strategy. Here, an assumption is that during feedback, the choice propensity for the chosen bandit (which here is represented by a point estimate, “Q value”, rather than a distribution) either increases or decreases (for positive or negative feedback, respectively) according to a magnitude quantified by the free “Credit-Assignment (CA)” model parameters (47):

      Q(chosen) ← (1 – f_Q) * Q(chosen) + CA(agent, valence) * F

      where F is the feedback received from the agents (coded as 1 for reward feedback and -1 for non-reward feedback), while f_Q (∈ [0,1]) is the free parameter representing the forgetting rate of the Q-value (Fig. 2a, bottom panel; Fig. S5b; Methods).”

      (6) Please describe/plot the distribution of all fitted parameters in the supplement. I would include the mean and SD in the main text (methods) as well.

      Following the reviewer’s suggestions, we have included in the Supplementary Document tables displaying the mean and SD of fitted parameters from participants for our main models of interest. We have also plotted the distributions of these parameters, for both the main study and the discovery study.

      (7) "A novel approach within the disinformation literature by exploiting a Reinforcement Learning (RL) experimental framework".

      The idea of applying RL to disinformation is not new. Please tone down novelty claims. It would be nice to cite/discuss some of this work as well.

      https://arxiv.org/abs/2106.05402
      https://www.scirp.org/pdf/jbbs_2022110415273931.pdf
      https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4173312

      We thank the reviewer for pointing us towards relevant literature. We have now toned down the sentence in the introduction and cited the references provided:

      “To address these questions, we adopt a novel approach within the disinformation literature by exploiting a Reinforcement Learning (RL) experimental framework (36). While RL has guided disinformation research in recent years (37–40), our approach is novel in using one of its most popular tasks: the “bandit task”.”

      (8) Figure 3a - The figures should be in the order that they're referenced (3 is referenced before 2).

      We generally try to stick to this important rule but, in this case, we believe that our ordering better serves the narrative and hope the reviewer will excuse this small violation.

      (9) "Additionally, we found a positive feedback-effect for the 3-star agent"

      What is the analysis here? To avoid confusion with the "positive feedback" effect, consider using "positive effect of feedback". The dash wasn't sufficient to avoid confusion in my case.

      We have now updated the terms in the text to avoid confusion.

      (10) The discovery study revealed even stronger results supporting a conclusion that the credibility-CA model was superior to both Bayesian models for most subjects

      This is very subjective, but I'll just mention that my "cherry-picking" flag was raised by this sentence. Are you only mentioning cases where the discovery study was consistent with the main study? Upon a closer read, I think the answer is most likely "no", but you might consider adopting a more systematic (perhaps even explicit) policy on when and how you reference the discovery study to avoid creating this impression in a more casual reader.

      We thank the reviewer for this valuable suggestion. To prevent any impression of "cherry-picking", we have removed specific references to the discovery study from the main body of the text. Instead, all discussions regarding the convergence and divergence of results between the two studies are now in the dedicated section focusing on the discovery study:

      “The discovery study (n=104) used a disinformation task structurally similar to that used in our main study, but with three notable differences: 1) it included 4 feedback agents, with credibilities of 50%, 70%, 85% and 100%, represented by 1, 2, 3, and 4 stars, respectively; 2) each experimental block consisted of a single bandit pair, presented over 16 trials (with 4 trials for each feedback agent); and 3) in certain blocks, unbeknownst to participants, the two bandits within a pair were equally rewarding (see SI section 1.1). Overall, this study's results supported similar conclusions as our main study (see SI section 1.2) with a few differences. We found convergent support for increased learning from more credible sources (SI 1.2.1), superior fit for the CA model over Bayesian models (SI 1.2.2) and increased learning from feedback inferred to be true (SI 1.2.6). Additionally, we found an inflation of positivity bias for low-credibility sources both when measured relative to the overall level of credit assignment (as in our main study), and in absolute terms (unlike in our main study) (Fig. S3; SI 1.2.5). Moreover, choice-perseveration could not predict an amplification of positivity bias for low-credibility sources (see SI 3.6.2). However, we found no evidence for learning based on 50%-credibility feedback when examining either the feedback effect on choice repetition or CA in the credibility-CA model (SI 1.2.3).”

      (11) An in-depth comparison between Bayesian and empirical CA parameters revealed discrepancies from normative Bayesian learning.

      Consider saying where this in-depth comparison can be found (based on my reading, I think you're referring to the next section?).

      We have now modified the sentence for better clarity:

      “However, an in-depth comparison between Bayesian and empirical CA parameters revealed discrepancies from ideal Bayesian learning, which we describe in the following sections.”

      (12) "which essentially provides feedback" Perhaps you meant "random feedback"?

      We have modified the text as suggested by the reviewer.

      (13) Essentially random

      Why "essentially"? Isn't it just literally random?

      We have modified the text as suggested by the reviewer.

      (14) Both Bayesian models predicted an attenuated credit-assignment for the 3-star agent

      Attenuated relative to what? I wouldn't use this word if you mean weaker than what we see in the human data. Instead, I would say people show an exaggerated credit-assignment, since Bayes is the normative baseline.

      We changed the text according to the reviewer’s suggestion:

      “A comparison of empirical and Bayesian credit-assignment parameters revealed a further deviation from ideal Bayesian learning: participants showed an exaggerated credit-assignment for the 3-star agent compared with Bayesian models.”

      (15) "there was no difference between 2-star and 3-star agent contexts (b=0.051, F(1,2419)=0.39, p=0.53)"

      You cannot confirm the null hypothesis! Instead, you can write "The difference between 2-star and 3-star agent contexts was not significant". Although even with this language, you should be careful that your conclusions don't rest on the lack of a difference (the next sentence is somewhat ambiguous on this point).

      Additionally, the reported b coefs do not match the figure, which if anything, suggests a larger drop from 0.75 (2-star) to 1 (3-star). Is this a mixed vs fixed effects thing? It would be helpful to provide an explanation here.

      We thank the reviewer for this question. When we previously submitted our manuscript, we thought that finding enhanced credit-assignment for fully credible feedback following potential disinformation from a DIFFERENT context would constitute a striking demonstration of our “contrast effect”. However, upon re-examining this finding, we discovered a coding error (affecting how trials were filtered). We have now rerun and corrected this analysis. We have assessed the contrast effect for both "same-context" trials (where the contextual trial featured the same bandit pair as the learning trial) and "different-context" trials (where the contextual trial featured a different bandit pair). Our re-analysis reveals a selective significant contrast effect in the same-context condition, but no significant effect in the different-context condition. We have updated the main text to reflect these corrected findings and provide a clearer explanation of the analysis:

      “A comparison of empirical and Bayesian credit-assignment parameters revealed a further deviation from ideal Bayesian learning: participants showed an exaggerated credit-assignment for the 3-star agent compared with Bayesian models [Wilcoxon signed-rank test, instructed-credibility Bayesian model (median difference=0.74, z=11.14); free-credibility Bayesian model (median difference=0.62, z=10.71), all p’s<0.001] (Fig. 3a). One explanation for enhanced learning for the 3-star agents is a contrast effect, whereby credible information looms larger against a backdrop of non-credible information. To test this hypothesis, we examined whether the impact of feedback from the 3-star agent is modulated by the credibility of the agent in the trial immediately preceding it. More specifically, we reasoned that the impact of a 3-star agent would be amplified by a “low credibility context” (i.e., when it is preceded by a low-credibility trial). In a binomial mixed effects model, we regressed choice-repetition on feedback valence from the last trial featuring the same bandit pair (i.e., the learning trial) and the feedback agent on the trial immediately preceding that last trial (i.e., the contextual credibility; see Methods for model-specification). This analysis included only learning trials featuring the 3-star agent, and context trials featuring the same bandit pair as the learning trial (Fig. 4a). We found that feedback valence interacted with contextual credibility (F(2,2086)=11.47, p<0.001) such that the feedback-effect (from the 3-star agent) decreased as a function of the preceding context-credibility (3-star context vs. 2-star context: b=-0.29, F(1,2086)=4.06, p=0.044; 2-star context vs. 1-star context: b=-0.41, t(2086)=-2.94, p=0.003; and 3-star context vs. 1-star context: b=-0.69, t(2086)=-4.74, p<0.001) (Fig. 4b). This contrast effect was not predicted by simulations of our main models of interest (Fig. 4c). No effect was found when focussing on contextual trials featuring a bandit pair different than the one in the learning trial (see SI 3.5). Thus, these results support an interpretation that credible feedback exerts a greater impact on participants’ learning when it follows non-credible feedback, in the same learning context.”

      We have modified the discussion accordingly as well:

      “A striking finding in our study was that for a fully credible feedback agent, credit assignment was exaggerated (i.e., higher than predicted by our Bayesian models). Furthermore, the effect of fully credible feedback on choice was further boosted when it was preceded by a low-credibility context related to current learning. We interpret this in terms of a “contrast effect”, whereby veridical information looms larger against a backdrop of disinformation (21). One upshot is that exaggerated learning might entail a risk of jumping to premature conclusions based on limited credible evidence (e.g., a strong conclusion that a vaccine produces significant side-effect risks based on weak credible information, following non-credible information about the same vaccine). An intriguing possibility, that could be tested in future studies, is that participants strategically amplify the extent of learning from credible feedback to dilute the impact of learning from non-credible feedback. For example, a person scrolling through a social media feed, encountering copious amounts of disinformation, might amplify the weight they assign to credible feedback in order to dilute effects of ‘fake news’. Ironically, these results also suggest that public campaigns might be more effective when embedding their messages in low-credibility contexts, which may boost their impact.”

      And we have included some additional analyses in the SI document:

      "3.5 Contrast effects for contexts featuring a different bandit
Given that we observed a contrast effect when both the learning trial and the immediately preceding "context trial" involved the same pair of bandits, we next investigated whether this effect persisted when the context trial featured a different bandit pair - a situation where the context would be irrelevant to the current learning. Again, we used a binomial mixed-effects model, regressing choice-repetition on feedback valence in the learning trial and the feedback agent in the context trial. This analysis included only learning trials featuring the 3-star agent, and context trials featuring a different bandit pair than the learning trial (Fig. S22a). We found no significant evidence of an interaction between feedback valence and contextual credibility (F(2,2364)=0.21, p=0.81) (Fig. S22b). This null result was consistent with the range of outcomes predicted by our main computational models (Fig. S22c)."

      We aimed to formally compare the influence of two types of contextual trials: those featuring the same bandit pair as the learning trial versus those featuring a different pair. To achieve this, we extended our mixed-effects model by incorporating a new predictor variable, "CONTEXT_TYPE", which coded whether the contextual trial involved the same bandit pair (coded as -0.5) or a different bandit pair (+0.5) compared to the learning trial. The Wilkinson notation for this expanded mixed-effects model is:

      ๐‘…๐ธ๐‘ƒ๐ธ๐ด๐‘‡ ~ ๐ถ๐‘‚๐‘๐‘‡๐ธ๐‘‹๐‘‡_๐‘‡๐‘Œ๐‘ƒ๐ธ โˆ— ๐น๐ธ๐ธ๐ท๐ต๐ด๐ถ๐พ โˆ— (๐ถ ๐ถ๐‘‚๐‘๐‘‡๐ธ๐‘‹๐‘‡<sub>2-star</sub> + ๐ถ๐‘‚๐‘๐‘‡๐ธ๐‘‹๐‘‡<sub>3-star</sub>) + ๐ต๐ธ๐‘‡๐‘‡๐ธ๐‘… + (1|๐‘๐‘Ž๐‘Ÿ๐‘ก๐‘–๐‘๐‘–๐‘๐‘Ž๐‘›๐‘ก)

      This expanded model revealed a significant three-way interaction between feedback valence, contextual credibility, and context type (F(2,4451) = 7.71, p<0.001). Interpreting this interaction, we found a two-way interaction between contextual credibility and feedback valence when the context was the same (F(2,4451) = 12.03, p<0.001), but not when the context was different (F(2,4451) = 0.23, p = 0.79). Further interpreting the two-way feedback-valence by contextual-credibility interaction (for the same-pair context), we obtained the same conclusions as reported in the main text."
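For concreteness, the expanded model can be sketched on synthetic data. This is a simplified stand-in rather than the authors' code: it fits an ordinary logistic regression with statsmodels (the per-participant random intercept of the mixed-effects model is omitted), and all variable names, codings, and simulated effect sizes are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    # context type: same bandit pair = -0.5, different pair = +0.5
    "CONTEXT_TYPE": rng.choice([-0.5, 0.5], n),
    # feedback valence: positive = 0.5, negative = -0.5
    "FEEDBACK": rng.choice([-0.5, 0.5], n),
    "BETTER": rng.choice([0.0, 1.0], n),
})
# dummy-code the context agent (1-star is the reference level)
agent = rng.choice([1, 2, 3], n)
df["CONTEXT2"] = (agent == 2).astype(float)
df["CONTEXT3"] = (agent == 3).astype(float)

# simulate choice repetition with a feedback effect that shrinks as
# contextual credibility rises, but only for same-pair contexts
# (a contrast effect, with made-up coefficients)
lin = 0.5 + df["FEEDBACK"] * (1.5 - 0.4 * df["CONTEXT2"] - 0.8 * df["CONTEXT3"]) * (df["CONTEXT_TYPE"] < 0)
df["REPEAT"] = (rng.random(n) < 1 / (1 + np.exp(-lin))).astype(int)

# REPEAT ~ CONTEXT_TYPE * FEEDBACK * (CONTEXT2 + CONTEXT3) + BETTER,
# i.e. the fixed-effects part of the paper's expanded model
model = smf.logit(
    "REPEAT ~ CONTEXT_TYPE * FEEDBACK * (CONTEXT2 + CONTEXT3) + BETTER", data=df
).fit(disp=0)
print(model.params["FEEDBACK"])  # main feedback effect
```

In this coding, the three-way terms (e.g. CONTEXT_TYPE:FEEDBACK:CONTEXT3) directly test whether the credibility-by-feedback interaction differs between same-pair and different-pair contexts.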

      (16) "Strikingly, model-simulations (Methods) showed this pattern is not predicted by any of our other models"

      Why doesn't the Bayesian model predict this?

      Thanks for the comment. Overall, Bayesian models do predict a slight truth inference effect (see Figure 6d). However, these effects are not as strong as the ones observed in participants, suggesting that our results go beyond what would be predicted by a Bayesian model.

      Conceptually, it's important to note that the Bayesian model can infer (after controlling for source credibility and feedback valence) whether feedback is truthful based solely on prior beliefs about the chosen bandit. Using this inferred truth to amplify the weight of truthful feedback would effectively amount to "bootstrapping on one's own beliefs." This is most clearly illustrated with the 50% agent: if one believes that a chosen bandit yields rewards 70% of the time, then positive feedback is more likely to be truthful than negative feedback. However, a Bayesian observer would also recognize that, given the agent's overall unreliability, such feedback should be ignored regardless.
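The bootstrapping intuition for the 50% agent can be made concrete with a toy Bayes'-rule calculation. The numbers are illustrative, and the sketch assumes a non-truthful report is simply the opposite of the true outcome:

```python
def p_truthful_given_feedback(p_reward, c, positive):
    """Posterior probability that the feedback is truthful, given a prior
    belief p_reward that the chosen bandit rewards, and an agent that
    reports truthfully with probability c (else reports the opposite)."""
    p_true_outcome = p_reward if positive else 1 - p_reward
    p_feedback = c * p_true_outcome + (1 - c) * (1 - p_true_outcome)
    return c * p_true_outcome / p_feedback

# 50%-credibility agent: positive feedback looks more truthful than
# negative feedback, purely because of the prior belief about the bandit
print(p_truthful_given_feedback(0.7, 0.5, positive=True))   # ≈ 0.7
print(p_truthful_given_feedback(0.7, 0.5, positive=False))  # ≈ 0.3
```

The asymmetry exists even though, for a 50% agent, the feedback carries no information about the bandit and should be ignored when updating values.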

      (17) "A striking finding in our study was that for a fully credible feedback agent, credit assignment was exaggerated (i.e., higher than predicted by a Bayesian strategy)".

      "Since we did not find any significant interactions between BETTER and the other regressors, we decided to omit it from the model formulation".

      Was this decision made after seeing the data? If so, please report the original analysis as well.

      We have included the BETTER regressor again, and we have re-run the analyses. We now report the results of this regression. We have also changed the methods section accordingly:

      "We used a different mixed-effects binomial regression model to test whether value learning from the 3-star agent was modulated by contextual credibility. We focused this analysis on instances where the previous trial with the same bandit pair featured the 3-star agent. We regressed the variable REPEAT, which indicated whether the current trial repeated the choice from the previous trial featuring the same bandit pair (repeated choice=1, non-repeated choice=0). We included the following regressors: FEEDBACK, coding the valence of feedback in the previous trial with the same bandit pair (positive=0.5, negative=-0.5); CONTEXT<sub>2-star</sub>, indicating whether the trial immediately preceding the previous trial with the same bandit pair (the context trial) featured the 2-star agent (feedback from 2-star agent=1, otherwise=0); and CONTEXT<sub>3-star</sub>, indicating whether the trial immediately preceding the previous trial with the same bandit pair featured the 3-star agent. We also included a regressor (BETTER) coding whether the bandit chosen in the learning trial was the better (mostly rewarding) or the worse (mostly unrewarding) bandit within the pair. We included in this analysis only current trials where the context trial featured a different bandit pair. The model in Wilkinson's notation was:

      ๐‘…๐ธ๐‘ƒ๐ธ๐ด๐‘‡~ ๐น๐ธ๐ธ๐ท๐ต๐ด๐ถ๐พ โˆ— (๐ถ๐‘‚๐‘๐‘‡๐ธ๐‘‹๐‘‡<sub>2-star</sub> + ๐ถ๐‘‚๐‘๐‘‡๐ธ๐‘‹๐‘‡<sub>3-star</sub>) + ๐ต๐ธ๐‘‡๐‘‡๐ธ๐‘… + (1|๐‘๐‘Ž๐‘Ÿ๐‘ก๐‘–๐‘๐‘–๐‘๐‘Ž๐‘›๐‘ก) ( 13 )

      In Figure 4c, we independently calculated the repeat-probability difference for the better (mostly rewarding) and worse (mostly non-rewarding) bandits and averaged across them. This calculation was done at the participant level, and finally averaged across participants."
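The averaging described for Figure 4c can be sketched as a short pandas computation on made-up trial data. Column names and values are hypothetical, not taken from the study's code or data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1200
trials = pd.DataFrame({
    "participant": rng.integers(0, 30, n),
    "better": rng.choice([True, False], n),    # better vs. worse bandit chosen
    "positive": rng.choice([True, False], n),  # feedback valence on learning trial
    "repeat": rng.random(n) < 0.6,             # whether the choice was repeated
})

# P(repeat | positive) - P(repeat | negative), per participant and bandit type
rates = trials.groupby(["participant", "better", "positive"])["repeat"].mean().unstack("positive")
diff = rates[True] - rates[False]

# average over better/worse within each participant, then across participants
per_participant = diff.groupby("participant").mean()
print(per_participant.mean())  # grand average across participants
```

Computing the difference separately for the better and worse bandits before averaging keeps the measure from being dominated by whichever bandit a participant chose more often.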

  8. physerver.hamilton.edu
    1. kinetic energy of agitation

      What is the "kinetic energy of agitation"? I've never heard of that before - I assume it has something to do with mixing or moving particles (particularly in a gas) around, but I'm just curious what it actually is and why it's important

    1. hypertext

      So, already two minor things:
      - be aware that your front page needs to contain mandatory elements such as "Proefschrift ingediend etc.," the name of the supervisor, faculty, logos and so forth
      - while the online format is allowed, be aware that (at least in the past) a jury member was entitled to ask for a printed version. I'd have to check if that still applies and we can always hope that that's not the case, but it's just a heads up...

    1. Author response:

      The following is the authors' response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      There is growing appreciation for the importance of luminal (apical) ECM in tube development, but such matrices are much less well understood than basal ECMs. Here the authors provide insights into the aECM that shapes the Drosophila salivary gland (SG) tube and the importance of PAPSS-dependent sulfation in its organization and function.

      The first part of the paper focuses on careful phenotypic characterization of papss mutants, using multiple markers and TEM. This revealed reduced markers of sulfation and defects in both apical and basal ECM organization, Golgi (but not ER) morphology, number and localization of other endosomal compartments, plus increased cell death. The authors focus on the fact that papss mutants have an irregular SG lumen diameter, with both narrowed regions and bulged regions. They address the pleiotropy, showing that preventing the cell death and resultant gaps in the tube did not rescue the SG luminal shape defects and discussing similarities and differences between the papss mutant phenotype and those caused by more general trafficking defects. The analysis uses a papss nonsense mutant from an EMS screen - I appreciate the rigorous approach the authors took to analyze transheterozygotes (as well as homozygotes) plus rescued animals in order to rule out effects of linked mutations. Importantly, the rescue experiments also demonstrated that sulfation enzymatic activity is important.

      The 2nd part of the paper focuses on the SG aECM, showing that Dpy and Pio ZP protein fusions localize abnormally in papss mutants and that these ZP mutants (and Np protease mutants) have similar SG lumen shaping defects to the papss mutants. A key conclusion is that SG lumen defects correlate with loss of a Pio+Dpy-dependent filamentous structure in the lumen. These data suggest that ZP protein misregulation could explain this part of the papss phenotype.

      Overall, the text is very well written and clear. Figures are clearly labeled. The methods involve rigorous genetic approaches, microscopy, and quantifications/statistics and are documented appropriately. The findings are convincing.

      Significance:

      This study will be of interest to researchers studying developmental morphogenesis in general and specifically tube biology or the aECM. It should be particularly of interest to those studying sulfation or ZP proteins (which are broadly present in aECMs across organisms, including humans).

      This study adds to the literature demonstrating the importance of luminal matrix in shaping tubular organs and greatly advances understanding of the luminal matrix in the Drosophila salivary gland, an important model of tubular organ development and one that has key matrix differences (such as no chitin) compared to other highly studied Drosophila tubes like the trachea.

      The detailed description of the defects resulting from papss loss suggests that there are multiple different sulfated targets, with a subset specifically relevant to aECM biology. A limitation is that specific sulfated substrates are not identified here (e.g. are these the ZP proteins themselves or other matrix glycoproteins or lipids?); therefore, it's not clear how direct or indirect the effects of papss are on ZP proteins. However, this is clearly a direction for future work and does not detract from the excellent beginning made here.

      Comments on revised version:

      Overall, I am pleased with the authors' revisions in response to my original comments and those of the other reviewers

      Reviewer #2 (Public review):

      Summary

      This study provides new insights into organ morphogenesis using the Drosophila salivary gland (SG) as a model. The authors identify a requirement for sulfation in regulating lumen expansion, which correlates with several effects at the cellular level, including regulation of intracellular trafficking and the organization of Golgi, the aECM and the apical membrane. In addition, the authors show that the ZP proteins Dumpy (Dpy) and Pio form an aECM regulating lumen expansion. Previous reports already pointed to a role for Papss in sulfation in SG and the presence of Dpy and Pio in the SG. Now this work extends these previous analyses and provides more detailed descriptions that may be relevant to the fields of morphogenesis and cell biology (with particular focus on ECM research and tubulogenesis). This study nicely presents valuable information regarding the requirements of sulfation and the aECM in SG development.

      Strengths

      -The results supporting a role for sulfation in SG development are strong. In addition, the results supporting the involvement of Dpy and Pio in the aECM of the SG, their role in lumen expansion, and their interactions, are also strong.

      -The authors have done an excellent job of revising and clarifying the many different issues raised by the reviewers, particularly with the addition of new experiments and quantifications. I consider that the manuscript has improved considerably.

      -The authors generated a catalytically inactive Papss enzyme, which is not able to rescue the defects in Papss mutants, in contrast to wild type Papss. This result clearly indicates that the sulfation activity of Papss is required for SG development.

      Weaknesses

      -The main concern is the lack of a clear connection between sulfation and the phenotypes observed at the cellular level, and, importantly, the lack of a connection between sulfation and the Pio-Dpy matrix. Indeed, the mechanism(s) by which sulfation affects lumen expansion are not elucidated, and no targets of this modification are identified or investigated. A direct (or instructive) role for sulfation in aECM organization is not clearly supported by the results, and the connection between sulfation and Pio/Dpy roles seems correlative rather than causative. As it is presented, the mechanisms by which sulfation regulates SG lumen expansion remain elusive in this study.

      -In my opinion the authors overstate their findings in several conclusions, as exemplified in the abstract:

      "In the absence of Papss, Pio is gradually lost in the aECM, while the Dpy-positive aECM structure is condensed and dissociates from the apical membrane, leading to a thin lumen. Mutations in dpy or pio, or in Notopleural, which encodes a matriptase that cleaves Pio to form the luminal Pio pool, result in a SG lumen with alternating bulges and constrictions, with the loss of pio leading to the loss of Dpy in the lumen. Our findings underscore the essential role of sulfation in organizing the aECM during tubular organ formation and highlight the mechanical support provided by ZP domain proteins in maintaining luminal diameter."

      The findings leading to the conclusions that sulfation organizes the aECM and that the absence of Papss leads to a thin lumen due to defects in Dpy/Pio are not strong. The authors certainly show that Papss is required for proper Pio and Dpy accumulation. They also show that Pio is required for Dpy accumulation, and that Pio and Dpy form an aECM required for lumen expansion. However, the absence of Pio and Dpy does not fully recapitulate Papss mutant defects (thin lumen). I wonder whether other hypotheses and models could account for the observed results. For instance, a role for Papss affecting secretion, in which case sulfation would have an indirect role in aECM organization. This study does not address the mechanical properties of Dpy in normal and mutant salivary glands.

      -Minor issues relate to the genotype/phenotype analysis. It is surprising that the authors detect only mild effects on sulfation in Papss mutants using an anti-sulfoTyr antibody, as Papss is the only PAPS synthetase. Generating germ-line clones (which is a feasible experiment) would have helped to prove that this minor effect is due to the contribution of maternal product. The loss-of-function allele used in this study seems problematic, as it produces effects in heterozygous conditions that are difficult to interpret. Cleaning the chromosome or using an alternative loss-of-function condition (another allele, RNAi, etc.) would have helped to present a more reliable explanation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I am pleased with the authors' revisions in response to my original comments and those of the other reviewers. The addition of the sulfation(-) mutant to Fig. 1 is particularly nice. I have just a few additional suggestions for text changes to improve clarity/precision.

      (1) The current title of this manuscript is quite broad, making it sound like a review article. I recommend adding sulfation and salivary gland to the title to convey the main points more clearly. e.g. Sulfation affects apical extracellular matrix organization during development of the Drosophila salivary gland tube.

      Thank you for the suggestion. We agree and have changed the title of the paper as suggested.

      (2) Figure 1B shows very striking enrichment of papss expression in the salivary gland compared to other tubes like the trachea that also contain Pio and Dpy. To me, this implies that the key substrate(s) of Papss are likely to be unique, or at least more highly enriched, in the salivary gland aECM compared to the tracheal aECM (e.g. probably not Pio or Dpy themselves). I suggest that the authors address the implications of this apparent SG specificity in the discussion (paragraph beginning on p. 21, line 559).

      Yes, we agree that there may be other key substrates of Papss in the SG, such as mucins, which play an important role in organizing the aECM and expanding the lumen. We have included a discussion.

      (3) p. 15, lines 374-376 "The Pio protein is known to be cleaved, at one cleavage site after the ZP domain by the furin protease and at another cleavage site within the ZP domain by the matriptase Notopleural (Np) (Drees et al., 2019; Drees et al., 2023; Figure 5B)." As far as I can see, the Drees papers show that Pio is cleaved somewhere in the vicinity of a consensus furin cleavage site, but do not actually establish that the cleavage happens at this exact site or is done by a furin protease (this is just an assumption). Please word more carefully, e.g. "at one cleavage site after the ZP domain, possibly by a furin protease".

      Thank you for pointing this out. We have edited the text.

      Reviewer #2 (Recommendations for the authors):

      Throughout the paper, I find the description of the lumen phenotypes and their interpretation a bit confusing.

      Papss mutants produce SGs that are either "thin" or show an "irregular lumen with bulges". Do the authors think that these are two different manifestations of the same effect, or do they think that there are different causes behind them?

      The thin lumen phenotype appears to occur when the Pio-Dpy matrix is significantly condensed. When this matrix is less condensed in one region of the lumen than in other regions, the lumen appears irregular with bulges.

      Are the defects in Grasp65 mutants categorized as "irregular lumen with bulges" similar to those in Papss mutants? Why don't these mutants show a "thin lumen" defect?

      Grasp65 mutant phenotypes are milder than those of Papss mutants. Multiple mutations in several Golgi components that more significantly disrupt Golgi structures and function may cause more severe defects in lumen expansion and shape.

      How do the defects described for Pio ("multiple constrictions with a slight expansion between constrictions") and Dpy mutants ("lumen with multiple bulges and constrictions") relate to the "irregular lumen with bulges" in Papss mutants?

      pio and dpy mutants show more stereotypical phenotypes, while Papss mutants exhibit more irregular and random phenotypes. The irregular lumen phenotypes in Papss mutants are associated with a condensed Pio-Dpy matrix.